Initial findings & opportunities to collaborate
AAEA 2025 — Denver, Colorado
July 27, 2025
Over the past several months, I’ve been working on a project that looks at
how USDA datasets are mentioned in research publications.
In the following slides, I’d like to share some of these initial findings and ask:
What would be useful for AAEA section members?
Possible questions to reflect on as you’re listening
USDA ERS and NASS were interested in tracking how their datasets are referenced in research papers.
Two dashboards were built to track mentions* of 11 key datasets in articles.
This helps:
These 11 datasets were selected based on relevance and input from USDA staff and researchers.
*A dataset mention refers to an instance in which a specific dataset is referenced, cited, or named within a research publication. This can occur in various parts of the text, such as the abstract, methods, data section, footnotes, or references, and typically indicates that the dataset was used, analyzed, or discussed in the study.
Details about this exercise
Datasets searched:
Searched articles published between 2015 and 2025.
Mentions of datasets can be found in publications by applying ML models and other methods.
Data source: Dimensions via Google BigQuery
References:
Read more about this project here.
We searched a large publication corpus (Dimensions) to find mentions of 11 USDA datasets and found that 17,311 authors mentioned these datasets across 8,290 publications in 1,664 journals.
This work was funded by the US Department of Agriculture (Economic Research Service and National Agricultural Statistics Service), the National Center for Science and Engineering Statistics, and the National Center for Education Statistics.
Food insecurity is one of the most frequently occurring topics among publications mentioning selected USDA datasets across all journals.
The word cloud shows the frequency of associated topics — larger words indicate more publications on that topic.
Access the Democratizing Data FAR Data Dashboard here.
To better understand how USDA datasets are used, we examined one area in more detail: food security–related research.*
This example will show:
The extent to which selected USDA datasets are mentioned within a topic area over time
The share of food security-related research that explicitly mentions these datasets
How USDA dataset usage has changed over time relative to all published work in this area
*Food security-related research includes articles with topics such as: “food security,” “SNAP,” “WIC,” “food access,” “food availability,” “food insecurity prevalence,” and “Nutrition Assistance Program.”
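One way to picture the topic filter described in the footnote above is a simple term match against each article's topic labels. This is only a sketch: the term list comes from the footnote, but the matching rule (case-insensitive, whole-word) and the function name `is_food_security_related` are assumptions for illustration, not the project's actual method.

```python
import re

# Terms taken from the footnote on this slide.
FOOD_SECURITY_TERMS = [
    "food security",
    "SNAP",
    "WIC",
    "food access",
    "food availability",
    "food insecurity prevalence",
    "Nutrition Assistance Program",
]

# Assumption: match whole words only, ignoring case, so that
# "SNAP" does not match "snapshot" and "WIC" does not match "wick".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in FOOD_SECURITY_TERMS) + r")\b",
    re.IGNORECASE,
)


def is_food_security_related(topics):
    """Return True if any topic label contains one of the listed terms."""
    return any(_PATTERN.search(topic) for topic in topics)


print(is_food_security_related(["Household food security", "Rural health"]))
print(is_food_security_related(["Snapshot imaging", "Crop yields"]))
```

Whole-word matching matters here because short acronyms such as SNAP and WIC would otherwise produce false positives inside unrelated words.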
Details about this figure
This figure plots an index of unique publications, authors, or institutions mentioning the selected USDA datasets from 2015 to 2024, where 2015 is the base year (index = 1). For example, there were 595 publications in 2024 and 134 in 2015, so the 2024 publication index is 595 / 134 ≈ 4.44.
Terms used to define food security research include: “food security”, “food insecurity”, “food security status”, “Supplemental Nutrition Assistance Program (SNAP)”, “Special Supplemental Nutrition Program for Women, Infants, and Children (WIC)”, “food access”, “food availability”, “prevalence of food insecurity”, “food pantries”, and “Nutrition Assistance Program”.
Data source: Dimensions via Google BigQuery
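The base-year index described in these figure notes can be reproduced with a few lines of Python. This is a minimal sketch: the two publication counts are the ones quoted in the notes, while the function name `base_year_index` and the dictionary layout are assumptions for illustration.

```python
def base_year_index(counts_by_year, base_year=2015):
    """Divide each year's count by the base-year count (base year = 1)."""
    base = counts_by_year[base_year]
    if base == 0:
        raise ValueError("Base-year count must be non-zero.")
    return {year: count / base for year, count in sorted(counts_by_year.items())}


# Publication counts quoted in the figure notes (only two years are given).
publication_counts = {2015: 134, 2024: 595}

index = base_year_index(publication_counts)
print(index[2015])            # 1.0
print(round(index[2024], 2))  # 4.44
```

The same function applies unchanged to author or institution counts, since the index only depends on the ratio to the base year.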
This implies that 79% of food security-related research uses other datasets (or doesn’t name the data at all).
That might not raise concern if mentions of USDA datasets were declining in line with overall trends in food security research.
But as we’ll see next, the volume of food security publications continues to grow while USDA dataset mentions have plateaued.
Details about this figure
This figure compares:
Terms used to define food security research include: “food security”, “food insecurity”, “food security status”, “Supplemental Nutrition Assistance Program (SNAP)”, “Special Supplemental Nutrition Program for Women, Infants, and Children (WIC)”, “food access”, “food availability”, “prevalence of food insecurity”, “food pantries”, and “Nutrition Assistance Program”.
Data source: Dimensions via Google BigQuery
Next question: What other datasets are researchers using for food security-related research?
→ Answering this requires access to full-text data and LLMs trained to identify dataset usage.
Currently working on this.
Details about this figure
This figure shows two trends in food security–related research from 2015 to 2024:
Terms used to define food security research include: “food security”, “food insecurity”, “food security status”, “Supplemental Nutrition Assistance Program (SNAP)”, “Special Supplemental Nutrition Program for Women, Infants, and Children (WIC)”, “food access”, “food availability”, “prevalence of food insecurity”, “food pantries”, and “Nutrition Assistance Program”.
Data source: Dimensions via Google BigQuery
We can only track what we can see. If authors do not write about the datasets they use, we won’t see them. Most authors don’t cite their data.
As an author, I find myself wanting better tools to find comparable studies using the same dataset – being able to search within publications that share a dataset is useful.
Publication metadata contains rich, underused information (e.g., open access, funders, citations, article processing charges) – available from the OpenAlex API or by partnering with Digital Science (Dimensions).
At this stage, there’s a lot more we could do — but we would need resources and interested collaborators.
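As a rough illustration of the publication metadata point above, the sketch below builds a request URL for the OpenAlex works endpoint without sending it. The search text, date range, selected fields, and the helper name `build_openalex_query` are assumptions for illustration, not the queries behind the dashboards; the parameter and field names should be checked against the current OpenAlex documentation before use.

```python
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_openalex_query(search_text, start_date="2015-01-01",
                         end_date="2024-12-31", per_page=25):
    """Build (but do not send) a works query URL for the OpenAlex API."""
    params = {
        # Free-text search over works.
        "search": search_text,
        # Restrict results to a publication date window.
        "filter": f"from_publication_date:{start_date},to_publication_date:{end_date}",
        # Assumed field selection: open-access status and citation counts.
        "select": "id,display_name,publication_year,open_access,cited_by_count",
        "per-page": per_page,
    }
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


url = build_openalex_query("food security")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; keeping URL construction separate from the request makes the query easy to inspect and share with collaborators.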
Questions to consider
For you:
Use the dashboard to find collaborators and review broad data usage statistics.
With access to the publication metadata, additional trends on topics of interest can be explored further.
For our research community:
Example: Specialty Crops (S-1088)
💬 Questions?
📩 Lauren.Chenarides@colostate.edu
🌐 democratizingdata.ai
Acknowledgements: