Dataset Search & Discovery

Initial findings & opportunities to collaborate
AAEA 2025 — Denver, Colorado

Lauren Chenarides

July 27, 2025

Setting

Over the past several months, I’ve been working on a project that looks at
how USDA datasets are mentioned in research publications.

In the following slides, I’d like to share some of these initial findings and ask:

What would be useful for AAEA section members?

Possible questions to reflect on as you’re listening

  • What can metadata on topic trends tell us about research in your section’s area of focus?
  • Could this type of platform help identify potential collaborators or build working groups?
  • Are there emerging research topics that could inform next year’s track session proposals?
  • What can metadata on funding tell us about who is supporting research in your section’s area of focus?

Why USDA data?

USDA ERS and NASS were interested in tracking how their datasets are referenced in research papers.

Two dashboards were built to track mentions* of 11 key datasets in research articles.

This helps:

  • Summarize dataset usage over time
  • Identify where and in what topics USDA data shows up
  • Highlight research across areas like food security, crop production, and others…

These 11 datasets were selected based on relevance and input from USDA staff and researchers.


*A dataset mention refers to an instance in which a specific dataset is referenced, cited, or named within a research publication. This can occur in various parts of the text, such as the abstract, methods, data section, footnotes, or references, and typically indicates that the dataset was used, analyzed, or discussed in the study.

Details about this exercise

Datasets searched:

  1. Agricultural Resource Management Survey (ARMS)
  2. Current Population Survey Food Security Supplement
  3. Farm to School Census
  4. Information Resources, Inc. (IRI) InfoScan (through USDA TPAA)
  5. NASS Census of Agriculture
  6. Tenure, Ownership, and Transition of Agricultural Land (TOTAL) Survey
  7. Food Access Research Atlas
  8. Food Acquisition and Purchase Survey
  9. Household Food Security Survey Module
  10. Local Food Marketing Practices Survey
  11. Quarterly Food at Home Price Database

Searched articles published between 2015 and 2025.

Dataset mentions are identified in publication text by applying ML models and other methods.

Data source: Dimensions via Google BigQuery

Who uses USDA data…

We searched a large publication corpus (Dimensions) to find mentions of 11 USDA datasets and found that 17,311 authors mentioned these datasets across 8,290 publications in 1,664 journals.
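Headline counts like these are distinct counts over mention records. A minimal sketch of that tally follows, assuming a hypothetical record layout (the field names `publication_id`, `author_id`, and `journal` are illustrative, not the Dimensions schema) and made-up sample rows:

```python
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    """One dataset mention (hypothetical record layout, not the Dimensions schema)."""
    publication_id: str
    author_id: str
    journal: str
    dataset: str


def summarize_mentions(mentions: Iterable[Mention]) -> dict[str, int]:
    """Count distinct authors, publications, and journals across all mentions."""
    authors, publications, journals = set(), set(), set()
    for m in mentions:
        authors.add(m.author_id)
        publications.add(m.publication_id)
        journals.add(m.journal)
    return {
        "authors": len(authors),
        "publications": len(publications),
        "journals": len(journals),
    }


# Made-up rows for illustration only; not drawn from Dimensions.
sample = [
    Mention("pub1", "a1", "Journal A", "ARMS"),
    Mention("pub1", "a2", "Journal A", "ARMS"),
    Mention("pub2", "a1", "Journal B", "Food Access Research Atlas"),
]
summary = summarize_mentions(sample)
```

Counting distinct IDs rather than rows keeps an author with several papers, or a paper mentioning several datasets, from being counted more than once.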

…and how?

Food insecurity is one of the most frequently occurring topics among publications mentioning selected USDA datasets across all journals.

The word cloud shows the frequency of associated topics — larger words indicate more publications on that topic.

Use case: What can we observe from a focused topic area?

To better understand how USDA datasets are used, we examined one area in more detail: food security–related research.*

This example will show:

  1. The extent to which selected USDA datasets are mentioned within a topic area over time

  2. The share of food security-related research that explicitly mentions these datasets

  3. How USDA dataset usage has changed over time relative to all published work in this area


*Food security-related research includes articles with topics such as: “food security,” “SNAP,” “WIC,” “food access,” “food availability,” “food insecurity prevalence,” and “Nutrition Assistance Program.”

Mentions of selected USDA datasets have grown over time in food security–related research.

  • Mentions in publications have grown, but the pace has leveled off slightly in recent years
  • Authors mentioning datasets have grown more than 6x, suggesting strong interest across the research community
  • Institutions mentioning datasets have grown more slowly, perhaps reflecting concentration of use within organizations

Details about this figure

This figure plots an index of unique publications, authors, or institutions mentioning the selected USDA datasets from 2015-2024, where 2015 is the base year (index = 1). For example, there were 595 publications in 2024 and 134 in 2015, so the index is 595 / 134 = 4.44.
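The index calculation can be sketched in a few lines. The function name `index_to_base` is illustrative, and only the two publication counts stated above (134 in 2015, 595 in 2024) are used; counts for other years are not reproduced here.

```python
def index_to_base(counts: dict[int, int], base_year: int) -> dict[int, float]:
    """Divide each year's count by the base-year count (base year maps to 1.0)."""
    base = counts[base_year]
    if base == 0:
        raise ValueError("base-year count must be nonzero")
    return {year: round(n / base, 2) for year, n in sorted(counts.items())}


# Publication counts taken from the figure note; other years omitted.
publications = {2015: 134, 2024: 595}
index = index_to_base(publications, base_year=2015)
print(index)  # {2015: 1.0, 2024: 4.44}
```

The same function applies unchanged to the author and institution series, which is what makes the three lines comparable on one axis.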

Terms used to define food security research include: “food security”, “food insecurity”, “food security status”, “Supplemental Nutrition Assistance Program (SNAP)”, “Special Supplemental Nutrition Program for Women, Infants, and Children (WIC)”, “food access”, “food availability”, “prevalence of food insecurity”, “food pantries”, and “Nutrition Assistance Program”.
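One way to apply this term list is a case-insensitive exact match against each publication's topic labels. The sketch below assumes that matching rule and a hypothetical input (a list of topic strings per publication); the actual matching logic behind the dashboard may differ.

```python
# Terms copied from the list above, lowercased for matching.
FOOD_SECURITY_TERMS = {
    "food security",
    "food insecurity",
    "food security status",
    "supplemental nutrition assistance program (snap)",
    "special supplemental nutrition program for women, infants, and children (wic)",
    "food access",
    "food availability",
    "prevalence of food insecurity",
    "food pantries",
    "nutrition assistance program",
}


def is_food_security_related(topics: list[str]) -> bool:
    """Return True if any topic label exactly matches a listed term (case-insensitive)."""
    return any(topic.strip().casefold() in FOOD_SECURITY_TERMS for topic in topics)


# Hypothetical topic labels for two publications.
print(is_food_security_related(["Food Security", "obesity"]))  # True
print(is_food_security_related(["crop yields"]))               # False
```

Exact matching is the conservative choice: a substring rule would also catch labels such as "food security policy", which widens the topic area beyond the listed terms.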

Data source: Dimensions via Google BigQuery

What I’ve learned

  • We can only track what we can see. If authors do not write about the datasets they use, we won’t see them. Most authors don’t cite their data.

  • As an author, I find myself wanting better tools to find comparable studies using the same dataset – being able to search within publications that share a dataset is useful.

  • Publication metadata contains rich, underused information (e.g., open access, funders, citations, article processing charges) – available from the OpenAlex API or by partnering with Digital Science (Dimensions).

At this stage, there’s a lot more we could do — but we would need resources and interested collaborators.

Reflecting on value to AAEA Sections

Questions to consider

  • What can metadata on topic trends tell us about research in your section’s area of focus?
  • Could this type of platform help identify potential collaborators or build working groups?
  • Are there emerging research topics that could inform next year’s track session proposals?
  • What can metadata on funding tell us about who is supporting research in your section’s area of focus?

For you:

  • Use the dashboard to find collaborators and review broad data usage statistics.

  • With access to the publication metadata, additional trends on topics of interest can be explored further.

For our research community:

  • If we can identify the “other” datasets (via LLMs, for example), we can also identify the researchers using them and potentially build networks of topic experts, reviewers, and collaborators around them.

Final thoughts

  • How might AAEA Sections use this dashboard?
    • Track session proposals
    • Pre/post conference workshop ideas
    • Collaborators for working groups outside of meetings
  • Aside from food security, what other topics may be of interest to understand data visibility?
    • Specialty crops
    • Food safety
    • Food marketing


Example: Specialty Crops (S-1088)

  • Use dashboard metadata to explore funding trends on specialty crop research
  • Hatch renewal comment: “Expand members” → dashboard can help!
  • Meeting idea: live demo + brainstorming researchable questions

Thank You

Acknowledgements:

  • Julia Lane
  • Rafael Ladislau
  • Nick Pallotta
  • Spiro Stefanou