Dataset Search & Discovery

Initial findings & opportunities to collaborate
AAEA 2025 — Denver, Colorado

Lauren Chenarides

July 27, 2025

Setting

Over the past several months, I’ve been working on a project that looks at
how USDA datasets are mentioned in research publications.

In the following slides, I’d like to share some of these initial findings and ask:

What would be useful for editors and reviewers?

Possible questions to reflect on as you’re listening

  • Could better visibility into dataset usage help with reviewer selection?
  • Are there opportunities to encourage better practices for dataset mentions/citations in our editorial and review processes?
  • How might this work inform benchmarking AJAE against other field journals?

Why USDA data?

USDA ERS and NASS were interested in tracking how their datasets are referenced in research papers.

Two dashboards were built to track mentions* of 11 key datasets in research articles.

This helps:

  • Summarize dataset usage over time
  • Identify where and in what topics USDA data shows up
  • Highlight research across areas like food security, crop production, and others…

These 11 datasets were selected based on relevance and input from USDA staff and researchers.


*A dataset mention refers to an instance in which a specific dataset is referenced, cited, or named within a research publication. This can occur in various parts of the text, such as the abstract, methods, data section, footnotes, or references, and typically indicates that the dataset was used, analyzed, or discussed in the study.

Details about this exercise

Datasets searched:

  1. Agricultural Resource Management Survey (ARMS)
  2. Current Population Survey Food Security Supplement
  3. Farm to School Census
  4. Information Resources, Inc. (IRI) InfoScan (through USDA TPAA)
  5. NASS Census of Agriculture
  6. Tenure, ownership, and transition of agricultural land (TOTAL) survey
  7. Food Access Research Atlas
  8. Food Acquisition and Purchase Survey
  9. Household Food Security Survey Module
  10. Local Food Marketing Practices Survey
  11. Quarterly Food at Home Price Database

Searched articles published between 2015 and 2025.

Dataset mentions are identified in publication text by applying ML models and other methods.

Data source: Dimensions via Google BigQuery
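
As a rough illustration of the "other methods" side of mention detection, the sketch below uses plain pattern matching rather than an ML model. The alias table and the `find_dataset_mentions` helper are hypothetical and are not the project's actual pipeline; only the dataset names come from the list above.

```python
import re

# Hypothetical alias table (an assumption for illustration only).
# Full names match case-insensitively; the acronym is case-sensitive
# so that the ordinary word "arms" is not counted as a mention.
DATASET_ALIASES = {
    "Agricultural Resource Management Survey (ARMS)": [
        r"(?i)agricultural resource management survey",
        r"\bARMS\b",
    ],
    "Food Access Research Atlas": [r"(?i)food access research atlas"],
    "NASS Census of Agriculture": [r"(?i)census of agriculture"],
}


def find_dataset_mentions(text):
    """Return the sorted names of datasets with at least one alias in the text."""
    mentions = []
    for dataset, patterns in DATASET_ALIASES.items():
        if any(re.search(pattern, text) for pattern in patterns):
            mentions.append(dataset)
    return sorted(mentions)


sample = "We combine the agricultural resource management survey with the Food Access Research Atlas."
print(find_dataset_mentions(sample))
```

Simple matching like this misses paraphrased or indirect references, which is one reason to pair it with model-based approaches.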


Who uses USDA data…

We searched a large publication corpus (Dimensions) to find mentions of 11 USDA datasets and found that 17,311 authors mentioned these datasets across 8,290 publications in 1,664 journals.

…and how?

Food insecurity is one of the most frequently occurring topics among publications mentioning selected USDA datasets across all journals.

The word cloud shows the frequency of associated topics — larger words indicate more publications on that topic.

Use case: A new food security submission arrives. Who should review it?

Within this sample, food security–related articles make up 11% of AJAE publications.
\(\rightarrow\) The dashboard supports topic filtering to identify active authors contributing to this (and other) topic areas and to view their broader publishing activity, thereby narrowing the pool of possible referees.
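
The filtering idea can be sketched in a few lines. The records, field names, and the `rank_authors_by_topic` helper below are made-up placeholders, not the dashboard's data model; the point is only to show how a topic filter turns into a ranked list of candidate referees.

```python
from collections import Counter

# Hypothetical publication records (placeholder titles and author names).
publications = [
    {"title": "Paper A", "topics": ["food security"], "authors": ["Author 1", "Author 2"]},
    {"title": "Paper B", "topics": ["food security"], "authors": ["Author 2", "Author 3"]},
    {"title": "Paper C", "topics": ["crop production"], "authors": ["Author 1"]},
]


def rank_authors_by_topic(records, topic):
    """Count publications per author within one topic, most active first."""
    counts = Counter()
    for record in records:
        if topic in record["topics"]:
            counts.update(record["authors"])
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(rank_authors_by_topic(publications, "food security"))
```

An editor would still need to screen the resulting list for conflicts of interest and recent activity.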

Filter by topic, publication title, or dataset directly from the dashboard.


You can also apply custom filters, such as searching by author name or other key metadata fields.


This example is one possible application of this type of metadata integrated into an interactive user interface.

What I’ve learned

Stepping back from the dashboard itself, this work has given me broader insights into how researchers use data, what’s visible to us as readers and editors, and what tools might help fill in the gaps.

  • We can only track what we can see. If authors do not write about the datasets they use, we won’t see them. Most authors don’t cite their data.

  • As an author, I find myself wanting better tools to find comparable studies using the same dataset – being able to search within publications that share a dataset is useful.

  • Publication metadata contains rich, underused information (e.g., open access, funders, citations, article processing charges) – available from the OpenAlex API or by partnering with Digital Science (Dimensions).
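
For anyone who wants to explore the metadata point, here is a minimal sketch of how a query against the public OpenAlex works endpoint might be assembled. The `build_openalex_query` helper and the example phrase are assumptions for illustration; the code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_openalex_query(phrase, start_year, end_year, per_page=25):
    """Build a works-search URL for a quoted phrase within a publication date range."""
    params = {
        # Quoting the phrase asks for an exact-phrase search.
        "search": f'"{phrase}"',
        "filter": (
            f"from_publication_date:{start_year}-01-01,"
            f"to_publication_date:{end_year}-12-31"
        ),
        "per-page": per_page,
    }
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


url = build_openalex_query("Food Access Research Atlas", 2015, 2025)
print(url)
```

Each returned work record should carry fields such as `open_access` and `cited_by_count`, which map onto the metadata listed above.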

At this stage, there’s a lot more we could do — but we would need resources and interested collaborators.

Reflecting on value to AJAE

Questions to consider

  • Could better visibility into dataset usage help with reviewer selection?

  • Are there opportunities to encourage better practices for dataset mentions/citations in our editorial and review processes?

  • How might this work inform benchmarking AJAE against other field journals?

  • Last year, we discussed the idea of online special issues. One option could be curating a set of articles that all use the same core dataset.
    \(\rightarrow\) Use the dashboard for this.

  • Beyond food security, what other topics would be of interest for understanding data usage?
    \(\rightarrow\) Use metadata for this.

  • Wiley already encourages data citation (see Wiley’s Data Citation Policy)
    \(\rightarrow\) Is it enforced? What are the metrics on compliance since implementation? Could stronger practices improve transparency?

Thank You

Acknowledgements:

  • Julia Lane
  • Rafael Ladislau
  • Nick Pallotta
  • Spiro Stefanou