Initial findings & opportunities to collaborate
AAEA 2025 — Denver, Colorado
July 27, 2025
Over the past several months, I’ve been working on a project that looks at
how USDA datasets are mentioned in research publications.
In the following slides, I’d like to share some of these initial findings and ask:
What would be useful for editors and reviewers?
Possible questions to reflect on as you’re listening
USDA ERS and NASS were interested in tracking how their datasets are referenced in research papers.
Two dashboards were built to track mentions* of 11 key datasets in articles.
This helps:
These 11 datasets were selected based on relevance and input from USDA staff and researchers.
*A dataset mention refers to an instance in which a specific dataset is referenced, cited, or named within a research publication. This can occur in various parts of the text, such as the abstract, methods, data section, footnotes, or references, and typically indicates that the dataset was used, analyzed, or discussed in the study.
Details about this exercise
Datasets searched:
Searched articles published between 2015 and 2025.
Mentions of datasets can be found in publications by applying ML models and other methods.
Data source: Dimensions via Google BigQuery
References:
Read more about this project here.
We searched a large publication corpus (Dimensions) to find mentions of 11 USDA datasets and found that 17,311 authors mentioned these datasets across 8,290 publications in 1,664 journals.
Work was funded by the US Department of Agriculture (Economic Research Service and National Agricultural Statistics Service), the National Center for Science and Engineering Statistics, and the National Center for Education Statistics.
Food insecurity is one of the most frequently occurring topics among publications mentioning selected USDA datasets across all journals.
The word cloud shows the frequency of associated topics — larger words indicate more publications on that topic.
Access the Democratizing Data FAR Data Dashboard here.
Within this sample, food security–related articles make up 11% of AJAE publications.
\(\rightarrow\) The dashboard supports topic filtering to identify active authors contributing to this (and other) topic areas and to view their broader publishing activity, thereby narrowing the scope of possible referees.
This example is one possible application of this type of metadata integrated into an interactive user interface.
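As a rough illustration of the topic-filtering idea, the sketch below ranks authors by how many distinct publications they have on a chosen topic. It is not the dashboard's actual implementation: the record layout, the placeholder author names, the `active_authors` function, and the `min_pubs` threshold are all assumptions made for this example.

```python
# Hypothetical records: one row per (publication, author) pair with tagged topics.
# None of these rows come from the dashboard; they only illustrate the idea.
records = [
    {"pub_id": "p1", "author": "Author A", "topics": {"food security"}},
    {"pub_id": "p2", "author": "Author A", "topics": {"food security", "nutrition"}},
    {"pub_id": "p2", "author": "Author B", "topics": {"food security", "nutrition"}},
    {"pub_id": "p3", "author": "Author B", "topics": {"crop yields"}},
    {"pub_id": "p4", "author": "Author B", "topics": {"food security"}},
    {"pub_id": "p5", "author": "Author C", "topics": {"food security"}},
    {"pub_id": "p6", "author": "Author B", "topics": {"food security"}},
]


def active_authors(records, topic, min_pubs=2):
    """Rank authors by number of distinct publications tagged with `topic`.

    Authors with fewer than `min_pubs` matching publications are dropped.
    Ties are broken alphabetically by author name.
    """
    pubs_by_author = {}
    for row in records:
        if topic in row["topics"]:
            pubs_by_author.setdefault(row["author"], set()).add(row["pub_id"])
    counts = {
        author: len(pubs)
        for author, pubs in pubs_by_author.items()
        if len(pubs) >= min_pubs
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


candidate_referees = active_authors(records, "food security")
```

With the toy rows above, `candidate_referees` lists Author B (three publications) ahead of Author A (two), and Author C falls below the threshold. An editor would still need to screen such a list for conflicts of interest and fit.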
Stepping back from the dashboard itself, this work has given me broader insights into how researchers use data, what’s visible to us as readers and editors, and what tools might help fill in the gaps.
We can only track what we can see. If authors are not writing about the datasets they use, we won’t see it. Most authors don’t cite their data.
As an author, I find myself wanting better tools to find comparable studies using the same dataset – being able to search within publications that share a dataset is useful.
Publication metadata contains rich, underused information (e.g., open access, funders, citations, article processing charges) – available from the OpenAlex API or by partnering with Digital Science (Dimensions).
At this stage, there’s a lot more we could do — but we would need resources and interested collaborators.
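To make the publication metadata mentioned above more concrete, here is a minimal sketch that pulls the four items (open access, funders, citations, article processing charges) out of a single record. The field names (`open_access.is_oa`, `cited_by_count`, `grants[].funder_display_name`, `apc_list.value_usd`) reflect my understanding of how OpenAlex structures a work object and should be checked against the current OpenAlex documentation. The sample record and the `summarize_work` helper are invented for illustration.

```python
def summarize_work(work):
    """Extract a few editor-relevant fields from an OpenAlex-style work record.

    Assumes the nested layout described in the lead-in; missing fields
    fall back to None, 0, or an empty list rather than raising errors.
    """
    open_access = work.get("open_access") or {}
    apc = work.get("apc_list") or {}
    grants = work.get("grants") or []
    funders = sorted(
        {g["funder_display_name"] for g in grants if g.get("funder_display_name")}
    )
    return {
        "is_open_access": open_access.get("is_oa"),
        "cited_by_count": work.get("cited_by_count", 0),
        "funders": funders,
        "apc_usd": apc.get("value_usd"),
    }


# Made-up record in the assumed shape; not a real publication.
sample_work = {
    "id": "W0000000000",
    "open_access": {"is_oa": True, "oa_status": "hybrid"},
    "cited_by_count": 12,
    "grants": [
        {"funder_display_name": "Example Funder One", "award_id": None},
        {"funder_display_name": "Example Funder Two", "award_id": "X-1"},
    ],
    "apc_list": {"value": 3000, "currency": "USD", "value_usd": 3000},
}

summary = summarize_work(sample_work)
```

Applied to every publication that mentions a given dataset, a summary like this could feed simple comparisons across journals, such as the share of open-access articles or the mix of funders.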
Questions to consider
Could better visibility into dataset usage help with reviewer selection?
Are there opportunities to encourage better practices for dataset mentions/citations in our editorial and review processes?
How might this work inform benchmarking AJAE against other field journals?
Last year, we discussed the idea of online special issues. One option could be curating a set of articles that all use the same core dataset.
\(\rightarrow\) Use the dashboard for this.
Beyond food security, what other topics may be of interest to understand data usage?
\(\rightarrow\) Use metadata for this.
Wiley already encourages data citation (see Wiley’s Data Citation Policy)
\(\rightarrow\) Is it enforced? What are the metrics on compliance since implementation? Could stronger practices improve transparency?
💬 Questions?
📩 Lauren.Chenarides@colostate.edu
🌐 democratizingdata.ai
Acknowledgements: