Dataset Search & Discovery

Initial findings & opportunities to collaborate
S-1088 Annual Meeting

Lauren Chenarides

Assistant Professor

Colorado State University

October 12, 2025

Setting

What do you value most from participating in a Multi-State Project (e.g., S-1088)?

  • Networking
  • Data sharing
  • Collaborating on papers
  • Research collaboration

What would members find most useful going forward?

  • Building opportunities for joint collaboration, especially around data collection and use

Takeaway:

Members see strong value in connections but want better structures for collaborative data use.

Tracking usage of USDA datasets

Over the past several months, I’ve been working on a project that looks at
how USDA datasets are mentioned in research publications.

In the following slides, I’d like to share some of these initial findings and ask:

What would be useful for S-1088 members?

Questions to consider

  • Could this kind of platform help identify potential collaborators or shared research interests?
  • What could metadata patterns reveal about activity in your research area?
  • What does the funding metadata suggest about who is investing in your field?

Why USDA data?

USDA ERS and NASS were interested in tracking how their datasets are referenced in research papers.

Two dashboards were built to track mentions* of 11 key datasets in articles.

This helps:

  • Summarize dataset usage over time
  • Identify where and in what topics USDA data shows up
  • Highlight research across areas like food security, crop production, and others

These 11 datasets were selected based on relevance and input from USDA staff and researchers.


*A dataset mention refers to an instance in which a specific dataset is referenced, cited, or named within a research publication. This can occur in various parts of the text, such as the abstract, methods, data section, footnotes, or references, and typically indicates that the dataset was used, analyzed, or discussed in the study.

Details about this exercise

Datasets searched:

  1. Agricultural Resource Management Survey (ARMS)
  2. Current Population Survey Food Security Supplement
  3. Household Food Security Survey Module
  4. Farm to School Census
  5. Information Resources, Inc. (IRI) InfoScan (through USDA TPAA)
  6. NASS Census of Agriculture
  7. Tenure, ownership, and transition of agricultural land (TOTAL) survey
  8. Food Access Research Atlas
  9. Food Acquisition and Purchase Survey
  10. Local Food Marketing Practices Survey
  11. Quarterly Food at Home Price Database

Searched articles published between 2015 and 2025.

Mentions of datasets can be found in publications by applying ML models and other methods.

Data source: Dimensions via Google BigQuery

Scan QR code to access the Dashboard and follow along

https://democratizingdata.ai/tools/dashboard/dimensions/food-agricultural-research/

Who uses USDA data?

We searched a large publication corpus (Dimensions) to find mentions of 11 USDA datasets and found that 17,311 authors mentioned these datasets across 8,290 publications in 1,664 journals.

How are users using USDA data?

Food insecurity is one of the most frequently occurring topics among publications mentioning selected USDA datasets across all journals.

The word cloud shows the frequency of associated topics — larger words indicate more publications on that topic.

Use case

To better understand how USDA datasets are used, we examined one area in more detail: food security–related research.*

This exercise reveals interesting patterns about:

  1. How often selected USDA datasets are mentioned in publications over time

  2. The share of food security–related research that explicitly references these datasets

  3. How USDA dataset usage has evolved relative to all research output in this area


*Food security-related research includes articles with topics such as: “food security,” “SNAP,” “WIC,” “food access,” “food availability,” “food insecurity prevalence,” and “Nutrition Assistance Program.”

Mentions of selected USDA datasets have grown over time in food security–related research.

  • Mentions in publications have grown more than 4x (since 2015)
  • Authors mentioning datasets have grown more than 6x, suggesting strong interest across the research community
  • Number of institutions mentioning datasets has grown slowly, perhaps due to concentration within organizations

Details about this figure

This figure plots an index of unique publications, authors, or institutions mentioning the selected USDA datasets from 2015 to 2024, where 2015 is the base year (index = 1). For example, there were 595 publications in 2024 and 134 in 2015, so the 2024 publication index is 595 / 134 = 4.44.
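The index calculation can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the figure: the function name `index_to_base` is hypothetical, and only the two publication counts quoted above are used.

```python
def index_to_base(counts, base_year):
    """Divide each year's count by the base-year count, so the base year equals 1."""
    base = counts[base_year]
    if base == 0:
        raise ValueError("base-year count must be non-zero")
    return {year: count / base for year, count in counts.items()}


# Publication counts quoted in the text; the intermediate years are not listed there.
publications = {2015: 134, 2024: 595}
index = index_to_base(publications, base_year=2015)
print(round(index[2024], 2))  # 4.44
```

The same function would apply to the author and institution series, each indexed to its own 2015 count.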

Terms used to define food security research include: “food security”, “food insecurity”, “food security status”, “Supplemental Nutrition Assistance Program (SNAP)”, “Special Supplemental Nutrition Program for Women, Infants, and Children (WIC)”, “food access”, “food availability”, “prevalence of food insecurity”, “food pantries”, and “Nutrition Assistance Program”.
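As a rough sketch of how such a term list could be applied, the snippet below flags a publication when any of its topic labels contains one of the terms. The matching rule (case-insensitive substring search) and the function name `is_food_security` are assumptions for illustration; the slides do not describe the exact matching logic.

```python
# Terms copied from the list above.
FOOD_SECURITY_TERMS = [
    "food security",
    "food insecurity",
    "food security status",
    "Supplemental Nutrition Assistance Program (SNAP)",
    "Special Supplemental Nutrition Program for Women, Infants, and Children (WIC)",
    "food access",
    "food availability",
    "prevalence of food insecurity",
    "food pantries",
    "Nutrition Assistance Program",
]


def is_food_security(topics):
    """Return True if any topic label contains one of the terms (assumed rule: case-insensitive substring)."""
    lowered = [topic.lower() for topic in topics]
    return any(term.lower() in topic for term in FOOD_SECURITY_TERMS for topic in lowered)


print(is_food_security(["Prevalence of food insecurity among students"]))  # True
print(is_food_security(["Wine Industry and Tourism"]))  # False
```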

Data source: Dimensions via Google BigQuery

Then this…

July 2025 - Major climate change reports are removed from U.S. websites

September 2025 - USDA ends the Agricultural (Farm) Labor Survey, the U.S.’s only survey of agricultural employers

September 2025 - USDA Terminates Redundant Food Insecurity Survey

Lessons from tracking data usage

  1. We can only track what we can see. If authors are not writing about the datasets they use, we won’t see it. Most authors don’t cite their data.

  2. As an author, I find myself wanting better tools to find comparable studies using the same dataset – being able to search within publications that share a dataset is useful.

  3. Publication metadata contains rich, underused information (e.g., open access, funders, citations, article processing charges) – available from the OpenAlex API or by partnering with Digital Science (Dimensions).

At this stage, there’s a lot more we could do — but we would need resources and interested collaborators.

Another pain point

Funding outlook: State and federal budgets are shifting, with greater uncertainty around traditional sources.

Challenge: When our usual funders are constrained, or there is greater competition, how do we spot credible alternative funders aligned with our research?

What does the metadata tell us about S-1088 research?

Goal: Visualize where S-1088 research sits within the broader funding ecosystem.

  1. Start with the S-1088 member list: Identify current participants.

  2. Collect DOIs from member ORCIDs: Build a publication dataset linked to individual researchers.

  3. Search for topic codes across all published works: Map publications to relevant USDA or research themes.

  4. Retrieve all works within those topic codes: Capture the broader research landscape connected to S-1088 activities.

  5. Analyze funding outlets: Identify which agencies and programs are supporting work in these areas.
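Steps 3 through 5 above can be sketched on toy data. This is not the analysis code (which used openalexR in R); the record fields `doi`, `topics`, and `funders` are a simplified, hypothetical stand-in for the publication metadata, and all identifiers below are made up.

```python
from collections import Counter


def topic_codes(member_works):
    """Step 3: collect the set of topic codes found on members' works."""
    return {topic for work in member_works for topic in work["topics"]}


def works_in_topics(corpus, codes):
    """Step 4: keep every corpus work that shares at least one topic code."""
    return [work for work in corpus if codes & set(work["topics"])]


def funder_counts(works):
    """Step 5: count how many works acknowledge each funder."""
    return Counter(funder for work in works for funder in set(work["funders"]))


# Toy records (hypothetical): two member works plus two works from the wider corpus.
member_works = [
    {"doi": "10.0000/a", "topics": ["T1"], "funders": ["Funder X"]},
    {"doi": "10.0000/b", "topics": ["T1", "T2"], "funders": []},
]
corpus = member_works + [
    {"doi": "10.0000/c", "topics": ["T2"], "funders": ["Funder X", "Funder Y"]},
    {"doi": "10.0000/d", "topics": ["T9"], "funders": ["Funder Z"]},
]

codes = topic_codes(member_works)
landscape = works_in_topics(corpus, codes)
print(funder_counts(landscape).most_common())  # [('Funder X', 2), ('Funder Y', 1)]
```

Work `d` falls outside the members' topic codes, so its funder does not appear in the tally.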

Results (1/3)

Analysis was done using the openalexR package in R.

Membership and coverage

  • 48 registered members of S-1088. 41 with ORCIDs.
  • After cleaning, 22 members matched to OpenAlex author IDs.

Publications

  • 2,054 “works” identified.
  • De-duplicating and restricting to works of type article left 1,146 publications, published between 1977 and 2025.
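A minimal sketch of this cleaning step, assuming each work is a record with a `doi` and a `type` field. Treating the DOI as the de-duplication key and dropping works without one are assumptions for illustration, not a description of the actual cleaning rules.

```python
def clean_works(works):
    """Keep the first occurrence of each DOI (case-insensitive) among works of type 'article'."""
    seen = set()
    kept = []
    for work in works:
        key = (work.get("doi") or "").lower()
        # Assumption: works without a DOI cannot be de-duplicated and are dropped.
        if work.get("type") != "article" or not key or key in seen:
            continue
        seen.add(key)
        kept.append(work)
    return kept


# Toy records with made-up DOIs.
works = [
    {"doi": "10.0000/A", "type": "article"},
    {"doi": "10.0000/a", "type": "article"},   # duplicate, differs only in case
    {"doi": "10.0000/b", "type": "preprint"},  # wrong type
    {"doi": None, "type": "article"},          # no DOI
    {"doi": "10.0000/c", "type": "article"},
]
print(len(clean_works(works)))  # 2
```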

Results (2/3)

Common research topics:

  1. Organic Food and Agriculture
  2. Economic and Environmental Valuation
  3. Economics of Agriculture and Food Markets
  4. Wine Industry and Tourism
  5. Consumer Market Behavior and Pricing

Common research concepts:

  1. Willingness to pay
  2. Business
  3. Agriculture
  4. Biology
  5. Purchasing

Sample size: 1,138 articles

Results (3/3)

Common funders:

  1. National Institute of Food and Agriculture
  2. U.S. Department of Agriculture
  3. National Natural Science Foundation of China
  4. Economic Research Service
  5. National Institutes of Health

Others: Robert Wood Johnson Foundation, Organic Farming Research Foundation, university and department seed funding, international funders



Sample size: 287 of 1,146 publications (about 25%) acknowledged grant funding.

Next steps using these metadata

Benchmarking: How do the themes connected to S-1088 activities compare with the broader research landscape?

Funding Strategy: Which agencies and programs are supporting work in these areas?

Reflecting on value to S-1088

Questions to consider

  • Could this kind of platform help identify potential collaborators or shared research interests?
  • What could metadata patterns reveal about activity in your research area?
  • What does the funding metadata suggest about who is investing in your field?

For you:

  • Use the dashboard to find collaborators and review broad data usage statistics.

  • With access to the publication metadata, additional trends on topics of interest can be explored further.

  • What signals (topics, co-authorships, acknowledgments) could we use to surface new funders?

For our research community:

  • If we can identify the “other” datasets (via LLMs, for example), we can also identify the researchers using them and potentially build networks of topic experts, reviewers, and collaborators around them.

Final thoughts

  • How might you use this dashboard and the underlying metadata?
    • Locate collaborators for working groups outside of meetings
    • Brainstorm pre/post conference workshop ideas
    • Recruit new members for Hatch projects
  • Where have you successfully diversified funding in the past year?
    • Look at related work to identify alternative funding sources
  • Aside from food security, what other topics may be of interest to understand data visibility?
    • Specialty crops
    • Food safety
    • Food marketing

Thank You

💬 Questions?

📩
🌐 democratizingdata.ai
GitHub Repo

Scan QR code to access these slides

Acknowledgements:

  • Julia Lane
  • Rafael Ladislau
  • Nick Pallotta
  • Spiro Stefanou
  • Connor Whalen (undergrad at University of Nebraska-Lincoln)