Methodology for Comparing Citation Database Coverage of Dataset Usage

Step 04: Compare Results Across Citation Databases

Published

March 16, 2025

Compare Results across Citation Databases

The goal of this step is to develop statistics that measure how accurately each citation database tracks dataset usage.

Case Study: Census of Agriculture

Continuing with our case study, we use the datasets produced in Step 4 to count the journals with Ag Census publications that:

  1. only appear in Scopus,
  2. only appear in OpenAlex, or
  3. appear in both.
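
The three counts above amount to partitioning two sets of journal identifiers. The following is a minimal sketch of that idea, assuming each database's results have already been reduced to a set of journal ISSNs. The ISSN values, variable names, and the `partition_journals` helper are hypothetical and are not taken from the project's data or code.

```python
# Hypothetical ISSNs of journals with Ag Census publications in each database.
scopus_journals = {"0002-9092", "1058-7195", "0306-9192"}
openalex_journals = {"0002-9092", "0306-9192", "1467-8276", "2040-5790"}


def partition_journals(scopus, openalex):
    """Split journals into Scopus-only, OpenAlex-only, and shared groups."""
    return {
        "scopus_only": scopus - openalex,    # set difference
        "openalex_only": openalex - scopus,  # set difference
        "both": scopus & openalex,           # set intersection
    }


groups = partition_journals(scopus_journals, openalex_journals)
counts = {name: len(members) for name, members in groups.items()}
print(counts)
```

The result of such a partition depends on how journals are identified. If ISSNs are formatted differently in the two sources, a journal present in both could be counted as two single-source journals.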

For journals with Ag Census publications in both citation databases, we summarize how many of those publications appear in both Scopus and OpenAlex.

Then, we investigate discrepancies based on factors like missing identifiers, mismatched journal information (ISSNs), and additional publications accessed through OpenAlex’s API.
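
One way to make this kind of discrepancy check concrete is to flag, for each pair of records believed to describe the same publication, which identifiers are missing or inconsistent. The sketch below is illustrative only. The record layout (dictionaries with `doi` and `issn` keys) and the function names are assumptions, not the actual export formats of Scopus or OpenAlex.

```python
import re


def normalize_issn(raw):
    """Return an ISSN as 'NNNN-NNNC' (uppercase check character), or None if malformed."""
    if not raw:
        return None
    # Keep only digits and the check character X, so "00029092" and "0002-9092" agree.
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    return f"{compact[:4]}-{compact[4:]}"


def diagnose(scopus_rec, openalex_rec):
    """List possible reasons a pair of records could fail to match."""
    issues = []
    if not scopus_rec.get("doi") or not openalex_rec.get("doi"):
        issues.append("missing_doi")
    s_issn = normalize_issn(scopus_rec.get("issn"))
    o_issn = normalize_issn(openalex_rec.get("issn"))
    if s_issn is None or o_issn is None:
        issues.append("missing_issn")
    elif s_issn != o_issn:
        issues.append("issn_mismatch")
    return issues


# Hypothetical records: same journal, but the OpenAlex record lacks a DOI.
scopus_rec = {"doi": "10.1000/example", "issn": "00029092"}
openalex_rec = {"doi": None, "issn": "0002-9092"}
print(diagnose(scopus_rec, openalex_rec))
```

Normalizing ISSNs before comparing them keeps purely cosmetic differences, such as a missing hyphen, from being reported as mismatched journal information.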

Add here: What are the steps in producing Table AA

Journal Coverage

Author Disambiguation

Institution Disambiguation

Results from Database Comparison

This section presents results after record matching. The type of matching varies, from deterministic to fuzzy:

  1. Rule-based matching for exact matches
  2. Probabilistic matching for handling variations
  3. Machine learning methods for complex cases
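
The first two approaches in the list above can be sketched as a two-stage check: an exact rule on DOIs, followed by a similarity score on titles. This sketch uses `difflib.SequenceMatcher` from the Python standard library as a simple stand-in for fuzzy matching. It is not a full probabilistic record-linkage model, and the 0.9 threshold, the record fields, and the function names are assumptions chosen for illustration. The machine learning stage is not shown.

```python
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(kept.split())


def match_record(scopus_rec, openalex_rec, threshold=0.9):
    """Return 'exact', 'fuzzy', or None for a candidate pair of records."""
    # Stage 1 (deterministic): identical DOIs, compared case-insensitively.
    s_doi, o_doi = scopus_rec.get("doi"), openalex_rec.get("doi")
    if s_doi and o_doi and s_doi.lower() == o_doi.lower():
        return "exact"
    # Stage 2 (fuzzy): title similarity at or above the assumed threshold.
    score = SequenceMatcher(
        None,
        normalize_title(scopus_rec["title"]),
        normalize_title(openalex_rec["title"]),
    ).ratio()
    return "fuzzy" if score >= threshold else None


# Hypothetical records for illustration.
a = {"doi": "10.1000/ABC", "title": "Farm size and productivity"}
b = {"doi": "10.1000/abc", "title": "Farm size and productivity"}
c = {"doi": None, "title": "Farm Size and Productivity: Evidence from the Census of Agriculture"}
d = {"doi": None, "title": "Farm size and productivity - evidence from the census of agriculture"}
e = {"doi": None, "title": "Soil erosion in river basins"}

print(match_record(a, b), match_record(c, d), match_record(c, e))
```

Running the deterministic rule first keeps the fuzzy comparison for records that lack a shared identifier, which is where variations in titles and formatting are most likely to matter.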

Table 1: Summary of Methods

Method | Considerations | Example | Pros | Cons
Searching for dataset names within Scopus | | | |
Searching for dataset names within OpenAlex “Location” field set to “journal” | | | |
Disambiguation of authors | | | |
Disambiguation of institutions | | | |
Standardization of institutions | | | |
Searching based on the frequency of dataset appearance in journals | | | |
MORE . . .
Filtering on keywords to determine themes | | | |

All appendices referenced throughout the report are located on this page.