Methodology for Comparing Citation Database Coverage of Dataset Usage
Step 04: Compare Results Across Citation Databases
The goal of this step is to develop statistics that measure how accurately each citation database tracks dataset usage.
Case Study: Census of Agriculture
Continuing with our case study, we use the datasets produced in the preceding steps to count the journals with Ag Census publications that:
- appear only in Scopus,
- appear only in OpenAlex, or
- appear in both.
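A minimal sketch of this three-way journal count, assuming each database's results have already been reduced to a set of normalized journal ISSNs. The function name `classify_journals` and the toy identifiers are hypothetical placeholders, not taken from the report:

```python
def classify_journals(scopus_issns: set, openalex_issns: set) -> dict:
    """Count journals found only in Scopus, only in OpenAlex, or in both.

    Both arguments are sets of normalized ISSNs (hypothetical input shape),
    one entry per journal with at least one Ag Census publication.
    """
    return {
        "scopus_only": len(scopus_issns - openalex_issns),  # set difference
        "openalex_only": len(openalex_issns - scopus_issns),
        "both": len(scopus_issns & openalex_issns),  # set intersection
    }


if __name__ == "__main__":
    # Toy placeholder identifiers for illustration only (not real ISSNs).
    scopus = {"0000-0001", "0000-0002", "0000-0003"}
    openalex = {"0000-0002", "0000-0003", "0000-0004"}
    print(classify_journals(scopus, openalex))
    # → {'scopus_only': 1, 'openalex_only': 1, 'both': 2}
```

Keying on ISSNs assumes each journal is represented by a single identifier; journals with separate print and electronic ISSNs would first need to be collapsed to one key (for example, a linking ISSN), which is one place the ISSN mismatches discussed below can arise.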
For journals that contain Ag Census publications in both citation databases, we summarize how many of those publications appear in both Scopus and OpenAlex.
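One way to summarize publication-level coverage within the shared journals is to match records on DOI. The sketch below assumes each database's results are available as (ISSN, DOI) pairs; that input shape and the function names are hypothetical. DOIs are normalized first because sources can format them differently (OpenAlex, for instance, returns DOIs as full `https://doi.org/` URLs):

```python
from collections import defaultdict
from typing import Optional


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip common URL/scheme prefixes; None if absent."""
    if not doi:
        return None
    d = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:"):
        if d.startswith(prefix):
            d = d[len(prefix):]
            break
    return d or None


def coverage_by_journal(scopus_pubs, openalex_pubs, shared_issns):
    """For each journal present in both databases, count DOIs found in both,
    only in Scopus, or only in OpenAlex.

    Records without a DOI are skipped here; they belong to the
    missing-identifier discrepancies examined separately.
    """
    dois = {"scopus": defaultdict(set), "openalex": defaultdict(set)}
    for source, pubs in (("scopus", scopus_pubs), ("openalex", openalex_pubs)):
        for issn, doi in pubs:
            key = normalize_doi(doi)
            if issn in shared_issns and key is not None:
                dois[source][issn].add(key)

    summary = {}
    for issn in sorted(shared_issns):
        s, o = dois["scopus"][issn], dois["openalex"][issn]
        summary[issn] = {
            "both": len(s & o),
            "scopus_only": len(s - o),
            "openalex_only": len(o - s),
        }
    return summary


if __name__ == "__main__":
    # Toy records for illustration only.
    scopus = [("0000-0001", "10.1000/abc"), ("0000-0001", "10.1000/def"), ("0000-0001", None)]
    openalex = [("0000-0001", "https://doi.org/10.1000/ABC"), ("0000-0001", "10.1000/xyz")]
    print(coverage_by_journal(scopus, openalex, {"0000-0001"}))
    # → {'0000-0001': {'both': 1, 'scopus_only': 1, 'openalex_only': 1}}
```

A coverage share per journal can then be derived from these counts, for example `both / (both + scopus_only + openalex_only)`.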
We then investigate the discrepancies, examining factors such as missing identifiers, mismatched journal information (ISSNs), and additional publications retrieved through the OpenAlex API.
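To illustrate the ISSN check, the sketch below normalizes ISSNs, validates them with the standard ISSN check digit (weights 8 down to 2 on the first seven digits, modulo 11, with a value of 10 written as X), and flags DOI-matched publications whose journal ISSNs share no valid value across the two databases. The input shape and function names are hypothetical; the report does not specify how this comparison was implemented:

```python
import re
from typing import Optional


def normalize_issn(raw: Optional[str]) -> Optional[str]:
    """Return 'NNNN-NNNC' if the ISSN is well formed with a valid check digit, else None."""
    if not raw:
        return None
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    # Weighted sum of the first seven digits (weights 8, 7, ..., 2), modulo 11.
    total = sum(int(c) * w for c, w in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return f"{compact[:4]}-{compact[4:]}"


def flag_issn_mismatches(matched):
    """matched: iterable of (doi, scopus_issns, openalex_issns) for DOI-matched
    publications (hypothetical shape). Returns DOIs with no shared valid ISSN."""
    flagged = []
    for doi, scopus_raw, openalex_raw in matched:
        s = {i for i in map(normalize_issn, scopus_raw) if i}
        o = {i for i in map(normalize_issn, openalex_raw) if i}
        if not (s & o):
            flagged.append(doi)
    return flagged


if __name__ == "__main__":
    print(normalize_issn("03785955"))   # → 0378-5955
    print(normalize_issn("2434-561x"))  # → 2434-561X
    print(flag_issn_mismatches([
        ("10.1/a", ["0378-5955"], ["03785955"]),   # same ISSN, different formatting
        ("10.1/b", ["0378-5955"], ["1234-5678"]),  # no overlap: flagged
    ]))                                  # → ['10.1/b']
```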
Add here: What are the steps in producing Table AA
Journal Coverage
Institution Disambiguation
Results from Database Comparison
This section presents results after matching; the type of matching used (deterministic or fuzzy) varies.
Summary of Matching Methods
- Rule-based matching for exact matches
- Probabilistic matching for handling variations
- Machine learning methods for complex cases
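The contrast between the first two approaches can be sketched on publication titles. The rule-based check accepts a pair only if the titles are identical after normalization; the fuzzy check uses the similarity ratio from Python's standard-library `difflib.SequenceMatcher` with an assumed cutoff of 0.9. This ratio is a simple string-similarity stand-in, not a full probabilistic record-linkage model (such as Fellegi-Sunter), and the machine learning approach is not shown. Function names and example titles are hypothetical:

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(text: str) -> str:
    """Lower-case, drop accents, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def exact_match(a: str, b: str) -> bool:
    """Rule-based: match only if the normalized titles are identical."""
    return normalize_title(a) == normalize_title(b)


def fuzzy_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Similarity-based: match if the normalized titles are at least `threshold`
    similar (the 0.9 default is an assumption, not a tuned value)."""
    ratio = SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    a = "Farm Size and Productivity: Evidence from the Census of Agriculture"
    b = "Farm size & productivity: evidence from the US Census of Agriculture"
    print(exact_match(a, b), fuzzy_match(a, b))  # → False True
```

The threshold trades precision for recall: lowering it catches more formatting variants but also admits more false matches, so any cutoff would need checking against a hand-labeled sample.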
Method | Considerations | Example | Pros | Cons |
---|---|---|---|---|
Searching for dataset names within Scopus | | | | |
Searching for dataset names within OpenAlex | “Location” field set to “journal” | | | |
Disambiguation of authors | | | | |
Disambiguation of institutions | | | | |
Standardization of institutions | | | | |
Searching based on the frequency of dataset appearance in journals | | | | |
MORE . . . | | | | |
Filtering on keywords to determine themes | | | | |
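The OpenAlex row of the table could be approximated with a query against the public OpenAlex works endpoint. The sketch below only builds the request URL and makes no network call. It assumes that the “Location” field set to “journal” corresponds to the `primary_location.source.type:journal` filter and that a quoted phrase in `search` requests a phrase match; both assumptions should be checked against the OpenAlex API documentation, and the function name is hypothetical:

```python
from typing import Optional
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_openalex_query(dataset_name: str, per_page: int = 50,
                         mailto: Optional[str] = None) -> str:
    """Return a works-search URL for a dataset name, restricted to journal sources.

    Assumption: 'primary_location.source.type:journal' mirrors the
    "Location" = "journal" setting described in the table.
    """
    params = {
        "search": f'"{dataset_name}"',  # quoted for phrase search (assumed behavior)
        "filter": "primary_location.source.type:journal",
        "per-page": per_page,
    }
    if mailto:
        params["mailto"] = mailto  # optional contact address sent with requests
    # Keep ':' unescaped so the filter stays readable; everything else is URL-encoded.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    print(build_openalex_query("Census of Agriculture"))
    # → https://api.openalex.org/works?search=%22Census+of+Agriculture%22&filter=primary_location.source.type:journal&per-page=50
```

Retrieving the full result set would additionally require paging through results, and the returned records would still need the DOI and ISSN normalization used for the Scopus comparison.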
All appendices referenced throughout the report are located on this page.