Classifying Institutions with the IPEDS Database
Overview
The Integrated Postsecondary Education Data System (IPEDS) is a national dataset maintained by the National Center for Education Statistics (NCES). It collects data from all U.S. institutions participating in federal financial aid programs. In this project, IPEDS data is used to analyze institutional characteristics (e.g., size, classification, location, and financial indicators) in relation to research publications identified in Scopus, OpenAlex, and Dimensions.
The raw IPEDS data was obtained from the IPEDS website: IPEDS Data Center.
How These Files Fit into the Project Workflow
The IPEDS dataset provides institutional characteristics that are used in:
- Disambiguating institution names in citation databases (Scopus, OpenAlex, Dimensions)
- Comparing research output across different types of institutions
- Assessing dataset usage by institution type (e.g., HBCUs, land-grant universities, private vs. public institutions)
By integrating IPEDS institutional data with the publication datasets generated from each citation database, we can evaluate how research output varies across institution types and funding levels.
File Organization in GitHub Repository
Category | File Path & Link |
---|---|
Processed IPEDS Dataset | compare_scopus_openalex/resources/IPEDS/IPEDS.csv |
Raw IPEDS Data | compare_scopus_openalex/resources/raw_data_IPEDS/ |
Data Processing Code | compare_scopus_openalex/resources/documentation/IPEDSdata.rmd |
Data Documentation | compare_scopus_openalex/resources/documentation/IPEDS_Data.md |
Data Sources and Variables
The IPEDS dataset integrates multiple tables across different survey years (2017-2023) to capture important institutional characteristics.
Variable | IPEDS Table (Years) | Description |
---|---|---|
Institution Size | EFFY2017 - EFFY2023 | Total student enrollment by level (undergrad/graduate) |
Institution Location | HD2017 - HD2023 | Institution name, address, city, state, ZIP |
HBCU Status | HD2017 - HD2023 | Indicator of whether the institution is a Historically Black College or University (HBCU) |
Carnegie Classification | HD2017 - HD2023 | Carnegie classification code, alongside institution type (public/private via CONTROL, 2-year/4-year via ICLEVEL) |
Endowment Size | F1A2017 - F1A2023 | Value of endowment assets at the start and end of the fiscal year |
Library Budget | AL2017 - AL2023 | Total library expenditures (salaries, materials, operations) |
Table Descriptions
Institution Characteristics (HD files)
- Description: Provides details on institution location, classification, and designation (e.g., HBCU, land-grant).
- Contents:
  - UNITID: Unique institution identifier
  - INSTNM: Institution name
  - ADDR, CITY, STABBR, ZIP: Location information. Note: We restrict our sample to colleges and universities in the 50 U.S. states.
  - HBCU: 1 = HBCU, 0 = Non-HBCU
  - CONTROL: 1 = Public, 2 = Private Nonprofit, 3 = Private For-Profit
  - ICLEVEL: 1 = 4-year, 2 = 2-year, 3 = Less than 2-year. Note: We restrict our sample to ICLEVEL 1 and 2 (4-year and 2-year colleges and universities).
  - CARNEGIE: See table below.
  - LANDGRNT: 1 = Land Grant, 2 = Not Land Grant
  - year: Added variable for time tracking
Carnegie Code | Description |
---|---|
-3 | Not available |
-2 | Not applicable |
15 | Associate’s Colleges: Mixed Transfer/Career & High Traditional |
16 | Associate’s Colleges: High Transfer-High Traditional |
21 | Doctoral Universities: Very High Research Activity (R1) |
22 | Doctoral Universities: High Research Activity (R2) |
31 | Master’s Colleges & Universities: Larger programs |
32 | Master’s Colleges & Universities: Medium programs |
33 | Master’s Colleges & Universities: Smaller programs |
40 | Baccalaureate Colleges: Arts & Sciences Focus |
51 | Special Focus Institutions: Health Professions |
52 | Special Focus Institutions: Engineering |
53 | Special Focus Institutions: Other Technology-Related |
54 | Special Focus Institutions: Business & Management |
55 | Special Focus Institutions: Arts, Music & Design |
56 | Special Focus Institutions: Law Schools |
57 | Special Focus Institutions: Other Fields |
58 | Tribal Colleges |
59 | Not classified |
60 | Baccalaureate/Associate’s Colleges |
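The two sample restrictions in the HD section (institutions in the 50 states, ICLEVEL 1 or 2) can be expressed as a simple row filter. The Python sketch below illustrates that filter and is not the project's R code in IPEDSdata.rmd. It makes the following assumptions:

- HD rows are read as dictionaries of strings through the standard csv module.
- "50 US states" is read literally, so DC and territories such as Puerto Rico are dropped.

The helper filter_hd_rows and the sample records are hypothetical.

```python
import csv
import io

# Postal abbreviations for the 50 U.S. states. Assumption: "50 US states" is
# read literally, so DC, Puerto Rico, and other territories are excluded.
FIFTY_STATES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}

# ICLEVEL codes kept in the sample: 1 = 4-year, 2 = 2-year (3 = less than 2-year is dropped).
KEPT_ICLEVELS = {"1", "2"}


def filter_hd_rows(rows):
    """Hypothetical helper: keep HD rows for 4-year/2-year institutions in the 50 states."""
    return [
        row for row in rows
        if row["STABBR"].strip() in FIFTY_STATES
        and row["ICLEVEL"].strip() in KEPT_ICLEVELS
    ]


# Made-up records standing in for an HD file; these are not real IPEDS rows.
sample = io.StringIO(
    "UNITID,INSTNM,STABBR,ICLEVEL\n"
    "100001,Example State University,AL,1\n"
    "100002,Example Island College,PR,1\n"
    "100003,Example Beauty Academy,TX,3\n"
    "100004,Example Community College,OR,2\n"
)
kept = filter_hd_rows(csv.DictReader(sample))
print([row["UNITID"] for row in kept])  # ['100001', '100004']
```

The filter drops the Puerto Rico row (not a state) and the ICLEVEL 3 row (less than 2-year).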
Institution Size (EFFY files)
- Description: Contains 12-month unduplicated headcount enrollment by race/ethnicity, gender, and student level.
- Contents:
  - UNITID: Unique institution identifier
  - EFFYLEV: Level of study (1 = All students, 2 = Undergraduate, 4 = Graduate)
  - EFYTOTLT: Total students enrolled during the 12-month period
  - year: Manually added variable to track enrollment over time
- Processing Notes:
  - Only rows with EFFYLEV = 1 (all students) are retained.
  - The year variable was added manually because it is not present in the original files.
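The two EFFY processing steps can be sketched in Python:

- keep only EFFYLEV = 1
- add a year column

The sketch is illustrative only. It assumes the year is recovered from the table or file name (e.g. EFFY2017), which is one plausible way to add it manually. The function names and sample rows are hypothetical.

```python
import csv
import io
import re


def year_from_table_name(name):
    """Pull the four-digit survey year out of a table or file name like 'EFFY2017'."""
    match = re.search(r"\d{4}", name)
    if match is None:
        raise ValueError(f"no year found in {name!r}")
    return int(match.group())


def process_effy(rows, table_name):
    """Keep only all-student rows (EFFYLEV = 1) and tag each one with its year."""
    year = year_from_table_name(table_name)
    return [
        {"UNITID": row["UNITID"], "EFYTOTLT": int(row["EFYTOTLT"]), "year": year}
        for row in rows
        if row["EFFYLEV"].strip() == "1"
    ]


# Made-up EFFY-style records: one row per institution per student level.
sample = io.StringIO(
    "UNITID,EFFYLEV,EFYTOTLT\n"
    "100001,1,5000\n"
    "100001,2,4000\n"
    "100001,4,1000\n"
    "100004,1,800\n"
)
effy_2017 = process_effy(csv.DictReader(sample), "EFFY2017")
print(effy_2017)
```

Running the same function over each yearly table and concatenating the results yields a long table keyed by UNITID and year.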
Endowment Data (F1A files)
- Description: Reports endowment assets of public institutions at the beginning and end of the fiscal year.
- Contents:
  - UNITID: Unique institution identifier
  - F1H01: Endowment assets at start of fiscal year
  - F1H02: Endowment assets at end of fiscal year
  - year: Fiscal year
Library Expenditures (AL files)
- Description: Tracks total library budgets, including salaries, benefits, and operations.
- Contents:
  - UNITID: Unique institution identifier
  - LEXPTOT: Total library expenditures
  - year: Reporting year
Summary of the IPEDS Data
Year | No. Institutions |
---|---|
2017 | 7153 |
2018 | 6857 |
2019 | 6559 |
2020 | 6440 |
2021 | 6289 |
2022 | 6256 |
2023 | 6163 |
Year | Private for-profit | Private not-for-profit | Public |
---|---|---|---|
2017 | 3093 | 1959 | 2069 |
2018 | 2793 | 1930 | 2077 |
2019 | 2566 | 1905 | 2056 |
2020 | 2463 | 1889 | 2036 |
2021 | 2411 | 1868 | 1994 |
2022 | 2352 | 1855 | 2019 |
2023 | 2299 | 1836 | 1999 |
“Control” is defined as Public, Private Nonprofit, and Private For-Profit.
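Counts like those in the summary tables come from tallying distinct institutions per year, and per year and control. Below is a minimal Python sketch. It assumes merged records are dictionaries carrying UNITID, year, and an integer CONTROL code as defined in the HD section. The function names and sample records are hypothetical and do not reproduce the reported figures.

```python
from collections import Counter

# CONTROL codes as defined in the HD section.
CONTROL_LABELS = {1: "Public", 2: "Private not-for-profit", 3: "Private for-profit"}


def count_by_year(records):
    """Number of distinct institutions (UNITID) reported in each year."""
    distinct = {(r["UNITID"], r["year"]) for r in records}
    return Counter(year for _, year in distinct)


def count_by_year_and_control(records):
    """Number of distinct institutions per (year, control label)."""
    # One CONTROL value per institution-year; duplicate rows collapse here.
    control_of = {(r["UNITID"], r["year"]): r["CONTROL"] for r in records}
    return Counter((year, CONTROL_LABELS[c]) for (_, year), c in control_of.items())


# Made-up records, including one duplicated row, to show de-duplication.
records = [
    {"UNITID": "100001", "year": 2017, "CONTROL": 1},
    {"UNITID": "100002", "year": 2017, "CONTROL": 2},
    {"UNITID": "100002", "year": 2017, "CONTROL": 2},
    {"UNITID": "100003", "year": 2017, "CONTROL": 3},
    {"UNITID": "100001", "year": 2018, "CONTROL": 1},
]
print(count_by_year(records))  # Counter({2017: 3, 2018: 1})
print(count_by_year_and_control(records))
```

Counting distinct (UNITID, year) pairs rather than rows keeps accidental duplicate rows from inflating the totals.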
Year | 2-year | 4-year |
---|---|---|
2017 | 1003 | 817 |
2018 | 989 | 840 |
2019 | 968 | 852 |
2020 | 949 | 852 |
2021 | 930 | 829 |
2022 | 924 | 859 |
2023 | 899 | 868 |
Year | 2-year | 4-year |
---|---|---|
2017 | 1034 | 2371 |
2018 | 917 | 2167 |
2019 | 774 | 2081 |
2020 | 736 | 2046 |
2021 | 709 | 2016 |
2022 | 681 | 2009 |
2023 | 664 | 1996 |
Data Processing and Standardization
- Variable Name Changes and Formatting:
  - UNITID and year serve as the primary keys for merging datasets.
  - year was added to datasets that lacked it.
- Handling Missing Data and Filters:
  - Non-relevant columns were removed.
  - Datasets were filtered to retain only institutions with complete enrollment and classification data.
- Merging Strategy:
  - Datasets can be joined on UNITID and year, which together uniquely identify each institution-year record.
  - Institutions missing UNITID were excluded.
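The merging strategy amounts to a join on the composite key (UNITID, year). The Python sketch below is illustrative only and makes three choices of its own:

- Each table is a list of dictionaries that already carries a year field.
- It uses an inner join. A left join would instead keep HD rows that lack, say, finance data.
- It raises an error on duplicate keys, so a key that is not actually unique is caught early.

The function names and sample rows are hypothetical.

```python
def index_by_key(rows, table_name):
    """Index rows by (UNITID, year), skipping rows whose UNITID is missing."""
    index = {}
    for row in rows:
        unitid = row.get("UNITID")
        if unitid in (None, ""):
            continue  # institutions missing UNITID are excluded
        key = (unitid, row["year"])
        if key in index:
            raise ValueError(f"duplicate key {key} in {table_name}")
        index[key] = row
    return index


def inner_join(left, right, left_name="left", right_name="right"):
    """Join two lists of dict rows on (UNITID, year), keeping keys found in both."""
    left_idx = index_by_key(left, left_name)
    right_idx = index_by_key(right, right_name)
    shared = sorted(left_idx.keys() & right_idx.keys())
    # Right-hand fields are layered over left-hand ones; the key fields agree anyway.
    return [{**left_idx[k], **right_idx[k]} for k in shared]


# Made-up rows: one HD record lacks a UNITID, and EFFY has an extra year.
hd_rows = [
    {"UNITID": "100001", "year": 2017, "INSTNM": "Example State University"},
    {"UNITID": "100004", "year": 2017, "INSTNM": "Example Community College"},
    {"UNITID": "", "year": 2017, "INSTNM": "Record without an ID"},
]
effy_rows = [
    {"UNITID": "100001", "year": 2017, "EFYTOTLT": 5000},
    {"UNITID": "100001", "year": 2018, "EFYTOTLT": 5100},
]
merged = inner_join(hd_rows, effy_rows, "HD", "EFFY")
print(merged)
```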
How to Merge with MSI Data
The IPEDS dataset can be linked with the MSI dataset using the UNITID and year variables. This allows for:
- Identifying MSI institutions within IPEDS to analyze institutional characteristics.
- Comparing institutional characteristics of MSI and non-MSI institutions, such as enrollment size, Carnegie classification, and financial indicators.
Once the IPEDS-MSI data is merged with the cleaned institutional data from the citation databases, the combined dataset also supports assessing research output and dataset usage by institution type and examining trends over time in MSI status and institutional characteristics.
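The sketch below shows the MSI linkage and one comparison it enables: mean 12-month enrollment for MSI and non-MSI institutions. The MSI dataset's schema is not described here, so several parts are assumptions:

- The MSI column (1 = MSI, 0 = not) is a hypothetical stand-in.
- IPEDS rows with no matching MSI record are treated as non-MSI.
- The function names and sample rows are hypothetical.

```python
from statistics import mean


def attach_msi_flag(ipeds_rows, msi_rows):
    """Add an MSI indicator to each IPEDS row by matching on (UNITID, year).

    Assumption: IPEDS rows without a matching MSI record are treated as non-MSI (0).
    """
    msi_keys = {(r["UNITID"], r["year"]) for r in msi_rows if r["MSI"] == 1}
    return [
        {**row, "MSI": 1 if (row["UNITID"], row["year"]) in msi_keys else 0}
        for row in ipeds_rows
    ]


def mean_enrollment_by_msi(rows):
    """Mean 12-month enrollment (EFYTOTLT) for non-MSI (0) and MSI (1) rows."""
    groups = {0: [], 1: []}
    for row in rows:
        groups[row["MSI"]].append(row["EFYTOTLT"])
    return {flag: mean(values) for flag, values in groups.items() if values}


# Made-up rows; the MSI column is a hypothetical stand-in for the real MSI data.
ipeds_rows = [
    {"UNITID": "100001", "year": 2017, "EFYTOTLT": 5000},
    {"UNITID": "100004", "year": 2017, "EFYTOTLT": 800},
    {"UNITID": "100005", "year": 2017, "EFYTOTLT": 1200},
]
msi_rows = [
    {"UNITID": "100004", "year": 2017, "MSI": 1},
    {"UNITID": "100005", "year": 2017, "MSI": 0},
]
flagged = attach_msi_flag(ipeds_rows, msi_rows)
print(mean_enrollment_by_msi(flagged))
```

Whether unmatched institutions should count as non-MSI or be dropped depends on how the MSI dataset was built. Check this before drawing comparisons.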