Overview
The Minority-Serving Institutions (MSI) Data captures institutions eligible for federal MSI programs, as determined by the U.S. Department of Education and other sources. In this project, MSI data is used to analyze institutional characteristics in relation to research publications identified in Scopus, OpenAlex, and Dimensions.
This dataset provides information on institutions eligible or potentially eligible for at least one MSI designation and includes data from 2017 to 2023. The MSI dataset integrates information from multiple sources and is merged with IPEDS data to track research output across different institution types.
How These Files Fit into the Project Workflow
The MSI dataset are used to:
- Identify MSI-designated institutions within Scopus, OpenAlex, and Dimensions.
- Compare research output across MSI and non-MSI institutions.
- Analyze dataset usage at institutions serving underrepresented student populations.
By integrating MSI eligibility data with publication datasets, we can assess whether MSI institutions are underrepresented in academic research databases.
File Organization in GitHub Repository
Data Sources
The MSI dataset combines data from multiple sources:
2017-2021 |
MSI Data Project (Nguyen et al., 2023) |
Link |
2022 |
Rutgers CMSI |
PDF |
2023 |
Rutgers CMSI |
PDF |
File Descriptions
2017-2023 MSI Data (msi.csv
)
- Description: Contains MSI eligibility and designation data for institutions from 2017 to 2021.
- Contents:
UNITID
: Unique identification number of the institution
opeid
: Office of Postsecondary Education (OPE) ID Number. Only available for 2017-2021 in the MSI data.
instname
: Name of the institution
year
: The year in which the endowment size was recorded
level
: 2-year/4-year
control
: public, private
nonprofit
: 1 = private not-for-profit, 0 otherwise
forprofit
: 1 = private for-profit, 0 otherwise
msi_type_hbcu
: 1=yes, 0=no
msi_type_aanapisi
: 1=yes, 0=no
msi_type_annhsi
: 1=yes, 0=no
msi_type_hsi
: 1=yes, 0=no
msi_type_nasnti
: 1=yes, 0=no
msi_type_tccu
: 1=yes, 0=no
msi_type_pbi
: 1=yes, 0=no
sum
: sum of previous msi type columns
fund
: funded under at least one program (0 or 1)
- Processing Notes:
- Python was used to convert PDFs to Excel (
MSI_after2020.py
).
- Data was cleaned and standardized to match the 2017-2021 format.
- The data from 2017-2021 identifies 11 distinct MSI designations:
- Alaska Native and Native Hawaiian-Serving Institutions (ANNHSI)
- Asian American and Native American Pacific Islander-Serving Institutions (AANAPISI)
- Hispanic-Serving Institutions (HSI)
- Hispanic-Serving Institutions STEM (HSI-STEM)
- Hispanic-Serving Institutions Promoting Postbaccalaureate Opportunities for Hispanic Americans (HSI-PPOHA)
- Historically Black Colleges and Universities (HBCU)
- Historically Black Colleges and Universities Graduate Institutions (HBGI)
- Historically Black Colleges and Universities Masters Institutions (HBCU-Masters)
- Native American-Serving Nontribal Institutions (NASNTI)
- Predominantly Black Institutions (PBI)
- Tribally Controlled Colleges and Universities (TCCU)
Summary of the MSI data
2017 |
5242 |
8.13 |
91.87 |
2018 |
5242 |
7.69 |
92.31 |
2019 |
5242 |
7.84 |
92.16 |
2020 |
5242 |
7.80 |
92.20 |
2021 |
5242 |
8.81 |
91.19 |
2022 |
5242 |
16.73 |
83.27 |
2023 |
5242 |
16.58 |
83.42 |
2017 |
1837 |
5.66 |
94.34 |
2018 |
1837 |
5.72 |
94.28 |
2019 |
1837 |
5.72 |
94.28 |
2020 |
1837 |
5.77 |
94.23 |
2021 |
1837 |
5.88 |
94.12 |
2022 |
1837 |
14.81 |
85.19 |
2023 |
1837 |
14.81 |
85.19 |
2017 |
1409 |
0.00 |
100.00 |
2018 |
1409 |
0.00 |
100.00 |
2019 |
1409 |
0.00 |
100.00 |
2020 |
1409 |
0.00 |
100.00 |
2021 |
1409 |
0.00 |
100.00 |
2022 |
1409 |
0.14 |
99.86 |
2023 |
1409 |
0.21 |
99.79 |
2017 |
26 |
0.00 |
100.00 |
2018 |
26 |
0.00 |
100.00 |
2019 |
26 |
0.00 |
100.00 |
2020 |
26 |
0.00 |
100.00 |
2021 |
26 |
0.00 |
100.00 |
2022 |
26 |
3.85 |
96.15 |
2023 |
26 |
3.85 |
96.15 |
2017 |
1678 |
19.18 |
80.81 |
2018 |
1678 |
17.76 |
82.24 |
2019 |
1678 |
18.24 |
81.76 |
2020 |
1678 |
18.06 |
81.94 |
2021 |
1678 |
21.10 |
78.90 |
2022 |
1678 |
33.86 |
66.15 |
2023 |
1678 |
35.34 |
64.66 |
2017 |
933 |
19.72 |
80.28 |
2018 |
933 |
16.93 |
83.07 |
2019 |
933 |
17.04 |
82.96 |
2020 |
933 |
17.15 |
82.85 |
2021 |
933 |
20.15 |
79.85 |
2022 |
933 |
36.66 |
63.34 |
2023 |
933 |
35.05 |
64.95 |
2017 |
807 |
17.10 |
82.90 |
2018 |
807 |
17.35 |
82.65 |
2019 |
807 |
18.22 |
81.78 |
2020 |
807 |
17.82 |
82.18 |
2021 |
807 |
20.57 |
79.43 |
2022 |
807 |
32.09 |
67.91 |
2023 |
807 |
32.96 |
67.04 |
2017 |
139 |
2.88 |
97.12 |
2018 |
139 |
2.88 |
97.12 |
2019 |
139 |
2.16 |
97.84 |
2020 |
139 |
2.16 |
97.84 |
2021 |
139 |
2.88 |
97.12 |
2022 |
139 |
12.23 |
87.77 |
2023 |
139 |
12.23 |
87.77 |
2017 |
1714 |
5.83 |
94.17 |
2018 |
1714 |
5.84 |
94.16 |
2019 |
1714 |
5.95 |
94.05 |
2020 |
1714 |
6.01 |
93.99 |
2021 |
1714 |
6.07 |
93.93 |
2022 |
1714 |
14.88 |
85.12 |
2023 |
1714 |
14.88 |
85.12 |
Data Processing and Standardization
- Variable Name Changes and Formatting:
year
was added to track MSI status over time.
UNITID
is the primary key to merge MSI and IPEDS datasets.
- Handling Missing Data and Filters:
- Institutions without
UNITID
were excluded.
- Non-relevant columns were removed.
- Merging Strategy:
- The 2017-2021 MSI data was directly compatible with IPEDS.
- The 2022-2023 MSI data required additional formatting before merging.
How to Merge with IPEDS Data
The MSI dataset can be linked with IPEDS data using the UNITID
and year
variables. This allows for:
- Tracking institutional MSI status over time.
- Comparing MSI and non-MSI institutions within Scopus, OpenAlex, and Dimensions.
- Assessing dataset usage patterns across different institution types.