Identifying Minority Serving Institutions (MSIs)

Published

March 30, 2025

Overview

The Minority-Serving Institutions (MSI) Data captures institutions eligible for federal MSI programs, as determined by the U.S. Department of Education and other sources. In this project, MSI data is used to analyze institutional characteristics in relation to research publications identified in Scopus, OpenAlex, and Dimensions.

This dataset provides information on institutions eligible or potentially eligible for at least one MSI designation and includes data from 2017 to 2023. The MSI dataset integrates information from multiple sources and is merged with IPEDS data to track research output across different institution types.

How These Files Fit into the Project Workflow

The MSI dataset is used to:

  1. Identify MSI-designated institutions within Scopus, OpenAlex, and Dimensions.
  2. Compare research output across MSI and non-MSI institutions.
  3. Analyze dataset usage at institutions serving underrepresented student populations.

By integrating MSI eligibility data with publication datasets, we can assess whether MSIs are underrepresented in academic research databases.

File Organization in GitHub Repository

Category                          File Path & Link
Processed MSI Dataset             compare_scopus_openalex/resources/ipeds/msi.csv
Raw MSI Data                      compare_scopus_openalex/resources/ipeds
Data Processing Code              compare_scopus_openalex/resources/documentation/IPEDSdata.rmd
Data Documentation                compare_scopus_openalex/resources/documentation/IPEDS_Data.md
Python Script for PDF Processing  compare_scopus_openalex/resources/documentation/MSI_after2020.py

Data Sources

The MSI dataset combines data from multiple sources:

Years      Source                                      Link
2017-2021  MSI Data Project (Nguyen et al., 2023) [1]  Link
2022       Rutgers CMSI [2]                            PDF
2023       Rutgers CMSI [3]                            PDF

File Descriptions

2017-2023 MSI Data (msi.csv)

  • Description: Contains MSI eligibility and designation data for institutions from 2017 to 2023.
  • Contents:
    • UNITID: Unique identification number of the institution
    • opeid: Office of Postsecondary Education (OPE) ID Number. Only available for 2017-2021 in the MSI data.
    • instname: Name of the institution
    • year: The year in which the MSI eligibility or designation was recorded
    • level: 2-year/4-year
    • control: public, private
    • nonprofit: 1 = private not-for-profit, 0 otherwise
    • forprofit: 1 = private for-profit, 0 otherwise
    • msi_type_hbcu: 1=yes, 0=no
    • msi_type_aanapisi: 1=yes, 0=no
    • msi_type_annhsi: 1=yes, 0=no
    • msi_type_hsi: 1=yes, 0=no
    • msi_type_nasnti: 1=yes, 0=no
    • msi_type_tccu: 1=yes, 0=no
    • msi_type_pbi: 1=yes, 0=no
    • sum: Sum of the seven msi_type_* columns above
    • fund: funded under at least one program (0 or 1)
  • Processing Notes:
    • For the 2022 and 2023 lists, Python was used to convert the source PDFs to Excel (MSI_after2020.py).
    • The converted data was then cleaned and standardized to match the 2017-2021 format.
    • The data from 2017-2021 identifies 11 distinct MSI designations:
      • Alaska Native and Native Hawaiian-Serving Institutions (ANNHSI)
      • Asian American and Native American Pacific Islander-Serving Institutions (AANAPISI)
      • Hispanic-Serving Institutions (HSI)
      • Hispanic-Serving Institutions STEM (HSI-STEM)
      • Hispanic-Serving Institutions Promoting Postbaccalaureate Opportunities for Hispanic Americans (HSI-PPOHA)
      • Historically Black Colleges and Universities (HBCU)
      • Historically Black Colleges and Universities Graduate Institutions (HBGI)
      • Historically Black Colleges and Universities Masters Institutions (HBCU-Masters)
      • Native American-Serving Nontribal Institutions (NASNTI)
      • Predominantly Black Institutions (PBI)
      • Tribally Controlled Colleges and Universities (TCCU)
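
The relationship between the seven msi_type_* flags and the sum column described above can be illustrated with a minimal sketch. It assumes each record is a dictionary with the column names listed for msi.csv; the helper names and the example record are hypothetical and not taken from the project code.

```python
# Column names follow the msi.csv description; everything else here is illustrative.
MSI_TYPE_COLUMNS = [
    "msi_type_hbcu",
    "msi_type_aanapisi",
    "msi_type_annhsi",
    "msi_type_hsi",
    "msi_type_nasnti",
    "msi_type_tccu",
    "msi_type_pbi",
]


def msi_type_sum(record):
    """Count how many MSI type flags are set to 1 in one record."""
    return sum(int(record[col]) for col in MSI_TYPE_COLUMNS)


def msi_types(record):
    """Return the short names of the MSI types flagged in one record."""
    prefix_length = len("msi_type_")
    return [
        col[prefix_length:].upper()
        for col in MSI_TYPE_COLUMNS
        if int(record[col]) == 1
    ]


# Invented record (not a real institution), with values as strings as read from a CSV.
example_record = {
    "UNITID": "999999",
    "msi_type_hbcu": "0",
    "msi_type_aanapisi": "1",
    "msi_type_annhsi": "0",
    "msi_type_hsi": "1",
    "msi_type_nasnti": "0",
    "msi_type_tccu": "0",
    "msi_type_pbi": "0",
}

example_sum = msi_type_sum(example_record)
example_types = msi_types(example_record)
```

A check of this kind can also be used to confirm that the stored sum value agrees with the individual flags after the 2022-2023 data has been standardized.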

Summary of the MSI data

Year  Total  % MSI  % Not MSI
2017  5242    8.13      91.87
2018  5242    7.69      92.31
2019  5242    7.84      92.16
2020  5242    7.80      92.20
2021  5242    8.81      91.19
2022  5242   16.73      83.27
2023  5242   16.58      83.42

Year  Total  % MSI  % Not MSI
2017  1837    5.66      94.34
2018  1837    5.72      94.28
2019  1837    5.72      94.28
2020  1837    5.77      94.23
2021  1837    5.88      94.12
2022  1837   14.81      85.19
2023  1837   14.81      85.19

Year  Total  % MSI  % Not MSI
2017  1409    0.00     100.00
2018  1409    0.00     100.00
2019  1409    0.00     100.00
2020  1409    0.00     100.00
2021  1409    0.00     100.00
2022  1409    0.14      99.86
2023  1409    0.21      99.79

Year  Total  % MSI  % Not MSI
2017    26    0.00     100.00
2018    26    0.00     100.00
2019    26    0.00     100.00
2020    26    0.00     100.00
2021    26    0.00     100.00
2022    26    3.85      96.15
2023    26    3.85      96.15

Year  Total  % MSI  % Not MSI
2017  1678   19.18      80.81
2018  1678   17.76      82.24
2019  1678   18.24      81.76
2020  1678   18.06      81.94
2021  1678   21.10      78.90
2022  1678   33.86      66.15
2023  1678   35.34      64.66

Year  Total  % MSI  % Not MSI
2017   933   19.72      80.28
2018   933   16.93      83.07
2019   933   17.04      82.96
2020   933   17.15      82.85
2021   933   20.15      79.85
2022   933   36.66      63.34
2023   933   35.05      64.95

Year  Total  % MSI  % Not MSI
2017   807   17.10      82.90
2018   807   17.35      82.65
2019   807   18.22      81.78
2020   807   17.82      82.18
2021   807   20.57      79.43
2022   807   32.09      67.91
2023   807   32.96      67.04

Year  Total  % MSI  % Not MSI
2017   139    2.88      97.12
2018   139    2.88      97.12
2019   139    2.16      97.84
2020   139    2.16      97.84
2021   139    2.88      97.12
2022   139   12.23      87.77
2023   139   12.23      87.77

Year  Total  % MSI  % Not MSI
2017  1714    5.83      94.17
2018  1714    5.84      94.16
2019  1714    5.95      94.05
2020  1714    6.01      93.99
2021  1714    6.07      93.93
2022  1714   14.88      85.12
2023  1714   14.88      85.12
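
Yearly percentages of this kind can be computed from msi.csv-style records with a short sketch like the one below. It assumes that an institution counts as an MSI in a given year when its sum column is greater than 0; that rule, the function name, and the sample records are assumptions for illustration and may differ from how the project's tables were actually produced.

```python
from collections import defaultdict


def msi_share_by_year(records):
    """Return {year: (total, pct_msi, pct_not_msi)} for a list of record dicts.

    Assumption: a record is treated as an MSI when its "sum" column is > 0.
    """
    totals = defaultdict(int)
    msi_counts = defaultdict(int)
    for rec in records:
        year = int(rec["year"])
        totals[year] += 1
        if int(rec["sum"]) > 0:
            msi_counts[year] += 1

    summary = {}
    for year in sorted(totals):
        pct_msi = round(100 * msi_counts[year] / totals[year], 2)
        summary[year] = (totals[year], pct_msi, round(100 - pct_msi, 2))
    return summary


# Invented records (not real institutions): four per year.
sample_records = [
    {"UNITID": "1", "year": "2017", "sum": "1"},
    {"UNITID": "2", "year": "2017", "sum": "0"},
    {"UNITID": "3", "year": "2017", "sum": "0"},
    {"UNITID": "4", "year": "2017", "sum": "0"},
    {"UNITID": "1", "year": "2022", "sum": "2"},
    {"UNITID": "2", "year": "2022", "sum": "1"},
    {"UNITID": "3", "year": "2022", "sum": "0"},
    {"UNITID": "4", "year": "2022", "sum": "0"},
]

share_summary = msi_share_by_year(sample_records)
```

Applying the same function to a filtered subset of records (for example, one institution type at a time) would give one table per subset.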

Data Processing and Standardization

  • Variable Name Changes and Formatting:
    • year was added to track MSI status over time.
    • UNITID is the primary key to merge MSI and IPEDS datasets.
  • Handling Missing Data and Filters:
    • Institutions without UNITID were excluded.
    • Non-relevant columns were removed.
  • Merging Strategy:
    • The 2017-2021 MSI data was directly compatible with IPEDS.
    • The 2022-2023 MSI data required additional formatting before merging.
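
The standardization steps listed above (adding year, excluding institutions without UNITID, and dropping non-relevant columns) can be sketched as follows. This is an illustration under stated assumptions, not the project's processing code: the function name, the "page" column, and the sample rows are hypothetical.

```python
def standardize_records(raw_records, year, keep_columns):
    """Clean one year's worth of raw records.

    - skips records with a missing or blank UNITID
    - keeps only the columns named in keep_columns
    - adds a "year" column so status can be tracked over time
    """
    cleaned = []
    for rec in raw_records:
        unitid = str(rec.get("UNITID") or "").strip()
        if not unitid:
            continue  # institutions without UNITID are excluded
        row = {col: rec.get(col, "") for col in keep_columns}
        row["UNITID"] = unitid
        row["year"] = year
        cleaned.append(row)
    return cleaned


# Invented rows standing in for a converted 2022 list; "page" is a made-up leftover column.
raw_2022 = [
    {"UNITID": "100001", "instname": "Example College A", "page": "3", "msi_type_hsi": "1"},
    {"UNITID": "", "instname": "Example College B", "page": "3", "msi_type_hsi": "1"},
]

clean_2022 = standardize_records(
    raw_2022, 2022, ["UNITID", "instname", "msi_type_hsi"]
)
```

Once each year is in the same shape, the per-year lists can be concatenated into a single table covering 2017 to 2023.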

How to Merge with IPEDS Data

The MSI dataset can be linked with IPEDS data using the UNITID and year variables. This allows for:

  • Tracking institutional MSI status over time.
  • Comparing MSI and non-MSI institutions within Scopus, OpenAlex, and Dimensions.
  • Assessing dataset usage patterns across different institution types.
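
A merge on UNITID and year can be sketched with the Python standard library as shown below. The function name, the msi_matched and is_msi output fields, the rule that sum > 0 marks an MSI, and the sample rows are all assumptions made for illustration; the project's own merge is implemented in its data processing code.

```python
def merge_msi_with_ipeds(ipeds_rows, msi_rows):
    """Left-join MSI records onto IPEDS records using (UNITID, year) as the key."""
    msi_index = {
        (str(row["UNITID"]), int(row["year"])): row for row in msi_rows
    }
    merged = []
    for row in ipeds_rows:
        key = (str(row["UNITID"]), int(row["year"]))
        match = msi_index.get(key)
        out = dict(row)
        # Record whether an MSI row was found, so unmatched rows stay distinguishable.
        out["msi_matched"] = match is not None
        # Assumption: an institution is an MSI in that year when sum > 0.
        out["is_msi"] = match is not None and int(match["sum"]) > 0
        merged.append(out)
    return merged


# Invented rows (not real institutions); "control" mirrors the column described for msi.csv.
ipeds_sample = [
    {"UNITID": "100001", "year": "2021", "control": "public"},
    {"UNITID": "100001", "year": "2022", "control": "public"},
    {"UNITID": "100002", "year": "2022", "control": "private"},
]
msi_sample = [
    {"UNITID": "100001", "year": "2021", "sum": "0"},
    {"UNITID": "100001", "year": "2022", "sum": "1"},
]

merged_sample = merge_msi_with_ipeds(ipeds_sample, msi_sample)
```

Keying on both UNITID and year, rather than UNITID alone, is what lets the same institution appear as non-MSI in one year and MSI in another.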

Footnotes

  1. The 2017-2021 MSI data was sourced from the Minority Serving Institutions (MSI) Data Project by Nguyen et al. (2023), which merges the U.S. Department of Education’s MSI eligibility metrics (2017-2021) with IPEDS data.

  2. The 2022-2023 MSI data was obtained from the Rutgers Graduate School of Education, which maintains annual MSI eligibility lists.

  3. The 2022-2023 MSI data was obtained from the Rutgers Graduate School of Education, which maintains annual MSI eligibility lists.