Appendix G
Analysis of the NIH Autoimmune Disease Research Grant Portfolio: Methodology
BACKGROUND
Congress tasked the Committee for the Assessment of NIH Research on Autoimmune Disease with assessing NIH research activities on autoimmune disease. The committee responded to this task, in part, by describing NIH investments or spending on autoimmune disease research grant activities. The committee used publicly available NIH data sources, primarily the Research, Condition, and Disease Categorization (RCDC) system and the RePORTER database, to conduct its analysis. These sources are described below, along with the methods used to describe different aspects of the research portfolio (spending on autoimmune disease research by NIH and Institutes and Centers (ICs); spending on autoimmune disease research by research activity; spending on autoimmune disease research by disease; IC collaborations; clinical trials, publications, and patents associated with autoimmune disease research grants; and an analysis of study sections reviewing autoimmune disease grants).
RESEARCH, CONDITION, AND DISEASE CATEGORIZATION SYSTEM
The NIH Reform Act of 2006 required NIH to prioritize consistency and transparency by creating a tool to categorize the agency’s funded research. Implemented in 2008, the RCDC system “uses sophisticated text data mining (categorizing and clustering using words and multiword phrases) in conjunction with NIH-wide definitions used to match projects to categories.”1 RCDC compiles a list of all the funded grants and contracts that fit into the specific categories and includes details such as funding amounts and other grant characteristics.2 These categories are referred to as “spending categories.”
The RCDC categorization process enables NIH to apply the latest technology to consistently report on how America’s tax dollars are spent to support medical research.3 A new spending category may be added to the system based on scientific need and/or at the request of an external source such as patients, advocacy groups, or Congress. Spending categories can also be added at the request of the Director of NIH or an IC. As of June 2021, there were 299 research/disease areas in RCDC, including the Autoimmune Disease spending category of interest to the committee. Because RCDC was implemented in 2008, research has been categorized only from that year onward. At the time of the committee’s work, the most recent data available were for fiscal year (FY) 2020. Thus, the timeframe for the committee’s analysis is FY 2008 to FY 2020.
RePORTER
RePORTER is a publicly available electronic database that “allows users to search a repository of both intramural and extramural NIH-funded research projects from the past 25 years and access publications since 1980, and patents resulting from NIH funding.”4 RePORTER draws from a number of internal and external databases to compile information associated with research projects. These include the electronic research administration (eRA) databases, Medline, PubMed Central, the NIH Intramural Database, and the Interagency Edison (iEdison) database, which contains grantee reporting on inventions and patents.5 RePORTER is a dynamic database that is updated weekly. Updates include the addition of new projects as well as revisions to prior awards, which may prevent an exact replication of the data used in this report.
RCDC and RePORTER serve different purposes, so they do not share all of the same information or variables. Further, similar information or variables in RCDC and RePORTER may differ because they are published on different timelines and have varying budget nuances; these data systems should therefore not be directly compared. Acknowledging these nuances, the committee drew upon information from both data sources in order to adequately analyze autoimmune disease research funding.
___________________
1https://report.nih.gov/funding/categorical-spending#/, https://report.nih.gov/funding/categorical-spending/rcdc-process (accessed December 29, 2021).
2https://report.nih.gov/funding/categorical-spending/rcdc-process (accessed December 29, 2021).
3https://report.nih.gov/funding/categorical-spending/rcdc (accessed January 4, 2022).
4https://report.nih.gov/faqs (accessed December 29, 2021).
5https://report.nih.gov/faqs (accessed December 29, 2021).
Autoimmune Disease Spending Methodology
The committee used the NIH RePORTER database to evaluate funding and other aspects of the NIH autoimmune disease research portfolio. The RePORTER database allows users to query or search by the spending categories specified by RCDC (presented as “NIH Spending Category” in the RePORTER advanced projects search). The committee focused its analysis on the autoimmune spending category.
The committee conducted the following search in RePORTER:
Advanced Projects Search:
Fiscal Year:
- 2008, 2009, 2010, 2011, 2012, 2013
- 2014, 2015, 2016, 2017, 2018, 2019, 2020
Project Details:
- NIH Spending Category: Autoimmune Disease
- Agency/Institute/Center: NIH
Once the search results were generated, they were exported into Excel along with optional variables (project description, project details, personnel, funded organization, project funding). A maximum of 15,000 projects can be downloaded into Excel at a time; thus, four combinations of the above search criteria were exported. Two exported datasets (FY 2008–2013 and FY 2014–2020) did not include information on the ICs that funded the grants; these datasets were combined and used to conduct NIH-level analyses. The two datasets (FY 2008–2013 and FY 2014–2020) containing information on funding ICs were also combined and used to conduct IC-level analyses. When a research project application is funded by more than one IC, it is listed once for each IC. For example, if a grant application was co-funded by two ICs, it appears in the dataset twice.
The two combined datasets described above were used as the source for the majority of the committee’s analyses and are referred to as the Committee RePORTER Datasets (CRD).
The following figures and tables that describe spending on autoimmune disease research were prepared using the datasets.
This methodology is associated with the analyses in Figures 6-1 and 6-2 and Tables 6-1 and 6-2.
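The combination of partial exports described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names and project numbers are hypothetical placeholders, not RePORTER's export headers or real awards, and the committee's work was carried out in Excel rather than in code.

```python
from collections import Counter


def combine_exports(*exports):
    """Concatenate several exported row lists into one dataset."""
    combined = []
    for rows in exports:
        combined.extend(rows)
    return combined


# Hypothetical rows standing in for two IC-level exports.
fy2008_2013 = [
    {"project_number": "R01AI000001", "fiscal_year": 2012, "funding_ic": "NIAID"},
    {"project_number": "R01AI000001", "fiscal_year": 2012, "funding_ic": "NIAMS"},
]
fy2014_2020 = [
    {"project_number": "R01DK000002", "fiscal_year": 2015, "funding_ic": "NIDDK"},
]

ic_level = combine_exports(fy2008_2013, fy2014_2020)

# A co-funded application is listed once per funding IC, so counting rows
# per (project, year) shows which applications appear more than once.
rows_per_application = Counter(
    (row["project_number"], row["fiscal_year"]) for row in ic_level
)
```

Counting rows per application makes the duplication from co-funding visible, which matters whenever an IC-level dataset is used to count applications rather than to sum dollars by IC.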
Categorization of Grants by Autoimmune Diseases of Interest
For a number of analyses the committee was interested in reporting data by the autoimmune diseases they chose to focus on. To carry out those analyses, CRD data were summarized or rolled up into “grants.” An explanation of the method used to do this follows.
Exported RePORTER datasets contain a row of information for each funded grant application, including the total dollar amount awarded for each year. For example, a 5-year grant awarded between 2008 and 2020 will appear as five separate rows with the same grant number, though there may be variation among project descriptions, details, and funding in the dataset. When the CRD dataset was exported, there were 28,148 rows of funded grant applications. To review the data at the grant level (as opposed to the yearly grant application level), grant applications were aggregated by the project or grant number; this resulted in a total of 8,470 grants. Grants were then categorized by disease, using RCDC spending categories (based on the committee’s disease selection, RCDC categories are available for inflammatory bowel disease, multiple sclerosis, psoriasis, rheumatoid arthritis, and systemic lupus erythematosus), and using MeSH terms6 and natural language processing software for diseases without an RCDC category (antiphospholipid syndrome, autoimmune thyroid disease, celiac disease, primary biliary cholangitis, Sjögren’s, type 1 diabetes). Hashimoto’s disease and Graves’ disease were combined into a single “autoimmune thyroid disease” category. Grants can be counted in more than one disease category. Grants that were not associated with any specific disease categories were labeled as “other autoimmune disease.” This methodology applies to all tables and figures that provide analyses by the committee’s autoimmune diseases of interest.
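The roll-up from yearly rows to grants, and the rule that a grant may carry several disease labels or fall back to “other autoimmune disease,” can be illustrated with a minimal sketch. The field names and grant numbers are hypothetical, and the disease matching itself (RCDC categories, MeSH terms, natural language processing) is assumed to have happened upstream; only the aggregation and labeling logic is shown.

```python
from collections import defaultdict


def roll_up_to_grants(rows):
    """Group yearly application rows by grant number.

    Returns a dict mapping each grant number to the set of fiscal
    years in which it was funded.
    """
    years_by_grant = defaultdict(set)
    for row in rows:
        years_by_grant[row["grant_number"]].add(row["fiscal_year"])
    return dict(years_by_grant)


def disease_labels(matched_diseases):
    """Return every matched disease, or the fallback label if none matched."""
    if not matched_diseases:
        return ["other autoimmune disease"]
    return sorted(matched_diseases)


# Hypothetical data: one 5-year grant and one 1-year grant.
rows = [{"grant_number": "R01AI000001", "fiscal_year": fy} for fy in range(2010, 2015)]
rows.append({"grant_number": "R01DK000002", "fiscal_year": 2019})

grants = roll_up_to_grants(rows)
```

Because a grant can appear under more than one disease, totals summed across disease categories can exceed the number of distinct grants.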
Creation of New Dataset Variables
Two new variables were created to support specific committee analyses related to the type of research funded, and to provide a more complete estimate of the total cost of autoimmune disease research funding. These new variables are described below.
Autoimmune Disease Spending by Research Type
The committee sought to analyze NIH’s autoimmune disease research portfolio by type of research (investigator-initiated, solicited, and intramural research) and funding. This analysis provides insights into how NIH allocates its funds across these research activities and trends over time. To identify research type, Funding Opportunity Announcement (FOA)7 numbers were examined. Within each CRD, a new variable “Type of Research” was created.
___________________
6 MeSH terms are produced by the National Library of Medicine and are used for indexing, cataloging, and searching of biomedical and health-related information (https://www.nlm.nih.gov/mesh/meshhome.html, accessed December 31, 2021).
- If the FOA number was a Request for Application (RFA) or Request for Proposal (RFP), it was categorized as solicited research.
- If the FOA number was a Program Announcement (PA), it was categorized as investigator-initiated research.
- If the Activity Code8 started with Z, it was categorized as intramural research.
- If the FOA number was blank, it was categorized as Unknown.
Records whose type of research was Unknown were not included in the Type of Research analysis. The FOA field may be blank in some cases, such as non-grant records (contracts and intramural records) and older grant records from a time when it was not required to submit a grant in response to an FOA. Because intramural research does not have an FOA, the designated intramural activity code (Z) was used to determine the type of research.
The new variable, Type of Research, is used in the analyses associated with Figure 6-9 and Table 6-7.
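The Type of Research rules listed above can be expressed as a small classification function. This is a sketch, not the committee's implementation: the function name and return labels are illustrative, the intramural check is placed first because intramural records have no FOA, and treating FOA types other than RFA, RFP, and PA as Unknown is an assumption not stated in the text.

```python
def type_of_research(foa_number, activity_code):
    """Classify a record from its FOA number and activity code."""
    foa = (foa_number or "").strip().upper()
    code = (activity_code or "").strip().upper()

    # Intramural records carry no FOA, so the Z activity code decides first.
    if code.startswith("Z"):
        return "Intramural"
    # Requests for Applications or Proposals count as solicited research.
    if foa.startswith(("RFA", "RFP")):
        return "Solicited"
    # Program Announcements (PA, including variants such as PAR) count as
    # investigator-initiated research.
    if foa.startswith("PA"):
        return "Investigator-initiated"
    # Blank FOA numbers are Unknown; mapping any other FOA type to Unknown
    # is an assumption made for this sketch.
    return "Unknown"
```

For example, `type_of_research("RFA-TR-21-101", "U01")` returns `"Solicited"` and `type_of_research("PAR-21-045", "R01")` returns `"Investigator-initiated"`, using the FOA number formats given in the footnote on FOA numbers.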
Combined Total Cost of Funded Autoimmune Research
The committee noted differences in funding information available in RCDC and RePORTER early in its analysis of spending on autoimmune disease research. Because the data systems serve different purposes, neither system provides a complete estimate of the total cost of autoimmune disease research funding by IC. The committee created a new variable, Combined Total Cost of Funded Autoimmune Research, to address this issue and added it to the CRD.
There are two types of projects found in the CRD: projects that stand alone and subprojects. A subproject is a “discrete and clearly identifiable segment of a multicomponent application…most commonly subprojects are part of the M, P, S, and U mechanisms.”9 Stand-alone projects have a designated funding IC(s) in the RePORTER database, but subprojects do not; subproject funding by IC is only available as part of RCDC reporting. The committee needed this information in order to determine how much funding each IC contributed to autoimmune disease research. Out of necessity, the committee used RCDC data to identify subproject funding by IC. RCDC datasets for each FY 2008 to 2020 were downloaded and used to inform subproject funding IC and to configure total dollar amounts in the CRD. Generally, RCDC and RePORTER should not be directly compared as they serve different purposes, are published on different timelines, and have varying budget nuances. However, funding IC dollar amounts would be significantly underestimated without utilizing both data sources. RCDC dollar amounts are frozen and may not always match the frequently updated data in RePORTER. Additionally, RePORTER separates Total Cost for projects and Total Cost for subprojects, making it challenging to determine the total amount of dollars any one IC spent on autoimmune disease research. A new variable, “Combined Total Cost of Autoimmune Disease Research,” was created to reflect consolidated funding for stand-alone projects and subprojects in one variable; it was added to the CRD.
___________________
7 An FOA number for solicited research contains the type of FOA, NIH funding Institute code, fiscal year, and associated serial number (e.g., RFA-TR-21-101). An FOA number for investigator-initiated research contains the type of FOA, fiscal year, and associated serial number (e.g., PAR-21-045). https://grants.nih.gov/grants/guide/description.htm, https://grants.nih.gov/grants/guide/parent_announcements.htm; https://grants.nih.gov/funding/searchguide/index.html#.
8 An Activity Code is a 3-character code used to identify a specific category of extramural activity to differentiate the wide variety of research-related programs NIH supports. https://grants.nih.gov/grants/glossary.htm.
The new variable, Combined Total Cost of Funded Autoimmune Disease Research, is used in the analyses associated with Figure 6-1 and Tables 6-1, 6-2, and 6-4.
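One way to picture the combined cost variable is as a per-row consolidation of the separate project and subproject cost fields, with the funding IC for subprojects looked up from RCDC data. The sketch below is hypothetical throughout: the field names, the lookup key, the project numbers, and the dollar amounts are all assumptions made for illustration, and it assumes each row carries either a project cost or a subproject cost, which the text does not state.

```python
def combined_total_cost(row):
    """Consolidate project and subproject cost into one value for a row."""
    project_cost = row.get("total_cost") or 0
    subproject_cost = row.get("total_cost_subproject") or 0
    return project_cost + subproject_cost


def spending_by_ic(rows, rcdc_subproject_ic):
    """Sum combined cost per funding IC.

    Subproject rows have no funding IC in the export, so the IC is taken
    from a lookup built from RCDC data (hypothetical structure).
    """
    totals = {}
    for row in rows:
        key = (row["project_number"], row.get("subproject_id"))
        ic = row.get("funding_ic") or rcdc_subproject_ic.get(key, "Unknown")
        totals[ic] = totals.get(ic, 0) + combined_total_cost(row)
    return totals


# Hypothetical rows: one stand-alone project and one subproject.
rows = [
    {"project_number": "R01AI000001", "funding_ic": "NIAID", "total_cost": 250_000},
    {"project_number": "P01AR000002", "subproject_id": "0001",
     "funding_ic": None, "total_cost_subproject": 80_000},
]
rcdc_subproject_ic = {("P01AR000002", "0001"): "NIAMS"}

totals = spending_by_ic(rows, rcdc_subproject_ic)
```

Without the RCDC lookup, the subproject's dollars would have no IC to be attributed to, which is the underestimation the committee describes.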
General Data Limitations
NIH does not expressly budget by RCDC spending category. The annual spending estimates reflect amounts that change as a result of science, actual research projects funded, and the NIH budget. Further, the spending categories are not mutually exclusive. Research projects can be included in multiple spending categories, so amounts may add up to more than 100 percent of NIH-funded research. Additionally, the specific amounts associated with multiple spending categories are not specified. For example, hypothetical Grant A with a total cost of $100,000 has three associated spending categories: autoimmune disease, cancer, cardiovascular. Although the data do not provide information on the percentage of dollars allocated to each of the spending categories (because the categories are defined irrespective of the budget), the total amount is considered to be dollars that supported autoimmune disease research (the autoimmune disease spending category).
___________________
9 The M activity code, also known as general clinical research centers program, is an award made to an institution solely for the support of a General Clinical Research Center where scientists conduct studies on a wide range of human diseases using the full spectrum of the biomedical sciences. The P activity code, also known as program/project center grants, covers large, multidisciplinary, and long-term research efforts that generally include a diverse array of research activities. The S activity code, also known as research related programs and/or projects, includes grants for a wide range of activities including minority biomedical research support and biomedical research support shared instrumentation grants. The U activity code, also known as cooperative agreements, is a support mechanism frequently used for complex, high-priority research areas that require substantial involvement from NIH program or scientific staff. https://grants.nih.gov/grants/funding/ac_search_results.htm, https://grants.nih.gov/grants/glossary.htm.
Another limitation is that NIH actual total obligations reported by the Office of the Budget are not provided for autoimmune disease research. Because these dollar amounts come from exclusive sources, the committee was unable to directly compare them. Despite the nuances in data sources, the committee was able to gauge the percentage of autoimmune disease spending across NIH compared to NIH actual total obligations as seen in the analysis associated with Figure 6-1 and Table 6-1.
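The effect of non-exclusive spending categories can be made concrete with the hypothetical Grant A described above. The short sketch below attributes the grant's full cost to each of its categories, as the data do, and compares the sum across categories with the actual dollars awarded; the variable names are illustrative.

```python
# Hypothetical Grant A from the text: $100,000 in three spending categories.
grants = {
    "Grant A": {
        "total_cost": 100_000,
        "categories": ["autoimmune disease", "cancer", "cardiovascular"],
    },
}

# Each category is credited with the grant's full total cost, because the
# data do not say what share of the dollars belongs to each category.
category_totals = {}
for grant in grants.values():
    for category in grant["categories"]:
        category_totals[category] = category_totals.get(category, 0) + grant["total_cost"]

sum_over_categories = sum(category_totals.values())
actual_dollars = sum(grant["total_cost"] for grant in grants.values())
```

Here the autoimmune disease category is credited with the full $100,000, and summing the three categories gives $300,000 against $100,000 actually awarded, which is why category totals cannot be added together.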
Methods for Additional Specific Analyses
IC Collaboration
An Excel pivot table was created within the CRD containing funding ICs from FY 2008–2020 to examine the relationship between the IC administering a grant and the ICs that contribute funding to the grant. An administrative IC with at least one funding IC was considered a collaboration, also known as joint IC funding. To determine the average number of joint IC funding collaborations over the period, the number of collaborations was summed and divided by 13 (years). The range of joint IC funding collaborations per year spans the fewest and the most collaborations in any single year within the committee’s specified time period, by IC. This methodology was used in the analyses associated with Table 6-8.
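The average and range described above reduce to simple arithmetic on yearly counts for one IC. The sketch below assumes the yearly collaboration counts have already been tallied (in the committee's case, by pivot table) and assumes that a year with no recorded collaborations counts as zero; the counts shown are made up for illustration.

```python
FISCAL_YEARS = range(2008, 2021)  # FY 2008 through FY 2020, 13 years


def summarize_collaborations(counts_by_year):
    """Return the average per year and the (fewest, most) yearly counts."""
    # Years missing from the input are treated as zero (an assumption).
    counts = [counts_by_year.get(year, 0) for year in FISCAL_YEARS]
    return {
        "average": sum(counts) / len(counts),
        "range": (min(counts), max(counts)),
    }


# Made-up counts for one IC: 2 collaborations per year, except 15 in FY 2020.
example_counts = {year: 2 for year in FISCAL_YEARS}
example_counts[2020] = 15

summary = summarize_collaborations(example_counts)
```

With these counts the sum is 39, so the average is 39 / 13 = 3.0 and the range is 2 to 15.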
Study Section Analysis
The committee used the online CSR study section tool available at https://public.csr.nih.gov/StudySections.10 The committee entered the name of each of the select autoimmune diseases of committee interest into the study section search field to identify the names of its associated chartered study sections. Only chartered study sections were included because they review most investigator-initiated research grant applications. The total number of chartered study sections for all select diseases (de-duplicated) was then tallied.
___________________
10 Accessed November 24, 2021.
The committee used the RePORTER Matchmaker search tool available at https://reporter.nih.gov/matchmaker.11 The committee entered the name of each of the select autoimmune diseases of committee interest into the field to yield the number of projects associated with that disease. The user then activated the Active Projects feature and selected 2019 and 2020 in the Fiscal Year dropdown. RePORTER then generated a graph of the study sections that had reviewed the most grants for that autoimmune disease. The study sections appearing in the graph, and the number of grants each study section reviewed, constitute the data included in Table 6-10.
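The de-duplicated tally of chartered study sections across diseases amounts to counting the union of the per-disease search results. A minimal sketch follows; the study section names are placeholders rather than actual CSR study sections, and the pairing of sections with diseases is invented for illustration.

```python
def tally_unique_sections(sections_by_disease):
    """Count distinct study sections across all diseases."""
    unique_sections = set()
    for sections in sections_by_disease.values():
        unique_sections.update(sections)
    return len(unique_sections)


# Placeholder results: "Section B" is returned for two diseases.
sections_by_disease = {
    "psoriasis": ["Section A", "Section B"],
    "rheumatoid arthritis": ["Section B", "Section C"],
    "celiac disease": ["Section D"],
}

unique_count = tally_unique_sections(sections_by_disease)
```

A study section returned for several diseases is counted once, so the tally here is four rather than five.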
Clinical Trials
Using clinicaltrials.gov, 637 clinical trials were identified and exported. Clinical trials were then linked to the 8,470 autoimmune disease grants of interest using a matching algorithm. The National Academies staff then manually reviewed the clinical trials to ensure that they were related to the committee’s autoimmune diseases of interest or the other autoimmune disease category. This resulted in a total of 353 clinical trials used in the analysis. This methodology was used in the analyses associated with Figure 6-26.
Publications
To determine the publications associated with the autoimmune disease grants of interest, a variety of NIH and public databases were used, including PubMed, Scientific Publication Information Retrieval and Evaluation System (SPIRES) bibliometric tools, and Web of Science (WoS). SPIRES linked peer-reviewed publications to the autoimmune disease grants of interest. Disease information for publications is based on the disease categorization of associated grants. To characterize scientific productivity and impact, the list of related publications associated with the grants of interest was analyzed using the following metrics: total number of publications, number of publications by disease type, and publication distribution over time from publication year 2008 to 2020. Relative Citation Ratios (RCRs) were retrieved from iCite, a tool created by the Office of Portfolio Analysis within NIH, to access bibliometrics for relevant publications. Journal impact factor was obtained from WoS for the articles’ publication year. This methodology was used in the analyses associated with Figures 6-27, 6-28, and 6-29, and Table 6-12.
___________________
11 Accessed December 3, 2021.
Patents
Patent data (patents and patent applications) were gathered from the U.S. Patent Databases (USPTO) and RePORTER. Patent data were linked to the 8,470 autoimmune disease grants of interest using a matching algorithm. All linked data are in the public domain. Disease information for patent applications and patents is based on the disease categorization of associated grants. This methodology was used in the analyses associated with Figures 6-30 and 6-31.