Appendix D: IHME Methods

The Institute for Health Metrics and Evaluation's (IHME's) mapping of drug–use pairs to Global Burden of Disease (GBD) categories involved the following steps: (1) We identified drug uses in the Evaluate Pharma database, covering current drugs for the top 20 pharmaceutical companies, and pipeline drugs for all companies; (2) for validation, we manually mapped drug–use pairs to GBD conditions (causes, risk factors, impairments, injuries, or pathogens) for two companies' current and pipeline portfolios; (3) we then applied a large language model (LLM) to assign drug–use pairs to GBD categories, using the manual mappings as a benchmark for optimizing our input configuration; (4) this highest-performing LLM method was used to map the current portfolios of the top 20 pharmaceutical companies and pipeline portfolios for all companies; and (5) we compared these pharmaceutical portfolios by GBD cause to the respective disease burden. The remaining sections in this document provide additional information about each of these steps.

Identification of Drugs and Drug–Use Pairs

We used the Evaluate Pharma database to identify both current pharmaceutical products and pipeline pharmaceutical products. To discover all uses for each of the current drugs, we mapped drug names from the Evaluate Pharma database to reference sources (e.g., Redbook) that specify the use of each drug. For pipeline drugs, we relied on the "specified use" variable in the Evaluate Pharma database.
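The pair-building step described above can be illustrated with a minimal Python sketch. It is not the pipeline used for this analysis: the `DrugUsePair` class, the `build_pairs` function, the `specified_use` dictionary key, and the placeholder drug names are all hypothetical, and the reference source is stood in for by a plain dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DrugUsePair:
    """One drug paired with one of its uses (hypothetical structure)."""
    drug: str
    use: str
    status: str  # "current" or "pipeline"


def build_pairs(
    current_drugs: List[str],
    reference_uses: Dict[str, List[str]],
    pipeline_records: List[Dict[str, str]],
) -> List[DrugUsePair]:
    """Return de-duplicated drug-use pairs in first-seen order."""
    pairs: List[DrugUsePair] = []
    seen = set()

    def add(pair: DrugUsePair) -> None:
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)

    # Current drugs: one pair per use listed in the reference source.
    for drug in current_drugs:
        for use in reference_uses.get(drug, []):
            add(DrugUsePair(drug, use, "current"))

    # Pipeline drugs: the use is read directly from the record's
    # specified-use field (key name assumed for this sketch).
    for record in pipeline_records:
        add(DrugUsePair(record["drug"], record["specified_use"], "pipeline"))

    return pairs


# Toy inputs with placeholder names; none of this is data from the analysis.
current = ["drug_a", "drug_b"]
reference = {"drug_a": ["type 2 diabetes", "obesity"], "drug_b": ["asthma"]}
pipeline = [{"drug": "drug_c", "specified_use": "malaria"}]

pairs = build_pairs(current, reference, pipeline)
for p in pairs:
    print(p.drug, "|", p.use, "|", p.status)
```

A frozen dataclass is hashable, which lets the sketch drop duplicate pairs with a set while keeping the original order in the list.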
ALIGNING INVESTMENTS IN THERAPEUTIC DEVELOPMENT

Manual Mapping of Drug–Use Pairs to Create a Validation Dataset

To assess and optimize the performance of the LLM-based mapping, we created a validation dataset from Pfizer and Sanofi's current and pipeline drug portfolios. Two independent coders mapped each drug–use pair to GBD cause, risk, and injury codes, with a third reviewer resolving any discrepancies. We also compared LLM-based assignments to manual mappings to refine the validation dataset. In addition to causes, other entities were included as options for mapping. The final mapping included 334 causes, 47 injury codes, 18 noncause groupings, 4 risk factors, and the heart failure impairment.

Performance Optimization of an LLM-Based Classification

We supplied the LLM with drug–use pairs and a list of GBD conditions, instructing it to identify the most relevant condition. We refined the prompt to enhance accuracy, using our validation set to evaluate improvements. We also tested different foundational models, including GPT-4, o1-mini, and o1-preview. In addition to prompt refinement, we undertook a range of performance optimization approaches. These included the provision of condition keywords generated through a separate LLM process and an adjudication process, whereby we used multiple LLM instances, each with its own medical specialty focus, with a final LLM instance determining the most likely condition assignment.
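The adjudication flow can be sketched in Python as a routing function. This is an illustration under stated assumptions, not the authors' implementation: every function and variable name is hypothetical, the LLM calls are replaced by stub functions that return fixed answers, and the final adjudicator is a simple majority vote standing in for the final LLM instance. The 0.80 threshold reflects the reported rule that adjudication was limited to cases where the initial classification had a confidence level of 80 percent or less.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# A classifier maps (drug, use) to (condition, confidence in [0, 1]).
Classifier = Callable[[str, str], Tuple[str, float]]
# An adjudicator picks one condition from the specialists' candidates.
Adjudicator = Callable[[str, str, Dict[str, str]], str]

CONFIDENCE_THRESHOLD = 0.80  # adjudicate when confidence <= this value


def classify_with_adjudication(
    drug: str,
    use: str,
    initial: Classifier,
    specialists: Dict[str, Classifier],
    adjudicator: Adjudicator,
) -> str:
    """Return the initial answer if confident, else the adjudicated answer."""
    condition, confidence = initial(drug, use)
    if confidence > CONFIDENCE_THRESHOLD:
        return condition
    # Low confidence: collect one candidate per specialty-focused instance.
    candidates = {name: fn(drug, use)[0] for name, fn in specialists.items()}
    return adjudicator(drug, use, candidates)


# --- Stubs standing in for LLM calls (hypothetical, fixed outputs) ---
def initial_stub(drug: str, use: str) -> Tuple[str, float]:
    if use == "asthma":
        return ("Asthma", 0.95)
    return ("Diabetes mellitus", 0.60)


specialist_stubs: Dict[str, Classifier] = {
    "endocrinology": lambda d, u: ("Diabetes mellitus type 2", 0.9),
    "cardiology": lambda d, u: ("Ischemic heart disease", 0.5),
    "nephrology": lambda d, u: ("Diabetes mellitus type 2", 0.7),
}


def majority_adjudicator(drug: str, use: str, candidates: Dict[str, str]) -> str:
    # Stand-in for the final LLM instance: pick the most common candidate.
    return Counter(candidates.values()).most_common(1)[0][0]


direct = classify_with_adjudication(
    "drug_b", "asthma", initial_stub, specialist_stubs, majority_adjudicator
)
routed = classify_with_adjudication(
    "drug_a", "type 2 diabetes", initial_stub, specialist_stubs, majority_adjudicator
)
print(direct)
print(routed)
```

Passing the classifiers in as callables keeps the routing logic separate from any particular model or prompt, so the same function could wrap real model calls in place of the stubs.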
The table below describes concordance between different LLM approaches that vary according to the foundational model used, whether condition keywords were provided to the LLM, and whether an adjudication process was used.

LLM approach                             Level 1 Cause   Level 2 Cause   Level 3 Cause   Level 4 Cause
o1-preview with keywords, adjudicated    98.5%           96.0%           93.9%           92.8%
o1-preview with keywords                 98.3%           95.3%           93.0%           93.0%
o1-preview without keywords              97.0%           91.8%           84.8%           83.8%
o1-mini without keywords                 97.1%           90.5%           83.5%           85.7%
o1-mini with keywords                    97.3%           91.6%           86.5%           91.7%
GPT-4 with keywords                      95.3%           87.5%           80.1%           85.6%

We evaluated concordance at the four levels of the GBD cause hierarchy, with higher levels indicating greater granularity. The highest performing approach was one that uses the o1-preview foundational LLM, condition keywords, and adjudication (limited to instances where the initial classification by the LLM had a confidence level less than or equal to 80 percent).

Application of the Optimized LLM Approach and Postprocessing

Using Evaluate Pharma, we extracted the most recent product data as of February 2025. We then applied our most accurate LLM method to classify the complete dataset, which includes over 7,000 current and pipeline products from the top 20 companies and over 37,000 additional pipeline products from other companies. Some adjustments were made to the LLM outputs. Specifically, for a small number of cases where the LLM's assignments did not match any valid condition in our hierarchy, we manually mapped the drug–use pairs to the correct condition.

Comparison of Pharmaceutical Portfolios by Cause Against the Corresponding Disease Burden

This analysis encompassed pharmaceutical products globally, both on-market and in development. We compared current drugs with 2021 disease burden and pipeline drugs with forecasted 2030 disease burden, as defined by GBD 2021.
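Returning to the level-wise concordance evaluation described earlier in this appendix, one way to compute agreement at each level of a cause hierarchy is sketched below in Python. The appendix does not state the exact formula used, so this is an assumption: two assignments agree at level k when their level-k ancestors match, and each level's denominator is restricted to pairs whose manual mapping reaches that level. The small hierarchy, the function names, and the example pairs are illustrative only and are not the GBD cause list or the validation data.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative toy hierarchy: cause -> parent (None marks a Level 1 cause).
PARENT: Dict[str, Optional[str]] = {
    "Non-communicable diseases": None,
    "Diabetes and kidney diseases": "Non-communicable diseases",
    "Diabetes mellitus": "Diabetes and kidney diseases",
    "Chronic kidney disease": "Diabetes and kidney diseases",
    "Diabetes mellitus type 1": "Diabetes mellitus",
    "Diabetes mellitus type 2": "Diabetes mellitus",
}


def path_from_root(cause: str) -> List[str]:
    """Return [level-1 ancestor, ..., cause] for a cause in PARENT."""
    path: List[str] = []
    node: Optional[str] = cause
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def concordance_by_level(
    pairs: List[Tuple[str, str]], max_level: int = 4
) -> Dict[int, Optional[float]]:
    """Share of (manual, predicted) pairs that agree at each hierarchy level."""
    result: Dict[int, Optional[float]] = {}
    for level in range(1, max_level + 1):
        agree = 0
        total = 0
        for manual, predicted in pairs:
            manual_path = path_from_root(manual)
            if len(manual_path) < level:
                continue  # manual mapping is coarser than this level
            predicted_path = path_from_root(predicted)
            total += 1
            if (
                len(predicted_path) >= level
                and predicted_path[level - 1] == manual_path[level - 1]
            ):
                agree += 1
        result[level] = agree / total if total else None
    return result


# (manual mapping, LLM mapping) pairs, invented for this sketch.
example_pairs = [
    ("Diabetes mellitus type 2", "Diabetes mellitus type 2"),
    ("Diabetes mellitus type 2", "Diabetes mellitus type 1"),
    ("Chronic kidney disease", "Diabetes mellitus"),
    ("Diabetes mellitus type 1", "Chronic kidney disease"),
]

scores = concordance_by_level(example_pairs)
for level, score in scores.items():
    print(level, score)
```

Under this assumed definition the denominator can shrink at deeper levels, because a manual mapping that stops at Level 3 is not scored at Level 4. Whether the reported concordance figures were computed this way is not stated in the source.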