Disparities in health and health care across racial, ethnic, and socioeconomic backgrounds in the United States are well documented. The reasons for these disparities are, however, not well understood. Considerable interest in better understanding the causes of these differences has called attention to the availability and quality of individual-level data on race, ethnicity, socioeconomic position (SEP) and acculturation and language (e.g., language use, place of birth, generation status) of individuals. These data are critical to documenting the nature of disparities in health care and to developing strategies to eliminate disparities.
Data currently available on race, ethnicity, SEP, and acculturation and language use are severely limited. While national-level surveys sponsored or conducted by the federal government collect rich information on individuals, their health, and their use of health care, sample sizes often limit their usefulness to only broad racial and ethnic groups (e.g., blacks, whites, and Hispanics) and are typically too small for analyses within racial and ethnic groups (e.g., within the Hispanic ethnic category—Puerto Ricans, Cubans, Mexicans, and other Hispanic groups) or for smaller, but still broad, racial and ethnic groups (e.g., American Indians and Alaska Natives). Data from Medicare claims and enrollment files have been widely used for analysis of racial and ethnic disparities. However, racial and ethnic data in these files are of limited accuracy, completeness, and detail. State-based data, such as vital records, administrative data from Medicaid and the State Children’s Health Insurance Program, and data from registry systems, are potentially valuable sources of data for analyzing disparities in health and health care.
However, these data sources do not collect data on race and ethnicity in standardized ways, and they contain little information on other relevant patient characteristics. Finally, although much information on health and health care comes from private data systems maintained by health insurance plans, hospitals, and medical groups, data on race and ethnicity usually are not collected in these record systems. When the information is available, it is often unstandardized and contains little information on patients’ socioeconomic characteristics or acculturation and language use. The lack of standardized and complete data challenges the establishment of reliable baseline and trend analyses of health, health care access, cost, and quality by patient characteristics.
Concerns about the adequacy of the current infrastructure to provide the necessary data to understand and eliminate racial and ethnic disparities prompted Congress to direct the Department of Health and Human Services (DHHS) to request that the National Academies conduct a comprehensive study of DHHS data collection systems (P.L. 106-525, 2000). In response to this request, the DHHS Office of the Assistant Secretary for Planning and Evaluation (ASPE), on behalf of a number of agencies within DHHS, asked the Committee on National Statistics (CNSTAT) of the National Academies to convene a panel of experts to review DHHS data systems. ASPE and CNSTAT developed the charge for the study based on this legislation and on the department’s own needs for review of its data systems, giving the panel the flexibility to review related data needs as they arose.
The panel was charged to review data collection or reporting systems required under the department’s programs or activities relating to the collection of data on race, ethnicity, and socioeconomic position. The charge included examining data collection systems in other federal agencies with which the department interacts to collect relevant data on race and ethnicity (such as that of the Social Security Administration [SSA]), as well as systems of the private health care sector. The panel was asked: (1) to identify the data needed to support efforts to evaluate the effects of socioeconomic position, race, and ethnicity on access to health care and on disparities in health as well as to enforce existing protections for equal access to health care; (2) to assess the effectiveness of the data systems and collection practices of DHHS and of selected systems and practices of other federal, state, and local agencies and the private sector in collecting and analyzing such data; and (3) to identify critical gaps in the data and suggest ways in which they could be filled.
We note some specific distinctions the panel made in interpreting its charge. First, the panel reviewed a very broad set of data collection systems both within and outside DHHS. These systems include health surveys, administrative records, and records from private data systems. The research purposes and uses of these data collection systems are quite varied—some are used to understand broad determinants of health (e.g., the effect of income on mortality) while others are used to understand very specific outcomes of health care treatment (e.g., the effects of ethnicity and race on medical outcomes of patients with hypertension or diabetes). The panel focused only on the collection of data on race, ethnicity, and socioeconomic position (as the originating legislation called for), and added to that the collection of data on acculturation and language use because the panel believed these to be important correlates to understanding racial, ethnic, and socioeconomic aspects of health and health care. In making recommendations, the panel did not consider specific assessments of the cost of improved data collection but did broadly consider the costs of data collection among different types of data collection systems.
THE IMPORTANCE OF DATA ON RACE, ETHNICITY, SOCIOECONOMIC POSITION, AND ACCULTURATION AND LANGUAGE USE
High-quality data on race and ethnicity are necessary to identify and eliminate disparities in health and health care. Socioeconomic position (SEP)—income, wealth, and education—is important as both a mediator of racial and ethnic disparities and a further source of disparities. Low SEP, for example, is associated with limited access to the health care system, inadequate health information, and poor health practices. Acculturation (and its proxy measures language, place of birth, years in the United States, or generational status) is also related to health status; mismatches between the language spoken by health care providers and by patients can be a limiting factor in health care interactions and health information exchange. The panel therefore concluded that:
CONCLUSION 3-1: Measures of race and ethnicity should be obtained in all health and health care data systems.
CONCLUSION 3-2: Measures of socioeconomic position should, where feasible, be obtained along with data on race and ethnicity.
CONCLUSION 3-3: Measures of acculturation and proxies such as language use, place of birth, and generation and time in the United States should, where feasible, be obtained.
To monitor trends in disparities, to understand how disparities arise, and ultimately to design interventions to reduce and eliminate them, information about individuals is used to make general statistical inferences about populations. Such statistical uses are distinct from other uses of the data that require information about a specific individual. For example, income data on individuals applying for Medicaid are collected to assess eligibility for the program. In this example, data on individuals are collected to take action regarding a specific individual. In contrast, data on individuals used for statistical purposes are collected to make inferences at an aggregate level.
Many of the data used to understand health disparities are not collected specifically for these statistical purposes, but rather are used to administer services and programs. Their use for statistical purposes is secondary. The panel, in this report, will make recommendations that encourage the collection of additional items of race and ethnicity, SEP, and language and acculturation where possible so that statistical inferences about disparities can be made. But it does so with the recognition that these data need to be useful to the federal, state, and private institutions and systems for which they are collected.
CONCLUSION 3-4: Health and health care data collection systems should return useful information to the institutions and local and state government units that provide the data.
Data linkages, or bringing together variables from two or more data sets, can facilitate new analyses (for policymaking, quality improvement, and research) without the expense and time needed for additional data collection. While there are tremendous opportunities for new analyses with linked data, barriers to data linkage—confidentiality concerns and negotiating linkages across different agencies each with their own protection rules, for example—are substantial. However, methods such as masking and deidentification can be used to guard against harmful uses of linked data and to protect confidentiality. Linking across data sets has great potential payoff in terms of increased content coverage over a single source of data.
CONCLUSION 3-5: Linkages of data should be used whenever possible, with due regard to proper use and the protection of confidentiality in order to make the best use of existing data without the burden of new data collection.
DHHS DATA COLLECTION SYSTEMS
In its evaluation of gaps in the department’s data collection systems, the panel reviewed the 1999 DHHS report Improving the Collection and Use of Racial and Ethnic Data in Health and Human Services, a comprehensive study of the federal issues related to racial and ethnic data collection. The report calls for DHHS to develop an implementation plan that would prioritize the report’s recommendations, include a detailed plan of action, establish a responsible office(s) to carry out the plan, and assess costs for implementation. Thus far, such a plan has not been produced. The panel urges DHHS to develop such a plan, begin to implement the data improvement recommendations, and establish a responsible body for coordinating implementation across the various department agencies and ensuring that they follow through with recommendations.
RECOMMENDATION 4-1: DHHS should begin immediately to implement the recommendations contained in the 1999 report entitled Improving the Collection and Use of Racial and Ethnic Data in Health and Human Services.
There are many important recommendations in the 1999 DHHS report. However, the panel emphasizes a few that it sees as priorities for the department.
National household surveys are not large enough to support analysis of health outcomes for many racial and ethnic subgroups. In addition, the costs of obtaining extensive health data that are collected in surveys like the National Health Interview Survey (NHIS) or the National Health and Nutrition Examination Survey (NHANES) for small or geographically concentrated racial and ethnic groups make it impossible to collect such data on a regular basis for every group that may be of interest. However, periodic targeted studies on specific groups in specific areas could be conducted and could provide essential data on the health of these groups. The panel therefore recommends that DHHS develop a schedule for special targeted surveys of population subgroups, covering a 10- to 20-year period and each year identifying the group to be targeted. This would be a feasible way of collecting meaningful data on racial and ethnic subgroups over time.
RECOMMENDATION 4-2: DHHS should conduct the necessary methodological research and develop and implement a long-range plan for the national surveys to periodically conduct targeted surveys of racial and ethnic subgroups.
Beyond sample size, there may be other statistical issues to address when surveying certain racial and ethnic groups such as recent immigrants and farm workers (Kalsbeek, 2003). The rarity of these groups, their geographic dispersion, and in some cases their mobility often make it inefficient to sample them using standard household sampling methods. Furthermore, survey questions might be understood differently by different groups. These factors can distort measures of disparities. For these reasons, special methods are needed to measure disparities in such distinct populations.
RECOMMENDATION 4-3: The adequacy of sampling methods aimed at key racial and ethnic groups, as well as the quality of survey measurement obtained from them, should be carefully studied and shortcomings, where found, remedied for all major national DHHS surveys.
The DHHS Policy for Improving Race and Ethnicity Data (the “Inclusion Policy”) clearly states the goal of collecting data on race and ethnicity for all department programs, record collections, and surveys. The department’s household surveys all collect racial and ethnic data in accordance with the Office of Management and Budget (OMB) Standards for Maintaining, Collecting and Presenting Federal Data on Race and Ethnicity (the OMB standards). However, the department’s health data frequently come from record systems—either those used to administer a DHHS program (e.g., Medicare) or medical records from clinics, providers, and laboratories. The data on race and ethnicity collected through these records are incomplete, inconsistent, and unstandardized. Not all records collect data conforming to the OMB standards for race and ethnicity, and some do not contain such data at all. As a result, the department should enforce the Inclusion Policy and require those programs funded by DHHS that do not currently report data on race and ethnicity to collect such data and to do so in accordance with the OMB standards.
RECOMMENDATION 4-4: DHHS should require the inclusion of race and ethnicity in its data systems in accordance with its Policy for Improving Race and Ethnicity Data.
Data on SEP are needed both to better understand racial and ethnic disparities and to identify and understand health or health care disparities for deprived groups that are not defined by race or ethnicity but that nonetheless experience such disparities. DHHS data systems do not consistently collect data on SEP. While the national household surveys generally provide adequate SEP data, obtaining some measures of employment, education, and insurance coverage, although with limited detail, wealth data are rarely collected. Administrative and medical record systems include very little SEP data; in most cases, only insurance coverage status or method of payment is recorded. Employment status, occupation, and educational attainment are rarely collected. Because of the reporting burden, only limited data can be collected in these systems beyond what is essential. Nonetheless, the department should consider ways to collect more SEP data in these record systems.
Limited knowledge about health practices and the U.S. health care system or limited communication skills in English are obstacles to obtaining care and understanding diagnoses and treatments. Little information on language use and acculturation (or proxies of it) is collected in national health surveys, although items on these topics could be added. Even less is collected in DHHS administrative records and surveillance systems, although more extensive collection could be justified to facilitate the provision of medical services and information.
RECOMMENDATION 4-5: DHHS should routinely collect measures of socioeconomic position and, where feasible, measures of acculturation and language use.
Weaknesses in a single source of data can often be remedied by linking data from several sources, without the burden of new data collection. For example, by matching SSA earnings records to Medicare claims data, we can study relationships between SEP and health care. Matching might be difficult or inaccurate if common identifiers are not of high quality or are missing. Confidentiality concerns also arise with data linkages because a common identifier is needed in both data sets to link the data, which may increase the possibility that an individual’s identity can be recognized. Programs that have linked data have proven that personal identity can successfully be protected with the proper precautions. Where possible, the department should promote relevant data linkages across DHHS agencies and across other agencies or institutions.
RECOMMENDATION 4-6: DHHS should develop a culture of sharing data both within the department and with other federal agencies, toward understanding and reducing disparities in health and health care.
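The linkage described above can be illustrated with a minimal Python sketch. It is not a description of any DHHS, CMS, or SSA system: the field names (`person_id`, `service`, `earnings_band`) and the nine-digit identifier format are assumptions made for illustration. It shows the two points raised in the text: records with a missing or poor-quality identifier cannot be matched, and the identifier can be dropped from the linked file once the match is made.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


def normalize_id(raw: Optional[str]) -> Optional[str]:
    """Keep only digits; treat anything that is not 9 digits as unusable.

    The 9-digit rule is a hypothetical format chosen for this sketch.
    """
    if raw is None:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits if len(digits) == 9 else None


def link_records(
    claims: List[Record], earnings: List[Record]
) -> Tuple[List[Record], List[Record]]:
    """Link claims to earnings on a common identifier.

    Returns (linked, unmatched). Linked records omit the identifier,
    which is one simple form of deidentification.
    """
    earnings_by_id: Dict[str, Record] = {}
    for row in earnings:
        key = normalize_id(row.get("person_id"))
        if key is not None:
            earnings_by_id[key] = row

    linked: List[Record] = []
    unmatched: List[Record] = []
    for claim in claims:
        key = normalize_id(claim.get("person_id"))
        match = earnings_by_id.get(key) if key is not None else None
        if match is None:
            # Missing or low-quality identifiers cannot be linked.
            unmatched.append(claim)
            continue
        merged = {k: v for k, v in claim.items() if k != "person_id"}
        merged["earnings_band"] = match["earnings_band"]
        linked.append(merged)
    return linked, unmatched
```

In practice the share of records that land in `unmatched` is itself worth reporting, since nonrandom match failure can bias any disparity estimate computed from the linked file.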
Data on Medicare enrollees, contained in the Enrollment Database (EDB), are crucially important for understanding disparities in health and health care treatment as Medicare expenditures account for about 18 percent of U.S. health care spending (Centers for Medicare and Medicaid Services, 2003). Because of the importance of Medicare data, the panel believes it is crucial to improve Medicare data on race and ethnicity, through initiatives of the Centers for Medicare and Medicaid Services (CMS). For new enrollees, data on race and ethnicity, SEP, and a proxy of acculturation such as language use could be most easily collected at the time of enrollment. This information should also be collected for current enrollees. A very brief questionnaire could be used for both of these efforts. To keep the questionnaire short, it is probably not feasible to collect income and wealth information. However, educational level could be included. A question about language use might also be considered.
To obtain more detailed SEP data for use in analysis, CMS should also obtain records of an enrollee’s earnings and employment histories through the wage history files of the SSA. These data show only individual earnings and only for the time period the person worked. They do not necessarily reflect the lifetime earnings of that individual, nor earnings and income available to that person through, for example, the earnings of a spouse. Privacy and confidentiality concerns should always be considered carefully. The panel believes that despite the potential barriers, the CMS and SSA should cooperate to link these two important data sets.
RECOMMENDATION 4-7: The Centers for Medicare and Medicaid Services should develop a program to collect racial, ethnic, and socioeconomic position data at the time of enrollment and for current enrollees in the Medicare program.
RECOMMENDATION 4-8: The Centers for Medicare and Medicaid Services should seek from the Social Security Administration a summary of wage data on individuals enrolled in Medicare.
Leadership for Implementing OMB Standards in DHHS Data Systems
The panel found considerable confusion among some groups of data collectors and users regarding the OMB standards for collection of data on race and ethnicity (National Research Council, 2003). To remedy this, DHHS should inform all its agencies, state health agencies, and private entities that collect data for DHHS programs about these new standards. The OMB has published materials to guide the use of the standards and the bridging of new categories of race and ethnicity to old ones. DHHS should increase awareness of the OMB standards by disseminating the appropriate OMB materials to the various state and private entities from which data are obtained and assume responsibility for ensuring that the standards are properly and consistently applied throughout the department’s data collection systems. DHHS should also develop implementation guidelines specifically aimed at the collection of racial and ethnic data in state and privately based record collection systems.
RECOMMENDATION 4-9: DHHS should prepare and disseminate implementation guidelines for the Office of Management and Budget standards for collecting racial and ethnic data.
Reporting Racial and Ethnic Health Disparities Data in Conjunction with SEP Data
The interrelationship between health and health care and SEP implies that it is important to consider racial and ethnic differences in health and health care within different social and economic backgrounds. Where possible, the panel urges DHHS to report statistics on disparities in health and health care by different levels of SEP.
RECOMMENDATION 4-10: DHHS should, in its reports on health and health care, tabulate data on race and ethnicity classified across different levels of socioeconomic position (SEP).
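As a minimal illustration of the kind of tabulation this recommendation describes, the Python sketch below computes the rate of an outcome within each combination of race/ethnicity and SEP level. The field names (`race_ethnicity`, `sep_level`) and the outcome key are hypothetical, and a real report would also need survey weights, standard errors, and suppression of small cells, none of which are shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def rate_by_group_and_sep(
    records: Iterable[Mapping[str, object]], outcome_key: str
) -> Dict[Tuple[str, str], float]:
    """Return the outcome rate for each (race/ethnicity, SEP level) cell."""
    # Each cell holds [number with the outcome, total number of records].
    counts = defaultdict(lambda: [0, 0])
    for rec in records:
        cell = counts[(str(rec["race_ethnicity"]), str(rec["sep_level"]))]
        if rec[outcome_key]:
            cell[0] += 1
        cell[1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}
```

Comparing rates across racial and ethnic groups within the same SEP level, rather than only overall, is what lets a reader see how much of a disparity remains after SEP is taken into account.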
STATE DATA COLLECTION SYSTEMS
States and U.S. territories are responsible for maintaining numerous health-related data collection systems, including those for vital statistics information (birth and death records); hospital discharge abstracts, which detail information on hospital patients and the diagnoses and treatments they receive; registries, such as the cancer registry system, which provides information on cancer cases and their treatment; and programs such as Medicaid and the State Children’s Health Insurance Program (SCHIP). Some states also conduct their own surveys of their populations or have data collection systems for separate programs that provide health insurance and health care. The data in many state-based systems are shared with DHHS for department use in monitoring the health of the nation and administering and evaluating federal programs.
The collection of data on race and ethnicity in these state-based systems is uneven and unstandardized. While the Medicaid and vital records data collection systems follow the OMB minimum standards for racial and ethnic data collection, hospital discharge abstract systems do not; indeed, some do not collect such data at all. Since most of these systems are based on health records, very little information on socioeconomic position or language is collected. With the exception of information on parental education and country of origin for birth records, occupation on death records, and some income data in the Medicaid and SCHIP administrative records, no SEP or language data are routinely collected by states. This is a serious weakness in state-based data collections.
The panel encourages states to require standard racial and ethnic data collection in their health data collection systems, but in a manner that provides states the flexibility to serve their own specific information needs. The OMB standards would allow states the flexibility to collect more detailed information on race and ethnicity. These standards for reporting broad categories of race and ethnicity could then be used by each state to report data to the national level in a uniform manner. The federal government depends heavily on state-based data and, therefore, should provide leadership to states to develop and utilize standards in state data collection systems.
RECOMMENDATION 5-1: States should require, at a minimum, the collection of data on race, ethnicity, socioeconomic position, and, where feasible, acculturation and language use.
There are, of course, barriers to imposing data collection requirements on states. The costs involved in changing reporting and computer systems are not insignificant. Furthermore, racial and ethnic data are often recorded by health care or program administration personnel who are not trained in interviewing (e.g., medical records clerks, providers and health care workers, or funeral directors). Similarly, many data collection systems have incomplete information because respondents may refuse to answer questions about race, ethnicity, acculturation and language use, or SEP or because recorders fail to request or ascertain the data. Many states could use technical assistance in handling missing data, e.g., through statistical imputation or by linking with other data. States also need guidance in implementing the new OMB standards for racial and ethnic data collection, including the bridging of new categories to old categories and the conversion of multiple-category responses (an individual reports he or she has multiracial ancestry) to single-category responses (a single racial ancestry).
Much work on these technical issues has already been conducted by federal statistical agencies. DHHS should use this work to develop guidance for states on how to address these training and methodological issues. DHHS should also provide states with guidance and support for training in recording data on race and ethnicity.
RECOMMENDATION 5-2: DHHS should provide guidance and technical assistance to states for the collection and use of data on race, ethnicity, socioeconomic position, and acculturation and language use.
PRIVATE-SECTOR DATA COLLECTION SYSTEMS
The panel’s review of current practices by private health care providers and insurance companies (hospitals, medical group practices, and health insurance plans) revealed that the collection of data on race, ethnicity, language and acculturation, and SEP in the private sector is not common and that, when such information is collected, it is unstandardized. Many hospitals collect racial and ethnic data on patients, and this reporting is fairly complete. However, the data are not reported to state and federal programs in a standardized format and their accuracy for racial and ethnic groups other than white and black is suspect. Some health plans include questions about race, ethnicity, and language use on their enrollment forms, but this information is provided voluntarily by applicants and is often incomplete or missing. Even less is known about what data medical groups collect. Collection of SEP data is just as rare in these privately based data collections. Most often, the only SEP information collected by hospitals is the patient’s source of payment. Some health plans ask for level of education on their enrollment forms, but this is not common practice.
Data collected by these private-sector groups could be invaluable for monitoring and better understanding disparities in health and health care. Health insurance plans could use the data to inform quality improvement efforts, to target health promotion and preventive health measures to specific demographic subgroups, and to aid in disease management strategies. Hospitals could also use the data for quality improvement measures or community assessment initiatives.
The failure to collect these data represents an important missed opportunity. The panel believes that intervention from DHHS is needed to ensure that these data are collected. Health plans and hospitals that are interested in collecting these data are concerned that without a federal mandate, such collection will be perceived with suspicion by those who are asked to provide the information. As a result, the collection of these sensitive data items is intermittent and may be suspect in quality. Federal leadership is needed to help legitimize and standardize the collection of these data and could be effected through a DHHS requirement for the reporting of racial, ethnic, SEP, and language data.
RECOMMENDATION 6-1: DHHS should require health insurers, hospitals, and private medical groups to collect data on race, ethnicity, socioeconomic position, and acculturation and language.
DHHS should work with hospitals and health plans to determine the best way to collect data in a standardized way. The coordination of data collection could be complicated because these are private entities. Rulemaking under the Health Insurance Portability and Accountability Act (HIPAA) could contribute some uniformity. HIPAA does not currently require the collection of racial and ethnic data, and indeed its strong privacy measures may inhibit such collection. The act does, however, enforce standards on the collection of other data from health services. Thus, for example, a logical starting point for mandating the collection of data on race and ethnicity could be through HIPAA regulations that apply to electronic transactions in two DHHS programs—Medicare and Medicaid. Changes to the required data collection standards are possible through the Designated Standards Maintenance Organizations (DSMOs), the Data Content Committees (DCCs), and other stakeholder organizations. The secretary of DHHS may propose changes to the standards that the DSMOs, DCCs, and other organizations may then consider. The panel suggests further exploration of this avenue for the federal mandating of racial and ethnic data collection.
Whatever standards are chosen should use the OMB standards for their base, supplemented with further detail as needed. DHHS should also work with hospital and health plan-related groups to determine which SEP data are feasible to collect on enrollment or admissions forms. Collection of these data will necessarily be limited as the collection of detailed wealth and income information may impose a burden on providers and on the individuals providing the data. However, an individual’s education level may be the easiest and least sensitive item to collect.
In developing standards for data collection, it is critically important to provide clear information about how the data will be used so that individuals providing the data are fully informed. DHHS should work with industry agents and legal experts to develop a list of these uses for hospitals, health plans, and medical groups to give to individuals from whom the data are collected.
RECOMMENDATION 6-2: DHHS should provide leadership in developing standards for collecting data on race, ethnicity, socioeconomic position, and acculturation and language use by health insurers, hospitals, and private medical groups.
Implementation of this report’s recommendations would greatly enhance the data infrastructure available for understanding and eliminating disparities. However, if these recommendations cannot be implemented such that high-quality data are produced, linking aggregate-level data on race, ethnicity, SEP, and acculturation and language use may be needed to bridge the gaps. These data, aggregated at the level of census geographical units (ZIP Code Tabulation Areas, tracts, or block groups), could be used to proxy individual-level data by linking them to the individual-level data that are available.
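The use of area-level measures as a proxy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of an existing system: it assumes each member record already carries a geocoded `tract_id`, and the tract identifiers, the `median_household_income` field, and the function name are hypothetical.

```python
from typing import Dict, List, Mapping, Optional


def attach_area_proxy(
    members: List[Mapping[str, object]],
    tract_stats: Mapping[str, Mapping[str, float]],
    field: str = "median_household_income",
) -> List[Dict[str, object]]:
    """Attach an area-level measure to each individual record.

    The attached value describes the member's census tract, not the
    member, so it is only a proxy for individual-level SEP.
    """
    enriched_members: List[Dict[str, object]] = []
    for member in members:
        stats: Optional[Mapping[str, float]] = tract_stats.get(
            str(member.get("tract_id"))
        )
        enriched = dict(member)
        # Records that could not be geocoded get None rather than a guess.
        enriched["area_" + field] = stats[field] if stats else None
        enriched_members.append(enriched)
    return enriched_members
```

Prefixing the new field with `area_` is a small design choice that keeps analysts from mistaking a neighborhood characteristic for a personal one.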
Suitable confidentiality protections are critical for the use of such linked geocoded data. The precise combination of values of the sociodemographic variables might identify a subject’s geographical area and thus pose a risk of disclosure of confidential information about health plan members. Methods have been developed for masking such data by rounding or adding random noise. Such masked data sets can then be analyzed with appropriate corrections for the effects of masking. But development of the specific procedures and parameters required to implement data masking requires particular statistical expertise that is not likely to be found within health insurers.
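The two masking approaches mentioned above, rounding and adding random noise, can be sketched as follows, along with one simple example of a correction for the effect of masking. The sketch assumes independent, zero-mean Gaussian noise with a known standard deviation; under that assumption the variance of the masked values exceeds the variance of the true values by the noise variance, so subtracting it gives a corrected estimate. The function names and parameter values are illustrative only, and choosing parameters that actually protect confidentiality requires the statistical expertise the text describes.

```python
import random
import statistics
from typing import List, Optional, Sequence


def mask_by_rounding(value: float, base: float) -> float:
    """Coarsen a value to the nearest multiple of `base`."""
    return base * round(value / base)


def mask_with_noise(
    values: Sequence[float], noise_sd: float, seed: Optional[int] = None
) -> List[float]:
    """Add independent zero-mean Gaussian noise to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


def corrected_variance(masked_values: Sequence[float], noise_sd: float) -> float:
    """Estimate the variance of the unmasked values.

    With independent additive noise, Var(masked) = Var(true) + noise_sd**2,
    so the noise variance is subtracted (and the result floored at zero).
    """
    return max(statistics.variance(masked_values) - noise_sd ** 2, 0.0)
```

The mean of noise-masked values needs no such correction, because zero-mean noise leaves it unbiased; measures of spread and of association are the ones that must be adjusted.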
DHHS could greatly facilitate the routine generation of high-quality, uniform, and nondisclosing geographically linked data sets by providing a linking service that could be used by private- and public-sector health care organizations. Such a service could be administered, for example, through a Web site. The organization would anonymously submit a file containing member addresses, and receive in return a file of masked geographical variables at several levels.
The greatest expertise within the federal government for solving the problems involved in establishing such a service resides in the Bureau of the Census. Within DHHS, the National Center for Health Statistics (NCHS) has been a leader in dealing with confidentiality issues. Alternatively, a private-sector vendor with the necessary geocoding expertise could be recruited, although such vendors do not typically deal with the related confidentiality issues.
RECOMMENDATION 6-3: DHHS should establish a service that would geocode and link addresses of patients or health plan members to census data, with suitable protections of privacy, and make this service available to facilitate development of geographically linked analytic data sets.