DHHS Collection of Data on Race, Ethnicity, Socioeconomic Position, and Acculturation and Language Use
The Department of Health and Human Services (DHHS) is a major producer of data used in health and health care research. Through its survey data collection activities and the administration of its programs, the department collects an enormous amount of data that are used to study disparities. In this chapter, we provide an overview of the department’s data collection systems, by survey and by administrative sources, with emphasis on the racial, ethnic, socioeconomic position (SEP), and acculturation and language data collected as part of these systems. We identify gaps in the collection of data for measuring health and health care disparities, and conclude that more could be done to effectively capture, measure, and utilize a broader range of federal health data to understand disparities.
We briefly described in Chapter 2 the DHHS initiatives that promote the collection of racial and ethnic data in the department.1 The DHHS Inclusion Policy requires that information on race and ethnicity be collected in all DHHS-funded and -sponsored data collection systems (both surveys and administrative data systems) and that the latest (1997) OMB standards be used in the collection of these data. In 1999, DHHS released a report entitled Improving the Collection and Use of Racial and Ethnic Data in Health and Human Services (U.S. DHHS, 1999), which contains a number of recommendations for the department’s data collection programs. This chapter draws upon that work.
HOUSEHOLD AND INDIVIDUAL SURVEY DATA COLLECTIONS
DHHS conducts a number of household surveys that collect information on health and health status, health care utilization, and health care treatment of individuals. The major household surveys and some of their basic characteristics are listed in Appendix A (pages 129-144). These surveys each have different purposes and unique features to address specific questions. The flagship household surveys conducted by DHHS are the National Health Interview Survey (NHIS), the National Health and Nutrition Examination Survey (NHANES), and the Medical Expenditure Panel Survey (MEPS). Each is designed to yield data that are representative of the U.S. civilian noninstitutionalized population. Each survey also collects a broad array of data about health and has special content and design features.
The NHIS is the largest of the surveys and the broadest in data content. It is a continuing survey conducted throughout the year to monitor the health of the U.S. civilian noninstitutionalized population. Approximately 43,000 households comprising almost 106,000 individuals are interviewed each year. The NHIS respondent sample now serves as the sampling frame for the MEPS and the National Survey of Family Growth (NSFG), enabling linkage of the data collected from these three surveys.
The NHANES collects extensive information on health and diet, including a dietary recall of foods consumed by respondents, and includes a medical examination for respondents. The survey is not as large as the NHIS and recently has been conducted less frequently.2
The MEPS focuses on health care use, expenditures, sources of payment, health insurance coverage, and health status. It collects data longitudinally from households,3 interviewing respondents multiple times over a 2-year period.
Other household surveys conducted or developed by DHHS include the Medicare Current Beneficiary Survey (MCBS),4 the Consumer Assessment of Health Plans Survey (CAHPS),5 the National Immunization Survey (NIS), the National Survey of Family Growth (NSFG), the National Maternal and Infant Health Survey (NMIHS), the National Mortality Followback Survey, and the Youth Risk Behavior Surveillance System. In addition, the Behavioral Risk Factor Surveillance System (BRFSS) is a state-level survey developed by DHHS in collaboration with the states to monitor state-level prevalence of behavioral risks among adults (such as drug and alcohol use, presence of diseases such as diabetes, and level of exercise, among other things). The BRFSS has a core questionnaire that is common across all states so that state comparisons can be made, and states can add their own questions to address the needs of their own populations.
Racial and Ethnic Data Collection
Each of the surveys described above is required to collect racial and ethnic data based on the revised OMB standards. Thus, each at a minimum collects data on whether the individual is white, black, Asian, American Indian or Alaska Native, or Native Hawaiian or Other Pacific Islander, allows respondents to check multiple categories of race, and includes a question on Hispanic ethnic origin. For the BRFSS, racial and ethnic data are collected as part of the core survey and the OMB standards are used. Other surveys collect additional categories of race and ethnicity. For example, the NHIS allows respondents to indicate whether they are Asian Indian, Chinese, Filipino, Japanese, Korean, or Vietnamese, and the National Immunization Survey (NIS) allows respondents to identify themselves as Mexican American, Chicano, Puerto Rican, Cuban, or other Spanish ethnicity.
Socioeconomic Position Data Collection
National household surveys provide some of the most extensive data on socioeconomic position of all the DHHS health-related data systems, with the NHIS, NHANES, and MEPS collecting the most data on SEP.6 All three of these surveys collect information on employment status, occupation, sources of income and amounts of income from each source, and education levels. Only the MEPS collects information on wealth, requesting the estimated value of different types of assets and debts (e.g., home, business, stock funds, and savings accounts). Although the NHANES and NHIS do not ask about wealth, they both ask about home ownership. The other national household surveys collect limited SEP data—usually only education level or sometimes employment status. The BRFSS core questionnaire asks four questions on socioeconomic position—highest level of education achieved, employment status, health insurance coverage status, and a categorical variable for household income. Many of these surveys do collect information on a household’s participation in publicly funded programs for low-income persons. These include Medicaid, SCHIP, and WIC participation. These measures can be used as proxies for low income status.
5 CAHPS is a survey tool kit developed by DHHS to survey consumers and purchasers of health plans. A CAHPS-based Medicare questionnaire was developed and has each year, since 1998, been sent to a sample of Medicare managed care enrollees.
6 See the paper by O’Campo and Burke in Appendix C, for a complete account of which SEP data are collected by DHHS surveys.
Acculturation and Language Use Data Collection
Very little data on language use or acculturation are collected in these household surveys. The NHIS collects information on the respondent’s place of birth and citizenship. Both the MEPS and NIS collect information on the language in which the interview took place. CAHPS collects information on language spoken at home, and the National Mortality Followback Survey contains information on country of origin.
Data Gaps in National Household Surveys
While the national household surveys that are focused on health issues collect a wide range of data useful for measuring and understanding disparities in health and health care, there are limitations. One major drawback is that because these surveys are designed to be representative of the U.S. population as a whole, and although they generally have large sample sizes, the sample sizes are not usually sufficient to provide statistically reliable estimates of health and health care information for smaller ethnic and racial groups. Sample sizes for some broad racial and ethnic categories (e.g., blacks and Hispanics) are ample in most of these surveys. For example, since 1995, the NHIS has oversampled black and Hispanic populations; and since the MEPS and NSFG both use the NHIS sampling frame, each of these surveys has sufficient sample sizes for improved analysis of these groups. The sample size of the MEPS increased over 50 percent between 2000 and 2002. The 2002 version also oversampled Asians and low-income populations (U.S. DHHS, 2003a). The NHANES currently oversamples black and Mexican American populations, but sample sizes for smaller ethnic and racial groups are often not sufficient to support reliable estimates.
Another weakness of these surveys is that none has a sample size large enough to be representative of all, or even most, of the individual states.
Some policy analysis functions could be served by state-level data on health and health care. For example, it could be useful to compare health care access disparities in states with different policies for providing health insurance for low-income children. As the largest of the health-related household surveys, the NHIS is big enough for reliable estimates of health measures in some larger states and is designed to be readily implemented for use in state-level health surveys, which therefore could be comparable across states and with federal surveys. However, it is currently left to the states to develop and fund any such survey. The BRFSS uniformly collects data on risk behaviors and health practices for all 50 states, and so some state-related policy analyses can be conducted through this survey, but the BRFSS does not collect extensive data on health care, although it does ask about receipt of specific preventive services and about some access problems.
As stated above, the MEPS collects the most extensive data on SEP, including information on wealth; and because it is a longitudinal survey, it collects these data over a period of time, albeit a short period (2 years). Thus tracking changes in SEP and relating them to health outcomes in the short term would be possible with the MEPS, but only on a very limited basis. The other household surveys do not collect extensive data on wealth. There are thus two major limitations of these surveys with regard to SEP data collection: they collect little information on wealth and on measures of income over an individual’s life course.
Other federally sponsored surveys with more limited scopes and population coverage do collect extensive data on income, wealth, and socioeconomic position as well as measures of health status. The Health and Retirement Study (HRS), sponsored by the National Institute on Aging and conducted by the University of Michigan, is a panel survey of several birth cohorts all over the age of 50. The survey collects data on health status and income, wealth, and assets, among other items, for over 22,000 individuals each year. Data are collected from the same households every 2 years. The Assets and Health Dynamics Among the Oldest of the Old (AHEAD) survey, also sponsored by the National Institute on Aging and conducted by the University of Michigan, is a panel survey of individuals either at least 70 years old who responded to the HRS or at least 80 years old and drawn from a sample of Medicare beneficiaries. This survey collects some data on mental, physical, and cognitive health as well as economic, family, and program resources. Both of these surveys collect a rich set of health status and SEP data. Each also oversamples Hispanics and blacks. They do not collect data on health care utilization or treatment and do not cover younger populations. The HRS-AHEAD data have been matched with Medicare claims data to enable researchers to examine relationships between health care treatment and income, wealth, and other demographic background factors. However, sample sizes are too small for many subpopulation analyses, and they may also be too small to study some specific health care problems or treatments. Also, as noted above, these surveys collect very little information on language, nativity, and acculturation.
Although all of the DHHS data sets are required to report race and ethnicity in a standardized way using the new OMB standards, trend analysis over the periods prior to and since implementation of the new standards could create a problem for racial classification. For example, under the old standards individuals of mixed racial backgrounds were asked to choose a single racial background, whereas now under the new standards they can choose multiple racial backgrounds. “Bridging” attempts to statistically model how individuals would respond to the new racial categories based on their responses to the old categories. The OMB has provided guidance on tabulation methods to bridge race responses between the old and new standards (OMB, 2000b). Background analysis for this guidance was conducted using the NHIS.7 Individuals’ responses to race questions from the old and new standards were compared with statistically predicted responses under the old categories based on the individual responses to the new categories. The analysts found that the smallest numerical racial categories were most sensitive to different bridging methodologies.
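To make the idea of bridging concrete, the following minimal sketch allocates multiple-race responses back to single-race categories in equal fractions. This is only one simple, deterministic rule chosen for illustration; it is an assumption of this example and is not presented as the method used in the NHIS background analysis. The function name and the toy responses are hypothetical.

```python
from collections import defaultdict


def bridge_equal_fractions(responses):
    """Allocate each respondent across the races they selected.

    `responses` is a list of sets of race labels (new-standard style,
    where more than one race may be chosen). A respondent who selects
    k races contributes 1/k to each of them, so the bridged totals
    still sum to the number of respondents who reported a race.
    Illustrative rule only (an assumption of this sketch).
    """
    totals = defaultdict(float)
    for selected in responses:
        if not selected:  # no race reported; nothing to allocate
            continue
        share = 1.0 / len(selected)
        for race in selected:
            totals[race] += share
    return dict(totals)


# Hypothetical toy data: three single-race and two multiple-race responses.
toy_responses = [
    {"white"},
    {"white"},
    {"black"},
    {"white", "black"},
    {"asian", "white"},
]
bridged = bridge_equal_fractions(toy_responses)
```

Because a few multiple-race respondents can be a large share of a small group, a change of allocation rule moves the bridged count for that group proportionally more than it moves the count for a large group, which is consistent with the sensitivity of the smallest categories noted above.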
The DHHS sponsors a number of surveys that collect data from hospitals, physicians’ offices, and clinics. Some of these surveys collect information directly from the individuals who use these services, but all of them also collect data from the records prepared in conjunction with the service provided. Some major examples of this type of survey are given in Appendix A (pages 144-149). The surveys collect extensive data on the health care utilization and treatment of individuals as well as on the agencies, hospitals, and clinics that provide the care. The National Ambulatory Medical Care Survey (NAMCS), the National Hospital Discharge Survey (NHDS), and the Healthcare Cost and Utilization Project (HCUP) have large sample sizes.
Racial and Ethnic Data Collection
Data on race and ethnicity in these provider-based surveys usually come from records rather than from direct interviews of individuals and are usually recorded by medical personnel or intake workers. Sometimes the information is based on the observation of the person filling out the record, and sometimes the patient is asked about his or her race and ethnicity. The quality and consistency of the reporting is therefore open to question.
Often, information on race and ethnicity in these surveys is simply missing. For the NHDS, 20 percent of the records do not include data on race and 75 percent do not include data on ethnicity. Similarly, 20 percent of the National Home and Hospice Care Survey records do not include race and 30-40 percent do not include ethnicity. As part of a study to redesign the Drug Abuse Warning Network (DAWN) data collection system in 2002, the Substance Abuse and Mental Health Services Administration (SAMHSA) sent trained researchers into six hospitals’ emergency departments to examine their records and abstracts of patients in order to assess which data elements were captured in the records (SAMHSA, 2002). The study found that race and ethnicity were sometimes listed in clinical notes but that the data were not consistently collected. Forty percent of the records lacked information on race and 87 percent provided no information on ethnicity. The study also found that it was unclear whether racial and ethnic categories were consistently used.
Socioeconomic Position Data Collection
Most of these surveys collect only limited data on SEP. In general, only the information needed to ensure payment for the service provided, whether to the individual or to the appropriate government program, is available. In each case, the source of the payment is the only SEP data collected.
Acculturation and Language Use Data Collection
These facilities-based surveys do not collect any data on the acculturation or language use of individuals. The only related item is in the National Survey of Substance Abuse Treatment Services, which collects information on the languages offered for treatment services at the facility.
Medicare program data have been widely used to study health and health care treatment outcomes, including in studies to measure and understand racial and ethnic disparities in health and health care (Escalante et al., 2002; Escarce et al., 1993; Gornick et al., 1996; Schneider, Zaslavsky, and Epstein, 2002; Skinner et al., 2003). Because of Medicare’s entitlement program status and because basic benefits are available to everyone of a certain age regardless of their economic and social background, data from the program have been valuable in better understanding racial and ethnic disparities that exist beyond at least a basic level of health insurance coverage. This section describes the Medicare data system—specifically those data used to measure and understand racial and ethnic disparities in health and health care. The section focuses on the Medicare Enrollment Database (EDB), which is the primary source of racial and ethnic data for linkage with other Medicare records. Previously mentioned surveys such as the MCBS and CAHPS-Medicare Satisfaction Survey also collect data on Medicare enrollees—the MCBS on a sample of about 12,000 enrollees each year and CAHPS on a sample of Medicare managed care enrollees. These two surveys ask questions about race and ethnicity, SEP, and language and acculturation, and thus do not rely on administrative records.
The Medicare EDB contains information on all Medicare beneficiaries. Although it is based entirely on administrative records and does not contain much detailed information on beneficiaries, it is an important database because it can be linked to other Medicare files that include information on health status, service expenditures and financing, age, and gender. Data on race and ethnicity (and other information about beneficiaries) are obtained from the Social Security Administration (SSA). The SSA provides to the Centers for Medicare and Medicaid Services (CMS), which administers the Medicare program, data on people eligible for Medicare. This information, which is used by CMS to determine who becomes eligible for Medicare, is then used in the EDB once an eligible individual enrolls in Medicare. Data on race and ethnicity are thus obtained and included in the EDB. However, as we will explain below, racial and ethnic data are not available for all Medicare enrollees and the categories of these data are limited and have changed over the course of the program’s history.
Since the beginning of the Social Security program in 1936, racial data were collected on a voluntary basis when a person applied for a social security number (SSN) on the SS-5 form (see Scott, 1999, for a detailed account of racial and ethnic data collection for the Social Security program). The SS-5 form has been the primary source of racial and ethnic data for original and new social security applicants. In 1989, SSA began its “enumeration at birth” program, which assigns an SSN to infants at birth. This system is based on the vital statistics birth registration system. However, information on race and ethnicity from birth certificates is not transmitted to the SSA because it is listed on the birth certificate as “Information for Medical and Health Use Only,” meaning it is considered unnecessary for the administration of Social Security programs (Scott, 1999). Thus, for registrants since 1989, no racial and ethnic data are available from SSA unless an individual has applied for a new SSN or a name change, at which time the information was collected.
Over the time for which racial and ethnic data have been collected, the categories of race and ethnicity in the SSA data have changed. Until 1980, the categories collected were white, black, and other; unknown was used to classify those who did not report any race. Since 1980, the categories are white non-Hispanic; black non-Hispanic; Hispanic; North American Indian or Alaska Native; and Asian, Asian American, or Pacific Islander. These data were scheduled to be in compliance with the new OMB standards by the end of 2003. However, data in the 1980 expanded categories were obtained only from people who filled out the SS-5 form (in order to get a new SSN or to request a name change). Data in the new OMB standards will also be collected only when people fill out the SS-5 form. Thus, for people born before 1980 who did not apply for a new SSN, the racial and ethnic categories in the Medicare EDB are still white, black, other, or unknown (Lauderdale and Goldberg, 1996).
In 1994, racial and ethnic data from the new or changed SS-5 records were integrated into the EDB records to correct and fill in missing information.8 This effort resulted in changes in coding for more than 2.5 million enrollees, about 30 percent of whom were reclassified according to the new racial and ethnic categories implemented after 1980 (Lauderdale and Goldberg, 1996). This update was repeated in 1997 and again in 2000 and 2001 for beneficiaries added since the previous update. CMS’s target is to conduct this update for new beneficiaries annually.
CMS also attempted to fill in missing data on race and ethnicity in 1997, using a postcard survey of people with Hispanic surnames, with a Hispanic country of birth (as defined by SSA), and with “other” or “missing” race codes. Over two million people were surveyed but a response rate of only 43 percent was achieved. This effort was nonetheless successful in filling in missing information and resulted in a reclassification of other data for a total of 850,000 people (Arday et al., 2000). CMS has also used beneficiary-level information on race from 32 states for Medicaid enrollees and collected racial information from the End-Stage Renal Disease Medical Evidence Report.
The EDB data on race and ethnicity have been matched and compared with the MCBS data on race and ethnicity for MCBS survey respondents, which are obtained in face-to-face interviews (Arday et al., 2000). Responses to questions about race and ethnicity from the MCBS rounds 1 (1991), 16 (1996), and 19 (1997) were compared with EDB race and ethnicity data.9 Arday and colleagues found high levels of misclassification of racial and ethnic data in the EDB for groups other than blacks and whites. The sensitivities (the probability that the EDB correctly classified persons of the given race or ethnicity) were high for the white and black classifications but low for Hispanics, Asian/Pacific Islanders, American Indians, and individuals of other races. The specificities (the probability that the EDB did not assign the given race or ethnicity to someone who was not of that group) were high for all nonwhite groups (over 99 percent) but somewhat low for the white classification (87 percent), meaning that 87 percent of those who were nonwhite were not identified as white in the EDB and the remaining 13 percent were. The sensitivity and specificity of classifications for the other groups improved substantially after the 1994 EDB update; the sensitivity for Hispanics doubled from 19 percent to 39 percent, almost tripled for Asian/Pacific Islanders from 20 percent to 58 percent, and dramatically increased for American Indians from less than 1 percent to 11 percent. Thus, the CMS efforts to fill in missing racial and ethnic data for enrollees had some success in improving the data files. However, misclassification was still high even after the update for these groups, and the authors cautioned against the use of racial and ethnic categories other than black and white in measuring disparities (Arday et al., 2000).
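The two measures used in this comparison can be computed directly from matched pairs of self-reported and administrative codes. The sketch below is a minimal illustration, assuming the self-report is treated as the reference classification; the function name and the toy records are hypothetical and are not drawn from the MCBS or the EDB.

```python
def sensitivity_specificity(pairs, group):
    """Return (sensitivity, specificity) of an administrative code for one group.

    `pairs` is an iterable of (self_reported, administrative) labels.
    Self-report is treated as the reference (an assumption of this sketch).
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for self_reported, administrative in pairs:
        if self_reported == group:
            if administrative == group:
                true_pos += 1   # in the group and coded as such
            else:
                false_neg += 1  # in the group but coded as something else
        else:
            if administrative == group:
                false_pos += 1  # not in the group but coded as in it
            else:
                true_neg += 1   # not in the group and not coded as in it
    in_group = true_pos + false_neg
    out_group = true_neg + false_pos
    sensitivity = true_pos / in_group if in_group else float("nan")
    specificity = true_neg / out_group if out_group else float("nan")
    return sensitivity, specificity


# Hypothetical toy records: (self-reported, administrative code).
toy_pairs = (
    [("hispanic", "hispanic")] * 2
    + [("hispanic", "white")] * 3
    + [("white", "white")] * 14
    + [("black", "black")] * 1
)
hispanic_sens, hispanic_spec = sensitivity_specificity(toy_pairs, "hispanic")
white_sens, white_spec = sensitivity_specificity(toy_pairs, "white")
```

In this toy data the Hispanic code has low sensitivity but perfect specificity, while the white code has perfect sensitivity but reduced specificity, because the misclassified Hispanic respondents were recorded as white. This is the same qualitative pattern described above for the EDB.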
A further limitation in the racial and ethnic data contained in Medicare beneficiary files is that when CMS obtains the enrollee information from the SSA master beneficiary record, it receives information only on the retiree, not the retiree’s spouse. Instead, the race of the beneficiary is simply assigned to the spouse.
The EDB does not include any SEP information. The MCBS does collect data on education level and total household income. One possible linkage that would provide a measure of SEP—specifically, current and lifetime earnings income—would be to merge SSA earnings data with the EDB, although SSA earnings records are not perfect measures of lifetime income or of the more general concept of SEP (see Dynan, Skinner, and Zeldes, 2004, for a discussion of the use of SSA earnings as a measure of lifetime earnings). These records are available only for those who have worked in jobs covered by social security, and they do not include undocumented earnings, which could make a sizable difference in earnings measures for immigrant groups (see, for example, Gustman and Steinmeier, 2000). Furthermore, SSA records only cover periods when a person worked and therefore may not be good measures of lifetime income for individuals who did not work their entire lifetime in the United States. The distinction between income and wealth is also important here. Many people in the SSA file who do not have high earnings may have significant sources of wealth. For example, spouses who worked very little may show low earnings, although they may have access to significant resources through their working spouses. Divorced spouses who worked little outside the home while married but who worked later may have had access to higher levels of income during their marriage than what their SSA records imply. Finally, Social Security has a maximum on earnings for which contributions to the system are made. For individuals who meet this maximum, earnings data report only the maximum, not the actual amount of earnings.
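The effect of the earnings maximum can be shown with a short sketch. The cap value and the earnings figures below are hypothetical round numbers chosen for illustration; they are not actual Social Security taxable maximums or actual records.

```python
def recorded_earnings(actual_earnings, taxable_maximum):
    """Earnings as they would appear in a record capped at the maximum."""
    return min(actual_earnings, taxable_maximum)


# Hypothetical cap and hypothetical annual earnings for three workers.
HYPOTHETICAL_MAXIMUM = 100_000
actual = [40_000, 150_000, 900_000]

recorded = [recorded_earnings(e, HYPOTHETICAL_MAXIMUM) for e in actual]

# The two high earners become indistinguishable in the record, and the
# average computed from recorded values understates the true average.
mean_actual = sum(actual) / len(actual)
mean_recorded = sum(recorded) / len(recorded)
```

Analyses that use capped records as a measure of SEP therefore cannot distinguish among high earners, and summary measures built from them will understate earnings at the top of the distribution.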
Despite such weaknesses in the SSA earnings data, they are a potentially very useful source of SEP data for supplementing Medicare enrollee data. These data are already collected and thus could supplement Medicare enrollee data cheaply relative to the costs of new data collection. The breadth of these records in covering the span of the U.S. working population over the life course (with the exceptions noted above) is unique among available sources of information relevant to studying disparities.
No language data are contained in the EDB. The SS-5 form does collect data on country of origin, but currently those data are not obtained from SSA. The MCBS does not collect information on language or acculturation.
DISEASE SURVEILLANCE SYSTEM DATA COLLECTION
The DHHS has a wide array of data collection systems designed to monitor disease outbreaks, disease treatment outcomes, injuries, food safety problems, and other public health problems. For example, the Haemophilus Influenzae Surveillance System compiles information on all Haemophilus influenzae cases reported to the CDC; the Adult Spectrum of Disease data collection system enumerates and characterizes persons with HIV at various stages of immunologic function; the Firearm Injury Surveillance Study collects information on nonfatal firearm injuries; and the Childhood Blood Lead Surveillance collects information from laboratories on children under the age of 6 who have been tested for blood lead levels. The National Cancer Institute runs the Surveillance, Epidemiology, and End Results (SEER) program to provide data on cancer incidence and survival in the United States. Data are collected from cancer registries in 14 geographical areas covering approximately 26 percent of the U.S. population.10 Appendix A (pages 149-173) lists these data collection systems, their purposes, and information about the data they collect on race, ethnicity, SEP, and language.
Most of these data collections come from medical records of patient treatments or from laboratories that test for specific diseases. In most cases, an “event” occurs when a person with a disease seeks medical attention and a record of that visit is created. The surveillance data collection systems draw on the medical records to collect information recorded from that initial event and, as applicable, from subsequent visits. Some surveillance data are supplemented with other survey samples of those with the disease, using data gathered at a state or local public health agency that are sent to the federal government. For some systems, only a limited number of states or localities participate in the system so that national coverage of a disease or public health problem is not always possible. Finally, since the data predominantly originate from medical records, they represent people who have a disease or injury and who seek treatment of some sort, not broad demographic populations as are captured in the national health surveys.
10 We will discuss these state and local cancer registries in more detail in Chapter 5.
Racial and Ethnic Data Collection
The collection of racial and ethnic data in disease surveillance systems is inconsistent. Although racial and ethnic data are collected in many systems, they are often of suspect quality and may not adhere to the OMB standards for such data collection. In many medical record systems, the patient’s race is recorded by a health care worker. For example, the HIV/AIDS Reporting System collects racial and ethnic data from a standard CDC form filled out by the provider. The individual with HIV may not be asked his or her race or ethnicity; rather, it may be inferred by the medical staffer filling out the form.
SEER does not use separate questions about race and Hispanic ethnicity; instead, its categories are white non-Hispanic, white Hispanic, black, Chinese, Japanese, Filipino, American Indian/Eskimo/Aleutian, Hawaiian, other, or unknown. Some disease surveillance systems do not collect any racial and ethnic data—for example, the Sexually Transmitted Disease Surveillance System.
Socioeconomic Position and Acculturation and Language Data
SEP data are even more rarely collected in these systems. Education level is collected as part of the Hemophilia Surveillance System. Occupation is collected as part of the Surveillance for Tuberculosis Infection in Health Care Workers system, but this data system is limited to those who work in health care settings. Otherwise, SEP data are not included as part of these data systems and none of the systems collects information on acculturation or language.
HUMAN SERVICES PROGRAMS
DHHS administers several large programs aimed at providing support for poor families and abused or neglected children, child care, early childhood education, and community social services. These programs include Temporary Assistance for Needy Families (TANF), which provides cash assistance and other services to poor mothers with children, the Head Start program, the Child Care and Development Block Grant program, the Social Services Block Grant program, and programs providing grants to agencies that serve abused or neglected children and people abused by family members. Appendix A (pages 173-178) gives background on these programs.
For TANF and Head Start, individuals must apply and meet eligibility requirements to participate in the programs. Information about their race and ethnicity (and in the case of Head Start, their parents’ race and ethnicity) is collected as part of the application process. In the TANF reporting system, states must provide data on a quarterly basis to the federal government about the race and ethnicity of persons served. Data on employment, earnings, and income from other sources are also collected. Some states have asset tests and vehicle value asset tests for eligibility, and so these data are also sometimes available. The Social Services Block Grant does not collect racial and ethnic data on persons served through the program. The Child Abuse and Neglect Data system reports data on race and ethnicity.
INDIAN HEALTH SERVICE DATA
The Indian Health Service (IHS) is responsible for providing health care services to American Indians and Alaska Natives. This DHHS agency provides health services—either at IHS facilities or by contract with private-sector providers, tribally operated programs, and urban Indian health programs—to individuals who are members of, or can prove descent from a member of, a federally recognized tribe.
As part of its mission and record-keeping functions, IHS obtains data on the utilization of these services. The IHS Patient Registration System collects demographic data on persons who access the IHS system, and these data are linked to the IHS patient care information systems. The IHS Ambulatory Patient Care System collects diagnostic data on individuals who receive ambulatory medical care that is either provided or funded by IHS. The IHS Dental Services Reporting System and Inpatient Care System serve the same function for these health service providers. IHS also maintains birth and death records for American Indian and Alaska Native individuals. These records are forwarded to NCHS from the states and then forwarded to IHS.
The data in these systems are used both to monitor health status (e.g., infant mortality, life expectancy) and to understand health care utilization and treatment of American Indian and Alaska Native populations. IHS produces a series of reports called Trends in Indian Health and Regional Differences in Indian Health, which use these data. None of these data systems includes information on SEP or language.
There are many excellent sources of data collected by DHHS that can be utilized to better understand disparities in health and health care. But, as noted throughout this chapter, limitations in these data sources exist. Through its Inclusion Policy and its 1999 report Improving the Collection and Use of Racial and Ethnic Data in Health and Human Services, which included recommendations to improve its racial and ethnic data collection, the department has begun to address some of the data weaknesses highlighted in this chapter. In this section, the panel gives its recommendations for improvements to national data collection efforts.
The 1999 DHHS report is a very comprehensive presentation of the federal issues related to racial and ethnic data. The report’s recommendations are important for improving federally based data sources and should be acted on. The report calls for the authoring groups to develop a plan to implement the recommendations that would prioritize recommendations, create a detailed plan of action, establish a responsible office(s) to carry out the plan, and consider costs needed for implementation. Thus far, no implementation plan has been produced, although some DHHS agencies have implemented some of the recommendations. The panel believes that DHHS should develop such a plan and continue to implement the data improvement recommendations. The plan should include the establishment of a body that would be responsible for coordinating implementation across the various agencies of the department and for ensuring that agencies follow through with recommendations.
RECOMMENDATION 4-1: DHHS should begin immediately to implement the recommendations contained in its 1999 report entitled Improving the Collection and Use of Racial and Ethnic Data in Health and Human Services.
There are many important recommendations in the 1999 DHHS report. The panel wishes to emphasize a few that it sees as priorities for the department and vital to the improvement of federal data collection systems. The panel’s primary focus is on the improvement of existing data collection efforts to make them more effective and the creation of new collections to fill data gaps.
The panel believes that four themes in the 1999 DHHS report are especially noteworthy:
developing feasible approaches for including racial and ethnic groups in national surveys;
improving the collection and analysis of SEP, language, and acculturation data;
ensuring the collection of racial and ethnic data in DHHS and DHHS-sponsored administrative record systems; and
developing mechanisms for linking records across government data systems.
National household surveys are not large enough to support analysis of health outcomes for many racial and ethnic subgroups. The costs of obtaining extensive health data—such as the data collected in surveys like the NHIS or the NHANES—for small or geographically concentrated racial and ethnic groups make it impossible to collect such data on a regular basis for every racial and ethnic group. However, periodic studies targeted to survey specific groups in specific areas could be conducted and could provide vital data on the health outcomes of these groups. The panel therefore recommends that DHHS develop a schedule for special surveys of population subgroups—e.g., American Indians in Washington state, Cuban Americans in Florida. These are just two examples of groups that could be surveyed. In developing the plan, DHHS would need to identify specific information needs in consultation with the various DHHS agencies and representatives of subgroups. Such targeted surveys may also consider adding appropriate, more extensive measures of acculturation. This schedule, covering a 10- to 20-year period and each year identifying the group to be targeted, would be a feasible way of collecting meaningful data on racial and ethnic subgroups over time.
RECOMMENDATION 4-2: DHHS should conduct the necessary methodological research, and develop and implement a long-range plan, for the national surveys to periodically conduct targeted surveys of racial and ethnic subgroups.
The panel notes that the DHHS Assistant Secretary for Planning and Evaluation (ASPE) has sponsored work to assess federal health data sets for their ability to provide data on detailed Asian and Hispanic subgroups and on American Indians and Alaska Natives (Waksberg, Levine, and Marker, 2000).
Beyond sample size, there may be other statistical issues to address when targeting certain racial and ethnic groups (Kalsbeek, 2003). The rarity of these groups, combined with their geographic dispersion, often makes it inefficient to sample them with greater intensity in household samples involving the usual practice of selecting samples of residential telephone numbers or geopolitical area units, such as counties and census block groups. Moreover, in addition to reducing the likelihood of sample coverage, the relative mobility of some types of racial and ethnic groups (e.g., recent immigrants, farm workers) often leads to skewed samples favoring those who are less mobile, unless specific steps are taken in sampling and estimation to correct these problems.
The issue of comparability in measurements is also important. For example, recent Spanish-speaking immigrants may understand and thus respond to survey questions differently from U.S.-born Hispanic respondents who speak English. Special efforts may therefore be needed to develop survey questions that can be uniformly understood. The potential for loss in comparability in the process of information exchange in general population surveys can mask or exaggerate real differences in studies to assess disparity. For these reasons, DHHS should take steps beyond those currently being taken in its surveys to enhance the department’s ability to determine where disparity exists and evaluate attempts to eliminate it.
RECOMMENDATION 4-3: The adequacy of sampling methods aimed at key racial and ethnic groups, as well as the quality of survey measurement obtained from them, should be carefully studied and shortcomings, where found, remedied for all major national DHHS surveys.
The DHHS Inclusion Policy clearly states the goal of collecting racial and ethnic data for all department programs and record collections and surveys. The department’s household surveys all collect racial and ethnic data in accordance with OMB standards. However, the department’s health data frequently come from administrative records either from DHHS programs (e.g., Medicare) or from clinics, providers, and laboratories. Not all of these records use the OMB standard categories for race and ethnicity, and some do not collect such data at all; as a result, the racial and ethnic data collected through these records are inconsistent and unstandardized.
These administrative data sets contain a large number of records and could, with better data on race, ethnicity, SEP, and language use, offer a valuable source of information for research on disparities. In order to improve these data sets, the department should enforce the Inclusion Policy and require those programs that do not report racial and ethnic data to collect such data in accordance with the OMB standards. This is especially important with respect to the Health Insurance Portability and Accountability Act (HIPAA) standards.11
RECOMMENDATION 4-4: DHHS should require the inclusion of race and ethnicity in its data systems in accordance with its Policy for Improving Race and Ethnicity Data.
HIPAA establishes a national standard for electronic transactions with which all health plans, health care clearinghouses, and providers conducting business electronically must comply. The law requires that DHHS adopt transaction and code set standards for covered transactions, including claims and enrollment transactions. We will discuss HIPAA and its relevance to the reporting of racial and ethnic data in more detail in Chapter 6.
Although DHHS data systems do not consistently collect data on SEP, such data are needed both to better understand racial and ethnic disparities and to identify effects on deprived groups that are not defined by race or ethnicity but that experience health or health care disparities.
The national household surveys generally provide sufficient SEP data, obtaining some measures of employment, education, insurance coverage, income, and wealth. There are weaknesses, however. Income and wealth data are often not collected in much detail. Income questions in some of these surveys are often categorical and do not obtain information on exact income levels or on how much income is received from different sources. The collection of wealth data is rarer even though wealth is a very important measure of a lifetime accumulation of resources.
The department’s administrative and record-based data collections include very little SEP data. In most cases, only information on insurance or method of payment is recorded. Sometimes employment and educational attainment status are collected as well. With the understanding that any data collected in these systems must be relevant to the administration of the program or service, the department should consider ways to collect more SEP data, both in surveys and in administrative data collection.
Knowledge about health and the health care system and the ability to communicate with health care providers are crucial components of an individual’s ability to negotiate the health care system, understand diagnoses and recommended treatments, and pay for treatment. Those who are not proficient in the English language or who do not know the system well may have more difficulty getting the care they need. Data on language proficiency and acculturation could be used both to explain differences in health outcomes across and within ethnicities and to improve health care services and programs so that they better accommodate these populations.
Very little information on language use and acculturation is collected in national health surveys and even less is collected in DHHS administrative and surveillance system records. Surveys are best suited to collect more extensive information on language ability and on the degree of acculturation, whereas records-based collections are more limited in the extent of data that can be obtained. However, in many instances, information on primary language could be useful both for the provision of medical services and information and as a data element for later use in research.
RECOMMENDATION 4-5: DHHS should routinely collect measures of SEP and, where feasible, measures of acculturation and language use.
Weaknesses in a single source of data can often be remedied by linking data from other sources, a practice that can make use of existing data without the burden of new data collection. But there are sometimes barriers to linking across agencies and even within agencies. For example, the ability to match data may be limited if common identifiers are unavailable or of poor quality. Confidentiality concerns also arise with data linkages because a common identifier is needed in both data sets to link the data and this may increase the possibility that an individual’s identity can be recognized. Special attention must therefore be devoted to the protection of respondent confidentiality and proper use of the data with linked data sets.
Although there are barriers and costs to sharing data, the resulting richer sets of data can be used to fill in important gaps in any single data source. For example, matching SSA earnings records to Medicare claims data provides a means to understand links between race, ethnicity, and SEP and health care treatment and treatment outcomes. Therefore, where possible, the department should encourage and promote data linkages, including between data sets collected and maintained both in different DHHS agencies and with non-DHHS departments or institutes.
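As an illustration only, the kind of linkage described above can be sketched in a few lines of Python. The sketch assumes that both files carry a common person identifier and that the linking party replaces it with a keyed hash, so that the linked file need not contain the raw identifier. The field names (`person_id`, `race_ethnicity`, `procedure`, `earnings_band`), the sample records, and the key are hypothetical and do not describe any actual CMS or SSA file layout.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a keyed hash of a lightly normalized identifier."""
    # Strip whitespace and hyphens so trivially different spellings still match.
    normalized = identifier.strip().replace("-", "")
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(claims, earnings, key):
    """Join claims to earnings on the pseudonymized identifier.

    Returns (linked, unmatched_claims). Linked rows carry only the
    pseudonym, never the raw identifier. If the earnings file holds
    duplicate identifiers, the last record wins (a simplification).
    """
    earnings_by_pid = {pseudonymize(r["person_id"], key): r for r in earnings}
    linked, unmatched = [], []
    for claim in claims:
        pid = pseudonymize(claim["person_id"], key)
        match = earnings_by_pid.get(pid)
        if match is None:
            unmatched.append(claim)
            continue
        linked.append({
            "pid": pid,
            "race_ethnicity": claim["race_ethnicity"],
            "procedure": claim["procedure"],
            "earnings_band": match["earnings_band"],
        })
    return linked, unmatched


# Hypothetical records, for illustration only.
claims = [
    {"person_id": "001-01", "race_ethnicity": "Asian", "procedure": "A"},
    {"person_id": "00102 ", "race_ethnicity": "White", "procedure": "B"},
    {"person_id": "00399", "race_ethnicity": "White", "procedure": "A"},
]
earnings = [
    {"person_id": "00101", "earnings_band": "low"},
    {"person_id": "001-02", "earnings_band": "high"},
]

linked, unmatched = link_records(claims, earnings, key=b"demo-key")
print(len(linked), "linked;", len(unmatched), "unmatched")
```

The unmatched list is returned rather than discarded because, as noted above, poor-quality identifiers limit matching, and the unmatched share is itself a measure of that limitation. A keyed hash reduces, but does not eliminate, the disclosure risk of a linked file.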
RECOMMENDATION 4-6: DHHS should develop a culture of sharing data both within the department and with other federal agencies, toward understanding and reducing disparities in health and health care.
Each of these first six recommendations (and the last two in this chapter) is directed to the department in general, and not to a specific agency within DHHS. The panel directs these recommendations to the Office of the Secretary because the panel believes the actions of these recommendations need to be taken on a department-wide basis and because there is no other agency within DHHS to which these recommendations can be directed. As was mentioned above, the 1999 DHHS report calls on DHHS to establish a responsible body for coordinating the implementation of that report. Such a body would be a logical place to direct the panel’s recommendations, but it does not yet exist.
Data on Medicare enrollees, which cover all elderly persons who enroll in Medicare and are collected through the EDB, are crucially important for understanding disparities in health and health care treatment. Since much of the health care a person receives occurs later in life, the database covers individuals when they are likely to be using the health care system most frequently.
As this chapter discussed, the reporting of racial and ethnic data in Medicare is incomplete. Many individuals enrolled in Medicare do not have a reported race or ethnicity in their records, and, under current procedures, many who will eventually qualify for Medicare in the future will not have racial and ethnic data in their records. Because of the importance of Medicare data in measuring health and, especially, health care disparities, the panel believes it is crucial for CMS to take the initiative in collecting racial and ethnic data for both current and future enrollees. For new enrollees, the best time to collect data on race and ethnicity, SEP, and language would appear to be at the time of enrollment. A very brief questionnaire could be used for both current and new enrollees. To keep the survey short, complete information about income and wealth need not be collected, although a categorical question on income or educational level could be included. A question about language use might also be considered.
The collection of additional data for current enrollees will not be an easy or inexpensive task. A previous CMS attempt to collect racial and ethnic data through a postcard survey of current enrollees achieved some success in filling in vital missing information (Arday et al., 2000), but it had a poor response rate. A thorough effort is needed with full support for proper follow-up.
RECOMMENDATION 4-7: The Centers for Medicare and Medicaid Services should develop a program to collect racial, ethnic, and socioeconomic position data at the time of enrollment and for current enrollees in the Medicare program.
As mentioned, the Medicare Enrollment Database does not collect SEP information. It is possible to obtain records of an enrollee’s earnings and employment histories through the wage history files of the SSA, but these data are not without problems. For example, some individuals may not have worked long and may thus show relatively lower earnings in the system. Or, some individuals may have had a spouse who earned wages and thus have a greater income or wealth than the SSA records imply. In addition, those who have worked in the U.S. labor market for only a few years may also show low earnings even though they may have accumulated wealth from income earned outside the SSA system (Gustman and Steinmeier, 2000; Dynan, Skinner, and Zeldes, 2004). These data should therefore be combined with information on the number of quarters the individual was in the system (since reporting is quarterly) and the number of years the individual earned the maximum contribution level. Privacy and confidentiality concerns should also be considered carefully. However, despite these potential barriers, the panel believes that the CMS and SSA should cooperate to link these two important data sets.
RECOMMENDATION 4-8: The Centers for Medicare and Medicaid Services should seek from the Social Security Administration (SSA) a summary of wage data on individuals enrolled in Medicare.
The panel recognizes that there are barriers to obtaining these data; for example, privacy and confidentiality concerns have hindered CMS efforts to obtain such data in the past. As an alternative to obtaining earnings records, CMS is currently seeking to use information on the amount of social security benefits paid to individuals (which are based on earnings) as a proxy for earnings. This information can be obtained from the SSA Master Beneficiary Record, from which CMS has previously obtained data.
Leadership for Implementing OMB Standards for Health Data Collection
There is still a lack of understanding and delay in implementation of the new OMB standards for collecting racial and ethnic data, particularly outside the federal statistical community. Many attendees of the panel’s Workshop on Improving Race and Ethnicity Data Collection expressed confusion over what the minimum race categories were, whether more detailed categories could be used, how data on Hispanic ethnicity should be collected, how multiple-race responses should be handled, and how data collected before the new standards can be bridged to data collected since the standards were implemented (National Research Council, 2003).
The OMB has published materials to guide researchers in using the new standards and bridging to the old categories.12 But while federal researchers may be well aware of these standards, researchers at nonfederal levels may not be. DHHS should take a leadership role in educating relevant staff at all DHHS agencies, state health agencies, and private entities that collect data for DHHS programs about these new standards. The department should increase awareness of the OMB standards by disseminating the appropriate OMB materials to the various state and private entities from which DHHS obtains data. For example, CMS could distribute such materials to all state Medicaid directors. In addition, DHHS should assume responsibility for ensuring that the new standards are properly and consistently applied throughout the department’s data collection systems. Because it is often necessary to collect data for more specific racial and ethnic groups than are listed in the OMB standards, the panel also recommends that DHHS promote uniformity in such data collections by publishing suggested subclassifications for each of the OMB classifications for use in all DHHS data collection efforts. Such steps would be mutually beneficial both for states and private entities, who are looking for guidance on the OMB standards, and for DHHS, which relies on states and private entities to provide much of its data on health and especially health care-related topics. As the following chapter will discuss, the data collection efforts of many states are conducted for federal programs or in cooperation with federal agencies to obtain national-level data. DHHS can improve the state-based collections of information on race and ethnicity by being consistent across the department in requiring OMB standards in its data collection programs. Inasmuch as some of these cooperative data collection efforts require input and cooperation from each of the states, implementing the standards in states may be a delicate balance of allowing states to meet their own needs for data collection while promoting national-level comparability.
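To make the idea of subclassifications concrete, the following Python sketch shows how detailed responses might be rolled up to the 1997 OMB minimum categories while still allowing multiple-race responses. The five race categories and two ethnicity categories are those of the OMB standards; the particular subgroup lists and function names are hypothetical and are not a published DHHS subclassification.

```python
# Minimum race categories under the 1997 OMB standards.
OMB_RACE_CATEGORIES = (
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
)

# Hypothetical, deliberately incomplete subclassification lists.
SUBGROUP_TO_OMB_RACE = {
    "Chinese": "Asian",
    "Filipino": "Asian",
    "Vietnamese": "Asian",
    "Native Hawaiian": "Native Hawaiian or Other Pacific Islander",
    "Samoan": "Native Hawaiian or Other Pacific Islander",
}
HISPANIC_SUBGROUPS = {"Cuban", "Mexican", "Puerto Rican"}


def roll_up_race(responses):
    """Map one person's race responses to a sorted list of OMB categories.

    More than one category may be returned, because the standards allow
    respondents to report multiple races.
    """
    categories = set()
    for response in responses:
        if response in OMB_RACE_CATEGORIES:
            categories.add(response)
        elif response in SUBGROUP_TO_OMB_RACE:
            categories.add(SUBGROUP_TO_OMB_RACE[response])
        else:
            # Fail loudly rather than silently assigning a category.
            raise ValueError(f"no OMB mapping for response: {response!r}")
    return sorted(categories)


def roll_up_ethnicity(response):
    """Map an ethnicity response to an OMB category, or None if unknown."""
    if response in HISPANIC_SUBGROUPS or response == "Hispanic or Latino":
        return "Hispanic or Latino"
    if response == "Not Hispanic or Latino":
        return "Not Hispanic or Latino"
    return None


print(roll_up_race(["Chinese", "Samoan", "Filipino"]))
print(roll_up_ethnicity("Cuban"))
```

Keeping the detailed response alongside the rolled-up category lets a state retain the subgroup detail it needs while reporting nationally comparable categories.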
RECOMMENDATION 4-9: DHHS should prepare and disseminate implementation guidelines for the Office of Management and Budget (OMB) standards for collecting racial and ethnic data.
The panel notes that some agencies within DHHS have issued such guidelines. For example, the National Institutes of Health have issued guidelines for maintaining, collecting, and reporting racial and ethnic data in clinical research.
The Importance of SEP in Understanding Racial and Ethnic Health Disparities
In Chapters 2 and 3, we illustrated the interrelationships between race, ethnicity, and SEP. Because of the interrelationship of these variables, in order to accurately interpret racial and ethnic differences in health and health care it is important to consider differences within groups of different social and economic backgrounds. Therefore, where possible, the panel urges DHHS to report health and health care disparities across different levels of SEP. The specific SEP measures used may depend on the outcome of interest (for example, education level may be the most appropriate measure for examining preventive health knowledge and outcomes—such as the percent of women receiving a mammogram each year) or upon the availability of data. In any case, the department should make an effort to consider SEP differences in conjunction with racial and ethnic differences in its health disparities reports.
RECOMMENDATION 4-10: DHHS should, in its reports on health and health care, tabulate data on race and ethnicity classified across different levels of socioeconomic position.
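A tabulation of the kind called for in Recommendation 4-10 can be sketched as follows. The example cross-classifies a single outcome (receipt of a mammogram, the example used above) by race and ethnicity and by education level. The records, group labels, and field names are hypothetical, and the rates are unweighted; estimates from an actual survey would require sampling weights and variance estimation, which this sketch omits.

```python
from collections import defaultdict


def tabulate_rate(records, outcome, group_keys=("race_ethnicity", "education")):
    """Cross-tabulate an outcome by the given grouping fields.

    Returns {cell: (number_with_outcome, cell_total, unweighted_rate)},
    where each cell is a tuple of group values such as
    ("Group A", "High school or less").
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [with outcome, total]
    for record in records:
        cell = tuple(record[key] for key in group_keys)
        counts[cell][1] += 1
        if record[outcome]:
            counts[cell][0] += 1
    return {
        cell: (yes, total, yes / total)
        for cell, (yes, total) in counts.items()
    }


# Hypothetical records, for illustration only.
records = [
    {"race_ethnicity": "Group A", "education": "High school or less", "mammogram": True},
    {"race_ethnicity": "Group A", "education": "High school or less", "mammogram": False},
    {"race_ethnicity": "Group A", "education": "College", "mammogram": True},
    {"race_ethnicity": "Group B", "education": "High school or less", "mammogram": False},
    {"race_ethnicity": "Group B", "education": "College", "mammogram": True},
    {"race_ethnicity": "Group B", "education": "College", "mammogram": True},
]

table = tabulate_rate(records, outcome="mammogram")
for cell in sorted(table):
    yes, total, rate = table[cell]
    print(cell, f"{yes}/{total}", f"{rate:.2f}")
```

Reporting the cell totals next to each rate matters here, because cells for small subgroups can be too sparse to support a reliable estimate.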