5 Data Linkage and Innovation
Pages 71-92

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 71...
... One of the advantages of CMS data is its strong enrollment information, Virnig noted; information on everyone enrolled in each program is provided on a monthly basis, which provides a denominator for this population as well as some demographic information, and dates of birth and death. CMS has health claims data that include records -- depending
From page 72...
... She said it is also possible, but problematic, to match using name plus date of birth. Virnig shared examples of alignment of Medicare data with data from several studies to illustrate how Medicare complemented or supported the data collected in a survey.
From page 73...
... She posited that people who cycle in and out of Medicaid may be of interest, and perhaps many of them are survey nonrespondents. In addition to left truncation and interval censoring limiting the usefulness of linking to Medicare and Medicaid data, Virnig added that another area that likely needs more investigation is the impact of linking on survey sampling weights.
From page 74...
... In describing the data linkage program at NCHS, Mirel provided four examples of pressing policy questions that require complex and detailed data that NCHS has addressed by combining survey and administrative data:
From page 75...
... Mirel described some of the key sources of linked administrative data. The addresses of survey participants are geocoded to add in contextual information obtained from standard Census geocoded areas.
From page 76...
... Documentation is included on how these weights could be adjusted given the linkage, and how to interpret the findings given that the sample weights are representative as of the time the survey was conducted. NCHS releases curated data files that can be used for many different research questions, and more than 1,000 publications have been based on NCHS linked data, Mirel said.
From page 77...
... Because linked data put participants at a greater risk of re-identification, NCHS has been working on creating more publicly available linked data sources using synthetic data and then setting up a validation server so that researchers could validate their results from the synthetic data against the true data. She said that they are hoping to conduct meetings with researchers to identify key variables and create analytically useful datasets.
From page 78...
... Mirel replied that agreements for data sharing are one of the biggest obstacles; she suggested that using privacy-preserving record linkage (PPRL) could open the door to linking to potentially many more sources without needing the direct exchange of personally identifiable information (PII). Virnig suggested developing ways to help build teams to work on complex projects involving surveys and administrative data linkages.
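The idea of linking without directly exchanging PII can be illustrated with a minimal sketch. It assumes a simplified form of PPRL in which both data holders compute a keyed hash of normalized identifiers and only the resulting tokens are compared. The key, record identifiers, and names below are hypothetical, and production PPRL systems generally use more elaborate encodings that tolerate typos; this is not a description of the NCHS method.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents and punctuation so trivial formatting differences do not break a match."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def linkage_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Keyed hash (HMAC-SHA256) of the normalized identifiers; raw PII stays with the data holder."""
    message = "|".join([normalize(first), normalize(last), normalize(dob)])
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical shared secret, agreed on out of band by both data holders.
KEY = b"hypothetical-shared-secret"

# Hypothetical records: (first name, last name, date of birth).
survey = {"S1": ("José", "O'Neil", "1950-03-02"), "S2": ("Ann", "Lee", "1948-11-30")}
admin = {"A9": ("Jose", "ONeil", "1950-03-02"), "A7": ("Bob", "Ray", "1951-01-01")}

# Each holder computes tokens locally and shares only the tokens.
survey_tokens = {linkage_token(KEY, *pii): sid for sid, pii in survey.items()}
admin_tokens = {linkage_token(KEY, *pii): aid for aid, pii in admin.items()}

# Matching tokens identify linked record pairs.
links = sorted(
    (survey_tokens[t], admin_tokens[t])
    for t in survey_tokens.keys() & admin_tokens.keys()
)
print(links)
```

Because the hash is exact, any difference that survives normalization (a nickname, a transposed digit in the date of birth) prevents a match, which is one reason matching on name plus date of birth can be problematic.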
From page 79...
... She pointed to a research paper that looked at linkage eligibility bias for the people in the survey for whom they had all records, assessed what the bias was, and posited how to mitigate it with some adjustment to the sample weights. Madans described the difficulty of negotiating for each administrative data source, noting that each agency imposes different and sometimes contradictory requirements.
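One common form such a weight adjustment can take is sketched below. It assumes a simple cell-based ratio adjustment: within each adjustment cell, the weights of linkage-eligible respondents are inflated so that they reproduce the weighted total of all respondents in that cell. The cells, weights, and field names are hypothetical, and the paper referred to above may use a different approach.

```python
from collections import defaultdict


def adjust_weights(records):
    """Return {id: adjusted weight} for linkage-eligible records.

    Within each cell, eligible weights are multiplied by
    (sum of all weights) / (sum of eligible weights).
    """
    total = defaultdict(float)
    eligible = defaultdict(float)
    for r in records:
        total[r["cell"]] += r["weight"]
        if r["eligible"]:
            eligible[r["cell"]] += r["weight"]

    adjusted = {}
    for r in records:
        if r["eligible"]:
            factor = total[r["cell"]] / eligible[r["cell"]]
            adjusted[r["id"]] = r["weight"] * factor
    return adjusted


# Hypothetical survey records with original sample weights.
records = [
    {"id": 1, "cell": "65-74", "weight": 100.0, "eligible": True},
    {"id": 2, "cell": "65-74", "weight": 100.0, "eligible": False},
    {"id": 3, "cell": "75+", "weight": 50.0, "eligible": True},
    {"id": 4, "cell": "75+", "weight": 150.0, "eligible": True},
]

adjusted = adjust_weights(records)
print(adjusted)
```

In this sketch the adjusted weights of the eligible records sum to the weighted total of the full sample. The adjustment only removes bias to the extent that eligibility is unrelated to the outcome within each cell.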
From page 80...
... If these hypotheses are true, Jäckle continued, then it is important to think about how to design the study for these additional tasks, including the survey, the consent question, and the incentives. The goal would be to design surveys to increase acceptability of additional requests that conform with what respondents think they signed up for so they agree to do the additional tasks.
From page 81...
... She shared that she has been reflecting on how the response to additional requests might be influenced by the design of the study in which the additional requests are made (Figure 5-2)
From page 82...
... attention to the survey questionnaire, then the additional requests made, such as for biosamples, additional measurements, and additional questionnaires. The respondent then has to decide whether or not to participate for each additional request.
From page 83...
... He shared a link to the Census Linking Project, which offers researchers information to create longitudinal datasets using historical Census data back from 90 years. He noted that the 1950 Census reaches 72 years in 2022, at which time those data will be released publicly. More recent data and linkages are currently available only inside the Federal Statistical Research Data Centers (FSRDCs)
From page 84...
... Smeeding provided additional background and context about linking more recent Census data.
From page 85...
... According to Smeeding, following the work and report of the Commission on Evidence-Based Policymaking, attention has turned to a National Secure Data Service (NSDS), which would temporarily link these different data sources in a secure cloud environment for specific analyses, rather than creating a large linked dataset that would reside somewhere.
From page 86...
... FIGURE 5-3 The American Opportunity Study backbone and opportunities for survey linkages. SOURCE: Timothy Smeeding workshop presentation, September 28, 2021.
From page 87...
... He said that these possibilities can enrich the current survey dataset by getting more data on the children or on the predecessors of the elderly.
Bias Propensity to Inform Responsive and Adaptive Survey Design in a Longitudinal Study
Andy Peytchev commented on the emergence of many interesting developments and innovations with longitudinal panels.
From page 88...
... Ninth-graders were originally recruited in schools in 2009, and information on these students was available from multiple sources, including their baseline interviews, follow-up surveys, and some administrative data from the schools. Strengths of this study, he said, were that they had measures of nonresponse bias based on the three different sources of information; they created a simulated control condition using propensity scoring, so that a sample of cases did not receive the experimental treatment; and they evaluated survey outcomes before and after the intervention phase rather than after multiple additional follow-up phases.
From page 89...
... The first yellow bar in the figure shows the average absolute bias of 3.6 percent, which was averaged across the multiple estimates from the prior round data, baseline interviews, and administrative data.
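The summary measure can be sketched as follows, assuming it is computed as the mean of the absolute differences between respondent-based estimates and benchmark values known for the full sample. The numbers below are hypothetical and are not taken from the study.

```python
def average_absolute_bias(respondent_estimates, benchmark_values):
    """Mean of |respondent estimate - benchmark| across estimates, in percentage points."""
    if len(respondent_estimates) != len(benchmark_values):
        raise ValueError("Each estimate needs a matching benchmark value.")
    if not respondent_estimates:
        raise ValueError("At least one estimate is required.")
    diffs = [abs(r - b) for r, b in zip(respondent_estimates, benchmark_values)]
    return sum(diffs) / len(diffs)


# Hypothetical percentages for four characteristics known for the whole sample
# (for example from prior-round, baseline, or administrative data).
respondents = [52.0, 30.0, 18.0, 75.0]   # estimates using respondents only
full_sample = [48.0, 33.0, 20.0, 70.0]   # benchmark values for all sampled cases

bias = average_absolute_bias(respondents, full_sample)
print(f"Average absolute bias: {bias:.1f} percentage points")
```

Taking absolute values keeps over- and under-estimates from canceling, so the average reflects the typical size of the bias rather than its direction.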
From page 90...
... Peytchev concluded by saying that the treatment condition was effective in reducing nonresponse bias compared to the control condition for most estimates, regardless of whether follow-up data, frame data, or baseline data were used. The treatment condition reduced the average absolute bias by approximately one percentage point, or roughly one-quarter of the estimated bias.
From page 91...
... He also pointed out that use of administrative data is becoming much more common in the research literature, and about half of data-driven papers in the American Economic Review now use some sort of linked data. He is seeing increased interest and directs researchers to the experts who are creating and making available the linked data.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.