

Chapter 3 Contributed Session on Application of Record Linkage
Pages 47-78



From page 47...
... -John Horm, National Center for Health Statistics 47
From page 49...
... This paper addresses some of the problems that analysts may face when they perform exact matches using a unique identifier. The paper deals, specifically, with records that have been linked by an exact match on social security number (SSN)
From page 50...
... Work to implement and review the SOI edits and prepare a panel file is only beginning; further editing will take place over the next few months. Problems Created by Incorrect SSNs Incorrect SSNs create a number of problems affecting not only record linkage and data capture but subsequent analysis of the data.
From page 51...
... As a result, SSA's validation tends to be conservative, erring on the side of making too few matches rather than making false matches. In editing the SSNs reported on tax panel records, the IRS staff employed a number of evaluation strategies.
From page 52...
... While incorrect SSNs will produce such breaks, most of the occurrences are attributable to genuine change. Validating SSNs Against IRS/SSA Records In editing the SOCA and Family Panel files, SOI staff used an IRS validation file that contained fields obtained, ultimately, from SSA.
From page 53...
... Use of the Return Name Control Until full names became available, the only identifying information about a filer was the return-level name control. It is derived from the surname of the primary filer, which may differ from that of the secondary filer and one or more dependents. Testing for exact agreement between the return name control and any of the name controls on the validation file for the primary SSN, the secondary SSN, and any dependent SSNs could be automated easily and reliably.
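The exact-agreement test described here can be sketched in a few lines of Python. This is an illustration only: the record layout, the function names, the placeholder SSNs, and the rule that a name control is the first four letters of a surname are all assumptions made for the example, not details taken from the paper.

```python
def name_control(surname: str, length: int = 4) -> str:
    """Hypothetical derivation: the first `length` letters of the surname, uppercased."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    return "".join(letters)[:length]


def agreeing_ssns(return_control, ssns, validation):
    """List the SSNs on a return whose validation-file name controls
    include an exact match to the return-level name control.

    `validation` maps an SSN to the set of name controls on file for it
    (an assumed structure for this sketch).
    """
    return [ssn for ssn in ssns if return_control in validation.get(ssn, set())]


# Placeholder data: primary, secondary, and one dependent SSN.
VALIDATION = {
    "000-00-0001": {"SMIT"},
    "000-00-0002": {"JONE"},
}
matches = agreeing_ssns(
    name_control("Smith"),
    ["000-00-0001", "000-00-0002", "000-00-0003"],
    VALIDATION,
)
```

In this sketch only the first SSN agrees with the return name control. The second has a different name control on file and the third is absent from the validation file, so both would be left for other evaluation strategies.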
From page 54...
... Strategies When Name Lines Were Not Available For the SOCA Panel, name lines did not become available until year four. Birth dates provided important alternative information with which to evaluate the secondary SSNs.
From page 55...
... It is striking, first of all, how closely the estimated error rates for primary and secondary SSNs match those of the much smaller SOCA Panel. Second, the error rate for all dependent SSNs is just over twice the error rate for secondary SSNs.
From page 56...
... Finally, I want to encourage the SOI Division to develop secondary name controls from the name lines that became available in 1988 and use these name lines to edit the secondary SSNs in the Family Panel. Secondary name controls derived by even a simple algorithm from the full name line could substantially reduce the subset of cases that are flagged as possibly containing incorrect secondary SSNs.
From page 57...
... is a representative 1 percent sample of the population of England and Wales containing linked census and vital events data. The study was begun in 1974 with a sample drawn from the population enumerated at the 1971 Census using four possible dates of birth in any year as the sampling criterion.
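Selecting on four birthdays a year explains the sampling fraction: 4 / 365.25 is roughly 1.1 percent. The Python sketch below illustrates the criterion. The four (month, day) pairs are hypothetical placeholders, since the text does not state the actual sampling dates.

```python
from datetime import date

# Hypothetical (month, day) pairs; the study's real sampling dates are not given.
SAMPLE_BIRTHDAYS = {(1, 15), (4, 15), (7, 15), (10, 15)}


def in_sample(dob: date) -> bool:
    """A person qualifies if their birthday falls on one of the sampling dates, in any year."""
    return (dob.month, dob.day) in SAMPLE_BIRTHDAYS


def expected_fraction(n_dates: int = 4, days_per_year: float = 365.25) -> float:
    """Approximate share of the population selected, assuming births are spread evenly over the year."""
    return n_dates / days_per_year
```

Because the criterion depends only on date of birth, the same test can be applied to any later census or vital events file without reference to the original 1971 sample list.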
From page 58...
... household members may therefore change over time. Routinely collected data on the mortality, fertility, cancer registrations, infant mortality of children born to LS sample mothers, widow(er)hoods and migration of LS members are linked into the sample using the National Health Service Central Register to perform the link (Figure 2).
From page 59...
... As in 1971, when the LS was created, the information printed on the index cards was used to locate the Census forms and the name and address were transcribed from the forms. These cards were then sent to NHSCR for matching against the LS alphabetical index.
From page 60...
... The matching and tracing process was easier and faster than in 1981 as the cards were initially matched and traced against the NHSCR database entries rather than manually against the two rooms full of index cards which formed the LS alphabetical index. Only if no previous LS number existed or ...
From page 61...
... However, even allowing that LS-Census forward linkage rates were extremely good, there were still approximately 10 percent linkage failures at each census. This problem of linkage failure was investigated using the NHSCR records to examine ~ percent samples of linkage failures as part of each of the LS-Census Link exercises.
From page 62...
... There are two methods of identifying vital events occurring to LS members: firstly, through routine notification of events to NHSCR, where the LS member is identified by the presence of an LS flag in the register; and secondly, through the annual vital events statistics files compiled by ONS. Some types of event, namely deaths and cancer registrations, are identified using both methods as a cross-checking device (Table 3).
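Where an event type is identified by both routes, the cross-check amounts to comparing two lists. The Python sketch below is a minimal illustration. The event identifiers and the function name are invented for the example, and the real comparison is not described in this excerpt.

```python
def cross_check(notified, searched):
    """Compare events found via routine notification with those found via the annual file search."""
    notified, searched = set(notified), set(searched)
    return {
        "both": sorted(notified & searched),
        "notification_only": sorted(notified - searched),
        "search_only": sorted(searched - notified),
    }


# Invented event identifiers for illustration.
result = cross_check(["D001", "D002", "D003"], ["D002", "D003", "D004"])
```

Events appearing in only one of the two lists are the ones that would merit follow-up, since each route has caught something the other missed.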
From page 63...
... New births into the sample, births to sample mothers, infant deaths of LS members' children and widow(er)hoods are all identified using date of birth searches of the annual vital events statistics files.
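A date-of-birth search of this kind can be sketched as a two-step process: list every event record whose date of birth falls on a sampling date, then check that listing against the register, as the linkage process figure describes. The Python below is a hedged illustration. The field names, the register keyed on name and date of birth, the LS number format, and the sampling dates are all assumptions.

```python
from datetime import date

# Hypothetical sampling dates and register contents.
SAMPLE_BIRTHDAYS = {(1, 15), (4, 15), (7, 15), (10, 15)}
REGISTER = {("A. EXAMPLE", date(1950, 4, 15)): "LS0001"}


def dob_search(events, sample_birthdays):
    """Step 1: list event records whose date of birth is one of the sampling dates."""
    return [e for e in events if (e["dob"].month, e["dob"].day) in sample_birthdays]


def check_against_register(listing, register):
    """Step 2: keep entries found in the register and attach the LS number; drop the rest."""
    confirmed = []
    for entry in listing:
        ls_number = register.get((entry["name"], entry["dob"]))
        if ls_number is not None:
            confirmed.append({**entry, "ls_number": ls_number})
    return confirmed


events = [
    {"name": "A. EXAMPLE", "dob": date(1950, 4, 15), "event": "death"},
    {"name": "B. SAMPLE", "dob": date(1962, 7, 15), "event": "death"},
    {"name": "C. OTHER", "dob": date(1950, 4, 16), "event": "death"},
]
listing = dob_search(events, SAMPLE_BIRTHDAYS)
linked = check_against_register(listing, REGISTER)
```

Here two events pass the date-of-birth search, but only one is found in the register and receives an LS number. The other is deleted from the listing.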
From page 64...
... [Figure: The Linkage Process. Two routes are shown. Routine notification: an event is notified to NHSCR, the registers are searched for the LS flag, and events with a flag present are listed on tape sent to ONS. Annual ONS vital events files search: the files are searched for LS dates of birth, the listing is sent to NHSCR and checked against the registers, entries where no LS member is found are deleted from the list, the LS number is added where an LS member is found, and the listing is returned to ONS, which links the information.] How Good Is the Linkage of Events?
From page 65...
... However, at present this would contravene all legal requirements including that of current UK data protection legislation. The LS is not a survey where an individual gives their consent for the use of personal data but a study where administrative data collected for other purposes is used to provide a rich source of socio-demographic and mortality data about the England and Wales population over time.
From page 66...
... The linkage methods used are partially computerised but because of legal restrictions much of the linkage is still labour intensive and reliant on the skills of ONS and NHSCR staff. Automatic linkage would be the ideal, but until it is legally feasible to electronically link the LS system to all other ONS systems (including the Census database)
From page 67...
... Data linkage provides NHTSA, and the traffic safety community at large, with a source of population-based crash and injury state data that include the medical and financial outcome for specific crash, vehicle, and behavior characteristics.
From page 68...
... This presentation will describe how linked data made it possible for NHTSA to conduct a medical and financial outcome study of the benefits of safety belts and motorcycle helmets using routinely collected, population-based, person-specific state data. Use of Linked Data to Standardize Non-Uniform Data for Analysis Outcome Analysis Using "As Reported" Data Measuring outcome is complicated when using "as reported" utilization data.
From page 69...
... Shift in Severity: Separate effectiveness rates for each severity level were calculated and then compared to measure the downward shift in injury severity. Cost: Defined as inpatient charges because non-inpatient charges were not comparable among the seven states. Use of Linked Data to Expand Existing Data Identifying Injuries Not Documented by the Police Police are required to document only those crashes and injuries that occur on public roads and meet mandated reporting thresholds.
From page 70...
... Thus annual linkage of the crash and injury state data provides the states, NHTSA, public health and injury control, with a permanent and routine source of outcome information about the consequences of motor vehicle crashes at the same time that the quality of state data is improved for their originally intended purposes.
From page 71...
... is a large in-person health survey of the United States population conducted annually by the National Center for Health Statistics (Dawson and Adams, 1987). Health and health-related information is collected on approximately 122,000 persons per year (42,000 households)
From page 72...
... An indication of agreement between the user record and the NDI record is returned to the user for each of the seven items involved in the twelve matching criteria. In addition to the items involved in the matching criteria, the NDI returns an indication of agreement/disagreement between the user record and the NDI record on five additional items: age at death; race; marital status; state of residence; and state of birth.
From page 73...
... Frequency-based weighting schemes such as those proposed by Fellegi and Sunter and by Rogot, Sorlie, and Johnson are attractive since rarer occurrences of a matching item are given more weight than more common occurrences. However, the user is still left with the problem of properly classifying matched records into at least minimal categories of true matches, false matches, and questionable matches.
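The intuition behind frequency-based weighting can be shown with a simplified agreement weight, log2(1/p), where p is the relative frequency of the agreeing value in a reference file. The Python below sketches that general idea only. It is not the specific scheme of Fellegi and Sunter or of Rogot, Sorlie, and Johnson, and the surname counts are invented.

```python
import math
from collections import Counter


def frequency_weights(values):
    """Agreement weight per value: log2(total / count), so rarer values score higher."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: math.log2(total / count) for value, count in counts.items()}


# Invented reference file of 16 surnames.
surnames = ["SMITH"] * 8 + ["JONES"] * 4 + ["BROWN"] * 3 + ["ZELENKA"] * 1
weights = frequency_weights(surnames)
```

With these counts, agreement on SMITH (half of the file) earns a weight of 1, while agreement on ZELENKA (one record in sixteen) earns a weight of 4. The weights order candidate pairs but do not by themselves say where a true match ends and a false match begins, which is the classification problem the text goes on to raise.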
From page 74...
... Assignment of records falling into one of Classes 2, 3, or 4, as either true matches or false matches was made based on the score and cut-off points within class. Records with scores greater than the cut-off scores are considered true matches while records with scores lower than the cut-off scores are considered false matches.
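The class-specific cut-off rule can be sketched as a simple lookup. In the Python below the cut-off values are hypothetical, since this excerpt does not give the actual scores. Treating a score exactly equal to the cut-off as a false match is also an assumption, because the text only covers scores above and below it.

```python
# Hypothetical cut-off scores per class; the actual values are not given in the text.
CUTOFFS = {2: 40.0, 3: 32.0, 4: 28.0}


def classify(match_class: int, score: float, cutoffs=CUTOFFS) -> str:
    """Classify a record in Class 2, 3, or 4 by comparing its score with that class's cut-off."""
    if match_class not in cutoffs:
        raise ValueError(f"no cut-off defined for class {match_class}")
    # Assumption: ties are treated as false matches.
    return "true match" if score > cutoffs[match_class] else "false match"
```

For example, `classify(2, 45.0)` yields a true match and `classify(3, 30.0)` a false match under these placeholder cut-offs. Classes outside 2 to 4 are rejected because the excerpt describes score-based assignment only for those three.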
From page 75...
... Among non-whites there are multiple problems including lower reporting of social security numbers and incorrect spelling/recording of ethnic names. The correct classification rate for non-white decedents dropped to 86 percent while the classification rate for living persons remained high at over 99 percent.
From page 76...
... (1993). Questionnaires from the National Health Interview Survey, 1985-89, National Center for Health Statistics, Vital and Health Statistics, 1(31)
From page 77...
... (1986). Probabilistic Methods in Matching Census Samples to the National Death Index, Journal of Chronic Diseases, 39, 719-734.

