Chapter 7 Contributed Session on More Applications of Probabilistic Record Linkage
Pages 201-234

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 201...
... Grabowiecki, Statistics Canada; Kenneth Robertson, Larry Huff, Gordon Mikkelson, Timothy Pivetz, and Alice Winkler, Bureau of Labor Statistics; Sandra Johnson, National Highway Traffic Safety Administration; Eva Miller, New Jersey Department of Education
From page 203...
... Consequently, the most reasonable process involves matching directly all the CCR patient records that have this information, and then using probabilistic record linkage in an attempt to couple the remaining records that could not directly match. It is our belief that this maximises the rate of association between the two files while reducing the processing cost and time.
From page 204...
... Finally, death clearance essentially completes the information on cancer patients by furnishing the official date and cause of their death. It involves direct matching and probabilistic linking of cancer patient records to death registrations at the national level.
From page 205...
... PTCRs can obtain this information by doing their own death clearance, using local provincial/territorial files of death registrations. Patient records having responses for all three key fields first pass through a direct match with the CMDB in an attempt to find mortality records with identical common identifiers.
From page 206...
... The death information of linked CMDB records is posted onto the CCR patient records, overlaying any previously reported data in these fields. The linked pairs and unlinked CCR patient records join the matched pairs in proceeding to the post-processing phase of death clearance.
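The two-stage flow described in these excerpts (an exact match on the key fields first, with everything else passed on to probabilistic linkage) can be sketched as follows. This is only an illustration under stated assumptions: the excerpts do not name the three key fields, so `KEY_FIELDS` holds hypothetical placeholders, and sending ambiguous exact matches to the second stage is a design choice of this sketch, not something the source specifies.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical field names: the excerpts mention "three key fields" without naming them.
KEY_FIELDS = ("key_a", "key_b", "key_c")


def complete_key(record: Record) -> Optional[Tuple[str, ...]]:
    """Return the key tuple, or None if any key field is missing."""
    values = tuple(record.get(field) for field in KEY_FIELDS)
    return values if all(v is not None for v in values) else None


def split_by_direct_match(
    patients: List[Record], deaths: List[Record]
) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Return (matched pairs, patient records left for probabilistic linkage)."""
    index: Dict[Tuple[str, ...], List[Record]] = {}
    for death in deaths:
        key = complete_key(death)
        if key is not None:
            index.setdefault(key, []).append(death)

    matched, leftover = [], []
    for patient in patients:
        key = complete_key(patient)
        candidates = index.get(key, []) if key is not None else []
        if len(candidates) == 1:
            matched.append((patient, candidates[0]))
        else:
            # Assumption of this sketch: incomplete keys, no hit, or an
            # ambiguous hit all go on to the probabilistic stage.
            leftover.append(patient)
    return matched, leftover
```

Records with an incomplete key never enter the exact-match stage, which mirrors the statement that only patient records with responses for all key fields pass through the direct match.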
From page 207...
... Consequently, the patient files sent to the CCR by this registry never contain complete death information. Therefore, no cancer patient record from Quebec can obtain a confirmation of death by means of the Direct Match process; all Quebec records participate in the Probabilistic Linkage.
From page 208...
... 84,926
B.C. 33,103 8,058 360 8,418 25.4 8,367 25.3
Total 118,029 30,706 1,543 32,249 27.3 32,037 27.1

It is evident that in terms of the number of pairs obtained in the end, one can expect little difference between the two methods of death clearance.
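The surviving table rows can be checked for internal consistency. The column headers are not part of this excerpt, so the reading used below is an assumption inferred from the arithmetic alone: the first number is a record count, the next two sum to the fourth, and the two percentages are the fourth and sixth numbers divided by the record count.

```python
# Rows as they appear in the excerpt: (records, part_1, part_2, combined, other_method).
# The column meanings are assumptions inferred from the arithmetic, not from the source.
ROWS = {
    "B.C.": (33_103, 8_058, 360, 8_418, 8_367),
    "Total": (118_029, 30_706, 1_543, 32_249, 32_037),
}


def rates(row):
    """Return the two percentages implied by a row, rounded to one decimal."""
    records, part_1, part_2, combined, other_method = row
    if part_1 + part_2 != combined:
        raise ValueError("the two parts do not sum to the combined count")
    return (
        round(100 * combined / records, 1),
        round(100 * other_method / records, 1),
    )


for name, row in ROWS.items():
    print(name, rates(row))
```

Under that reading the computed rates reproduce the printed 25.4 and 25.3 for B.C. and 27.3 and 27.1 for the total, and the truncated leading figure is consistent too, since 84,926 + 33,103 = 118,029.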
From page 209...
... Provinces With Direct Match and Probabilistic Linkage Process
For this part, the complete death clearance system is used to process the data of the three selected provinces. It will automatically produce death confirmation pairs by using the Direct Match and the Probabilistic Linkage for British Columbia and Ontario.
From page 210...
... The advantages of the DM-PL method include lower operating costs to perform death clearance (increased efficiency), and greater certainty in the results (minimum manual review of cancer-mortality record pairs by PTCRs)
From page 211...
... (1997). Canadian Cancer Registry, Death Clearance Module Overview, Statistics Canada (internal document)
From page 212...
... explanation of the current linkage procedures, details of the work completed to date, and areas of research that need to be explored in the future.
From page 213...
... We follow the administrative code match with a probability-based match. This procedure is followed to identify the small percentage of links which are missing the appropriate administrative codes.
From page 214...
... Although these areas affect only the four percent of the records mentioned above, the net effect on the number of births and deaths identified could be significant.
New Approach
The matching process consists of the two major procedures described below -- an administrative code match and a probability-based weighted match.
From page 215...
... Probability-Based Match
The probability-based weighted match process involves only the unmatched records from the administrative code match process. In this process we generally expect to match less than one-half of one percent of the current quarter records.
From page 216...
... The more accurate we can make the overall linkage process, the more useful the database will be in identifying economic occurrences.
Theoretical Basis for Weighted Matching
The weighted match process is accomplished using the software packages AutoStan and Automatch, from Matchware Technologies Incorporated.
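The excerpt breaks off before giving the theory itself. As a hedged illustration of what a weighted match generally computes, the sketch below uses the standard Fellegi-Sunter agreement and disagreement weights; that this is the framework behind the packages named above is an assumption here, and the field names and the m and u probabilities are hypothetical.

```python
import math
from typing import Dict, Tuple


def field_weights(m: float, u: float) -> Tuple[float, float]:
    """Fellegi-Sunter weights for one field.

    m: probability the field agrees for a true match.
    u: probability the field agrees for a non-match.
    Returns (agreement weight, disagreement weight) in log base 2.
    """
    if not (0 < m < 1 and 0 < u < 1):
        raise ValueError("m and u must lie strictly between 0 and 1")
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def pair_weight(agreements: Dict[str, bool], probs: Dict[str, Tuple[float, float]]) -> float:
    """Sum the per-field weights for one candidate record pair."""
    total = 0.0
    for field, agrees in agreements.items():
        agree_w, disagree_w = field_weights(*probs[field])
        total += agree_w if agrees else disagree_w
    return total


# Hypothetical fields and probabilities, for illustration only.
PROBS = {"name": (0.9, 0.1), "address": (0.8, 0.05)}
print(pair_weight({"name": True, "address": False}, PROBS))
```

A pair's total weight is then compared with a cutoff: higher totals indicate that the observed pattern of agreements is more typical of true matches than of chance pairings.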
From page 217...
... It is, therefore, important that we find match cutoff parameters for each block which produce satisfactory results in all States. The results were evaluated by the four analysts using the same rules used in evaluating the ...
Number of Units Matched
Table [?] provides a summary of the matches in California which were obtained from the current matching procedures and the new procedures as tested.
From page 218...
... 100.0 100.0
Note that the new system identifies 1,642 more links than the old system. Finally, the third improvement in the matching process is in the weighted matching for all units in both quarters which do not match during any of the administrative matching procedures.
From page 219...
... The second is that although we are not identifying these matches during the weighted match, they may be identified in the enhanced administrative matching procedures which would preclude them from the weighted matching process. The truth may lie somewhere between these possibilities and will be one focus of our future research efforts.
From page 220...
... This rough balance in these error types seems a reasonable one for the purposes for which we are matching the files. Since there are only 142 good or questionable matches which fall below the match cutoff parameter, it seems that a substantial portion of the weighted matches identified only by the current weighted match procedures are identified during the enhanced new administrative match procedures.
From page 221...
... This initial review of the matching process using the final parameter values will provide some measure of the quality of matches obtained. It may also be advantageous to tailor the match cutoff parameters independently for each State.
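Tailoring the cutoffs by State could take the form of a default cutoff per block with optional State-level overrides. The sketch below is one possible arrangement; the block names, cutoff values, and State codes are all hypothetical and do not come from the source.

```python
from typing import Dict, Tuple

# Hypothetical default cutoff weight for each blocking pass.
DEFAULT_CUTOFFS: Dict[str, float] = {"block_1": 12.0, "block_2": 15.0}

# Hypothetical State-specific overrides, keyed by (state, block).
STATE_OVERRIDES: Dict[Tuple[str, str], float] = {("CA", "block_1"): 13.5}


def cutoff_for(state: str, block: str) -> float:
    """Return the State-specific cutoff if one exists, else the block default."""
    return STATE_OVERRIDES.get((state, block), DEFAULT_CUTOFFS[block])


def is_match(weight: float, state: str, block: str) -> bool:
    """Accept a candidate pair when its weight reaches the applicable cutoff."""
    return weight >= cutoff_for(state, block)
```

Keeping the overrides separate from the defaults makes it easy to see which States deviate from the common parameters and to review those deviations individually.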
From page 222...
... Six of the seven states linked person-specific crash data statewide to EMS and hospital data. The EMS data facilitated linkage of the crash to the hospital data because they included information about the scene (pick-up)
From page 223...
... Linkage rates varied according to the type of data being linked. In each of the CODES states, about 10% of the person-specific police crash reports linked to an EMS record and slightly less than 1.8% linked to a hospital inpatient record, a reflection of the low rate of EMS transport and hospitalization for crash injuries.
From page 224...
... The false positive rate ranged from 3.0 to 5.8 percent for the seven states and was viewed as not significant since the linked data included thousands of records estimated to represent at least half of all persons involved in motor vehicle crashes in the seven CODES states. False positives were measured by identifying a random sample of crash and/or injury records and reviewing those that linked to verify that a motor vehicle crash was the cause of injury.
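The sampling review described here amounts to estimating a proportion. A minimal sketch, assuming a simple random sample of linked pairs and a normal-approximation interval (neither of which the excerpt confirms), could look like this; the counts in the example call are made up.

```python
import math
from typing import Tuple


def false_positive_rate(reviewed: int, false_links: int) -> Tuple[float, float, float]:
    """Estimate the false positive rate from a manually reviewed sample.

    reviewed: number of linked pairs checked by hand.
    false_links: number of those judged to be incorrect links.
    Returns (rate, lower, upper) using a 95% normal-approximation interval.
    """
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    if not 0 <= false_links <= reviewed:
        raise ValueError("false_links must be between 0 and reviewed")
    rate = false_links / reviewed
    half_width = 1.96 * math.sqrt(rate * (1 - rate) / reviewed)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Hypothetical review: 20 incorrect links found among 500 sampled pairs.
print(false_positive_rate(500, 20))
```

The normal approximation is rough when the sample is small or the rate is near zero, so an exact or Wilson interval may be preferable in those cases.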
From page 225...
... False negatives were also identified by randomly selecting a group of crash reports and manually reviewing the paper records to identify those which did not link. Crash and injury records failed to match when one or the other was never submitted, the linking criteria were too restrictive, key data linkage variables were in error or missing, the case selection criteria, such as the E-code, were in error or missing, the crash-related hospitalization occurred after several hours or days had passed, the crash or the treatment occurred out-of-state, etc.
From page 226...
... They used the linked data to identify issues related to roadway safety and EMS, to support safety legislation, to evaluate the quality of their state data and for other state-specific purposes.
From page 227...
... The New Jersey Department of Education has undertaken a record linkage procedure involving use of computers in the deterministic matching of student records to follow the progress of New Jersey's public school students in meeting the state standardized graduation test -- the High School Proficiency Test (HSPT)
From page 228...
... On first glance it would seem that New Jersey Department of Education's records linkage task is an easy and straightforward one. Since in October 1995, 62,336 eleventh grade students were enrolled in regular educational programs in New Jersey's public schools and 51,601 (or 82.8%) of these students met the HSPT testing requirement on their first testing opportunity (also includes eleventh grade students who may have met the requirement in one or more test sections while categorized by their local educators as "retained tenth grade" students)
From page 229...
... The local educator maintains primary responsibility related to the validity of the information by: assuring the accuracy of identifier information about individual students, reviewing reports sent to them to assure the accuracy and completeness of information about their enrolled and tested student population; and the responsibility to ascertain that every enrolled student is listed on the school's roster once and only once! At its inception in October 1995, the cohort tracking project was intended to follow a defined population of eleventh grade students forward to their anticipated graduation (the static cohort)
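The "once and only once" requirement lends itself to a mechanical check. The sketch below assumes each student carries a single identifier that can be compared exactly; the excerpt does not say how the Department identifies students, so the identifiers here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def roster_problems(enrolled_ids: Iterable[str], roster_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Report students who break the once-and-only-once rule.

    missing: enrolled but not on the roster.
    duplicated: listed on the roster more than once.
    unexpected: on the roster but not enrolled.
    """
    counts = Counter(roster_ids)
    enrolled = set(enrolled_ids)
    on_roster = set(counts)
    return {
        "missing": sorted(enrolled - on_roster),
        "duplicated": sorted(sid for sid, n in counts.items() if n > 1),
        "unexpected": sorted(on_roster - enrolled),
    }


# Hypothetical identifiers, for illustration only.
print(roster_problems(["s1", "s2", "s3"], ["s1", "s1", "s3", "s9"]))
```

An empty list in all three categories means every enrolled student appears on the roster exactly once.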
From page 230...
... [equation garbled in the scan; surviving terms: y2, y5, y3, y6]
N = fall enrollment
y = in and out
i = test administration, where 1 = 1st test administration for cohort, 2 = 2nd test administration for cohort
Varies by administration (i):
y1 = in from out of state or private school
y2 = in from within state (public school)
From page 231...
... A critical element in the assurance of the validity of the correct identification of each enrolled student as well as pass/fail indicators (for each test section and the total test requirement) is the review of the static roster immediately following the fall eleventh grade test administration.
From page 232...
... The record change process is an opportunity for correction of erroneous data related to permanent student identifiers (name, date of birth, and gender), personal status identifiers (school enrollment, grade, participation in special programs (such as Special Education, Limited English Proficiency programs, and Title I), and test ...
From page 233...
... would be most appropriate in determining a proportion of the total population to be tested for a cohort year. Another approach, however, would be that record changes as they relate to the cohort tracking project, should be segmented and the number of record changes for the students who had one or more test sections yet to pass after the October test administration would be useful; however, that statistic is not readily available.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.