
Chapter 2 Invited Session on Record Linkage Applications for Epidemiological Research
Pages 13-46

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 13...
... CharIfon, DC Applications. John Van Voorhis, David Koepke, and David Yu, University of Chicago
From page 15...
... Newcombe (Newcombe et al., 1959; and Newcombe, 1967, 1987, and 1988) undertook the pioneering work on medical record linkage in Canada in the 1950s and thereafter, Acheson (1967, 1968)
From page 16...
... These, considered individually, are partial identifiers, and matching depends on their use in combination. Unique Personal Identifiers: Personal identification, administrative and clinical data are gradually accumulated during a patient's spell in a hospital and finalized into a single record.
From page 17...
... File Ordering and Blocking: Matching and linkage in established datasets usually involves comparing each new record with a master file containing existing records. Files are ordered or blocked in particular ways to increase the efficiency of searching.
From page 18...
... Further subdivision of the ONCA blocks on the file can be effected using sex, forename initial and date of birth, either singly or in combination. ORLS File Blocking Keys and Matching Variables: The file blocking keys used for the ORLS are generated in the following fashion. The primary key is generated using the ONCA of the present surname.
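The blocking scheme can be illustrated with a short Python sketch. ONCA itself is not reproduced here: standard American Soundex is substituted as a stand-in phonetic code, and the field names (`present_surname`, `sex`, `forename`, `birth_year`) and the use of year of birth for subdivision are assumptions made for illustration.

```python
from collections import defaultdict

# Soundex digit classes (stand-in for ONCA); vowels, y, h and w carry no digit.
_SOUNDEX = {ch: digit
            for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                   ("l", "4"), ("mn", "5"), ("r", "6"))
            for ch in letters}


def soundex(name: str) -> str:
    """American Soundex code, used here only in place of ONCA."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = [letters[0].upper()]
    prev = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code.append(digit)
        if ch not in "hw":  # h and w do not separate letters with the same digit
            prev = digit
    return ("".join(code) + "000")[:4]


def blocking_key(record: dict, subdivide: bool = True) -> tuple:
    """Primary key from the present surname, optionally subdivided further."""
    primary = soundex(record["present_surname"])
    if not subdivide:
        return (primary,)
    return (primary, record["sex"], record["forename"][:1].upper(), record["birth_year"])


def build_blocks(records, subdivide=True):
    """Group records so that only records sharing a key are compared."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec, subdivide)].append(rec)
    return blocks
```

With this sketch, "Smith" and "Smyth" fall into the same primary block (S530), and the subdivided key separates them from a man of the same surname born in a different year.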
From page 19...
... Generating Extra Records Where a Number of Name Variants Are Present: To ensure that the data record can match with the blocks containing all possible variants of the name information, multiple records are generated on the master file containing combinations of present and birth surnames, and forenames. To illustrate the generation of extra records where the identifier set for a person contains many forms of the names, consider the following example: birth surname: present surname (married surname)
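A minimal sketch of the extra-record generation, assuming each person is described by one birth surname, one present surname and a list of forename variants. The function and field names are hypothetical, not those of the ORLS.

```python
from itertools import product


def generate_extra_records(person_id, birth_surname, present_surname, forenames):
    """One master-file record per (surname, forename) combination."""
    # dict.fromkeys drops a duplicate when birth and present surnames are equal,
    # keeping the present surname first.
    surnames = list(dict.fromkeys(s for s in (present_surname, birth_surname) if s))
    return [{"person_id": person_id, "surname": s, "forename": f}
            for s, f in product(surnames, forenames)]
```

For example, `generate_extra_records(101, "Brown", "Green", ["Elizabeth", "Betty"])` yields four records (Green/Elizabeth, Green/Betty, Brown/Elizabeth, Brown/Betty), all carrying the same person identifier, so a data record under any of these name forms can reach the right block.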
From page 20...
... Use of a stream number on each record enables selective matching to be undertaken; for example, data records can be matched with the master file and with each other, but the master file records are not matched with themselves. Considerable work has been undertaken to develop methods of calculating the probability that pairs of records, containing arrays of partial identifiers which may be subject to error or variation in recording, do, or do not, relate to the same person.
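Selective matching by stream number might be sketched as follows. The stream values and record layout are assumptions: stream 0 is taken to mean the master file and any other value a data file.

```python
from itertools import combinations

MASTER_STREAM = 0  # hypothetical stream number for master-file records


def candidate_pairs(block):
    """Yield every pair of record ids in a block except master-with-master pairs."""
    for a, b in combinations(block, 2):
        if a["stream"] == MASTER_STREAM and b["stream"] == MASTER_STREAM:
            continue  # master-file records are not matched with themselves
        yield a["id"], b["id"]
```

In a block holding two master records and two data records, six pairs are possible; the sketch keeps five, dropping only the master-master pair.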
From page 21...
... A detailed discussion of match weights and probability matching can be found in publications by Newcombe (Newcombe et al., 1959, and Newcombe, 1967, 1987, and 1988), and by Gill and Baldwin (1987)
From page 22...
... When the matching item is present on both records, a weight is calculated expressing the amount of agreement or disagreement between the item on the data record and the corresponding item on the master file record.
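The passage does not give the weight formula. One common formulation in the probabilistic matching literature, assumed here, is a log-odds weight: `m` is the probability that the item agrees when both records belong to the same person, and `u` the probability that it agrees by chance for different people. This sketch handles only exact agreement; graded partial agreement, which "amount of agreement" may imply, is not modelled.

```python
import math


def item_weight(a, b, m, u):
    """Log2 agreement/disagreement weight for one matching item (sketch)."""
    if a is None or b is None:
        return 0.0  # item not present on both records: contributes nothing
    if a == b:
        return math.log2(m / u)  # agreement: positive when m > u
    return math.log2((1 - m) / (1 - u))  # disagreement: negative when m > u
```

With hypothetical values m = 0.95 and u = 0.05 for forename initial, agreement scores about +4.25 and disagreement about -4.25.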
From page 23...
... It is possible for the calculated weight to become negative where there is extreme disagreement between the item on the data record and the corresponding item on the master file. In matching street address, postcode and general practitioner, the score cannot go negative, although it can assume zero, because the individual may have changed their home address or their family doctor since they were last entered into the system; this is really a change in family circumstances and not an error in the data, and so a negative weight is not justified.
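The zero floor for changeable items can be added to a summed pair weight. This is a sketch assuming log-odds item weights and hypothetical field names (`street_address`, `postcode`, `gp`); the actual ORLS weights come from its own matrices.

```python
import math

# Hypothetical field names for items where disagreement may reflect a genuine
# change in circumstances rather than a recording error.
NON_NEGATIVE_FIELDS = {"street_address", "postcode", "gp"}


def pair_weight(rec_a, rec_b, params):
    """Sum log-odds item weights; items in NON_NEGATIVE_FIELDS are floored at zero.

    params maps field name -> (m, u) probabilities (assumed values).
    """
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # item missing on one record
        w = math.log2(m / u) if a == b else math.log2((1 - m) / (1 - u))
        if field in NON_NEGATIVE_FIELDS:
            w = max(w, 0.0)  # a moved house or changed doctor is not evidence against a match
        total += w
    return total
```

Two records that agree on surname but differ in postcode therefore score the surname agreement weight alone, rather than being penalised for the move.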
From page 24...
... The false positives and false negatives are very sensitive to the threshold cut-off weights: too high gives a very low false positive rate and a high false negative rate; too low gives a high and unacceptable false positive rate with a low false negative rate. The values selected for the threshold cut-off are, of course, arbitrary, but must be chosen with care having considered the following objectives: · the minimisation of false positives, at the risk of increased missed matches; · the minimisation of missed matches, at the risk of increased false positives; and · the minimisation of the sum of false positives and missed matches.
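To make the trade-off concrete, the sketch below assumes a set of record pairs whose true status is known (for example, from clerical scrutiny). It treats a pair as a match when its weight reaches the cut-off and picks a cut-off from a list of candidates under each of the three objectives. The function names and data are invented for illustration.

```python
def error_counts(scored_pairs, threshold):
    """Count (false positives, false negatives) at one cut-off.

    scored_pairs: (weight, truly_same_person) tuples.
    A pair is accepted as a match when weight >= threshold.
    """
    fp = sum(1 for w, same in scored_pairs if w >= threshold and not same)
    fn = sum(1 for w, same in scored_pairs if w < threshold and same)
    return fp, fn


def choose_threshold(scored_pairs, candidates, objective="sum"):
    """Pick a cut-off: 'fp' minimises false positives (ties broken on missed
    matches), 'fn' minimises missed matches, 'sum' minimises their total."""
    def score(t):
        fp, fn = error_counts(scored_pairs, t)
        return {"fp": (fp, fn), "fn": (fn, fp), "sum": (fp + fn,)}[objective]
    return min(candidates, key=score)


pairs = [(12, True), (9, True), (7, False), (6, True), (3, False), (1, False)]
```

On these invented pairs, a cut-off of 2 gives two false positives and no missed matches, while a cut-off of 8 gives no false positives and one missed match. The "fp" objective picks 8, and the "fn" and "sum" objectives both pick 5.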
From page 25...
... Precise scores and probabilities may vary according to the population and record pairs studied. A number of matrices have therefore been prepared for the different types of event pairs being matched, for example, hospital to hospital records, hospital to death records, birth to hospital records, hospital and District Health Authority (DHA)
From page 26...
... At the end of each computer run, the results of the clerical scrutiny are pooled with all the existing matching results and new matrices are prepared. The requirement is to reduce the "Q" zone to the minimum consistent with the constraints of minimum false positives and false negatives.
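The passage does not define the boundaries of the "Q" zone. A common arrangement, assumed here, is a band between a lower and an upper cut-off whose pairs are referred for clerical scrutiny. The cut-off values and labels below are hypothetical.

```python
from collections import Counter


def classify(weight, lower, upper):
    """Assign a pair to 'link', 'non-link', or the 'Q' (query) zone."""
    if weight >= upper:
        return "link"
    if weight < lower:
        return "non-link"
    return "Q"  # referred for clerical scrutiny


def zone_sizes(weights, lower, upper):
    """Count pairs in each zone; narrowing [lower, upper) shrinks the Q zone."""
    return Counter(classify(w, lower, upper) for w in weights)
```

Reducing the Q zone then amounts to moving the two cut-offs closer together. The false positive and false negative counts at each cut-off are the constraints that limit how far they can move.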
From page 27...
... Record Linkage Techniques-1997. Figure 3. A Sample Portion of the Matrix Used for Matching Hospital Records with Hospital Records
From page 28...
... The number of records written to the output file for any one person can be very large, and is approximately the number of records on the data file multiplied by the number of records on the master file.
From page 29...
... Where there are records for a woman recorded under her maiden name (A), records that contain details of both her maiden and married name (B), and records containing just her married name (C)
From page 30...
... Secondly, records belonging to the same person have not been brought together, i.e., they reside on the file under two or more different person identifiers; these are known as "false negatives" or "missed matches." The false positive rate was estimated using two different methods. Firstly, all the records for a random sample of 5,000 people having two or more records were extracted from the ORLS file and printed out for clerical scrutiny.
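The first method can be sketched as drawing the sample and then summarising the clerical verdicts. The input layout (a count of records per person identifier) and the normal-approximation standard error are assumptions, not details from the source.

```python
import math
import random


def sample_people(records_per_person, k, seed=None):
    """Random sample of k person identifiers that have two or more records."""
    eligible = sorted(pid for pid, n in records_per_person.items() if n >= 2)
    return random.Random(seed).sample(eligible, k)


def false_positive_rate(review_flags):
    """Proportion of sampled people whose records include a wrong link.

    review_flags: one boolean per sampled person, True if clerical scrutiny
    found a record that does not belong to that person. The standard error
    uses the normal approximation (an assumption, not from the source).
    """
    n = len(review_flags)
    p = sum(review_flags) / n
    return p, math.sqrt(p * (1 - p) / n)
```

If, hypothetically, 50 of 5,000 sampled people had a wrongly linked record, the estimate would be 1% with a standard error of about 0.14 percentage points.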
From page 31...
... (1993). Computerised Linkage of Medical Records: Methodological Guidelines, Journal of Epidemiology and Community Health, 47, 316-319.
From page 32...
... (1977). Selection of a Surname Encoding Procedure for the Statistical Reporting Service Record Linkage System, Washington, DC: United States Department of Agriculture.
From page 33...
... Knuth-Morris-Pratt Algorithm, in: String Searching Algorithms, Singapore: World Scientific Publishing Co.
From page 34...
... Some results will be presented by way of example. We will show how the complex linkages required for statistical analyses can be decomposed into a sequence of simple database queries and linkages.
From page 35...
... The derived files may be further linked to files in or outside the data set (E)
From page 36...
... "Patients consulting" is therefore a subset of the practice list of all registered patients. Consultations must be carefully distinguished from "patients consulting." A combination of patient number, date and place of consultation, and diagnosis uniquely defines each record in the consultation file.
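The distinction between consultations and patients consulting, and the role of the composite key, can be shown with a few lines of Python. The record layout, including the `staff_code` field, is a hypothetical simplification of the consultation file.

```python
from collections import namedtuple

Consultation = namedtuple("Consultation", "patient_number date place diagnosis staff_code")


def has_unique_key(consultations):
    """True if (patient number, date, place, diagnosis) identifies each record."""
    keys = [(c.patient_number, c.date, c.place, c.diagnosis) for c in consultations]
    return len(keys) == len(set(keys))


def patients_consulting(consultations, diagnosis=None):
    """Distinct patients with at least one (optionally diagnosis-specific) consultation."""
    return {c.patient_number for c in consultations
            if diagnosis is None or c.diagnosis == diagnosis}
```

Four consultation records may involve only two patients consulting, so counts of consultations and of patients consulting answer different questions.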
From page 37...
... Individual practice staff consulted are identified in the consultation file by a code. Patients: patient number; age; sex; postcode; socio-economic data. Primary key: patient number. Foreign key: postcode (references geographic data). These data were stored as four separate files relating to: all patients; adult patients; children; and married/cohabiting women, because different information was collected for each subgroup.
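The primary and foreign keys described above can be expressed as a relational schema. The sketch uses Python's built-in `sqlite3` module. Table and column names are hypothetical, and the four patient subgroup files are collapsed into a single `patients` table for brevity.

```python
import sqlite3


def build_schema(conn):
    """Create illustrative geography, patient and consultation tables."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE geography (
            postcode TEXT PRIMARY KEY,
            region   TEXT
        );
        CREATE TABLE patients (
            patient_number INTEGER PRIMARY KEY,
            age            INTEGER,
            sex            TEXT,
            postcode       TEXT REFERENCES geography(postcode)
        );
        CREATE TABLE consultations (
            patient_number INTEGER NOT NULL REFERENCES patients(patient_number),
            consult_date   TEXT NOT NULL,
            place          TEXT NOT NULL,
            diagnosis      TEXT NOT NULL,
            staff_code     TEXT,
            -- the composite key that uniquely defines a consultation record
            PRIMARY KEY (patient_number, consult_date, place, diagnosis)
        );
    """)
```

With the keys declared, the database rejects a consultation for an unknown patient and a second record with the same patient, date, place and diagnosis. Both raise `sqlite3.IntegrityError`.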
From page 38...
... Derived files: The MSGP database contains information on individual patients and consultations. To make comparisons between groups of patients, and to standardise the data (e.g., for age differences)
From page 39...
... Extracting just the patient identification numbers from this dataset, and eliminating duplicates, results in a list of patients who consulted for diabetes at least once during the year. This subset of the consultation file can be linked with the original consultation file to produce a derived file containing the consultation history of all diabetic patients in the study, which can be used for further analysis.
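The two steps (distinct patient numbers, then a link back to the full consultation file) can be written as a single SQL query. The sketch assumes a `consultations` table with ICD-9-style diagnosis codes, treating codes beginning "250" as diabetes. The table layout is hypothetical, and the coding actually used in the study should be checked.

```python
import sqlite3

DIABETIC_HISTORY = """
    SELECT c.patient_number, c.consult_date, c.diagnosis
    FROM consultations AS c
    JOIN (SELECT DISTINCT patient_number        -- patients who consulted for diabetes
          FROM consultations
          WHERE diagnosis LIKE '250%') AS d
      ON c.patient_number = d.patient_number   -- link back to every consultation
    ORDER BY c.patient_number, c.consult_date
"""


def diabetic_history(conn: sqlite3.Connection):
    """All consultations, for any reason, by patients who consulted for diabetes."""
    return conn.execute(DIABETIC_HISTORY).fetchall()
```

A diabetic patient's hypertension consultation is kept, because the derived file is the whole consultation history of each diabetic patient, not just the diabetes visits.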
From page 40...
... In this slightly more complex situation it is necessary to create a lookup table containing the diseases of interest and their ICD codes and link this to the "consultations by diabetic patients" file to create a further subset of the consultation file containing consultations for diabetes and its complications. It is likely that this file, as well as the simpler one described above, would be linked to the patient file to include age, sex and other patient characteristics before analysis using conventional statistical packages.
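The lookup-table step and the final link to the patient file might look like this in SQL. The table names (`diabetic_consultations`, `disease_lookup`, `patients`) and the illustrative codes (250.0 for diabetes mellitus, 362.0 for diabetic retinopathy) are assumptions. A real lookup table would list every code of interest from the classification actually used.

```python
import sqlite3


def build_analysis_file(conn: sqlite3.Connection):
    """Keep consultations whose code is in the lookup table, then add patient details."""
    return conn.execute("""
        SELECT c.patient_number, p.age, p.sex, c.consult_date, l.disease
        FROM diabetic_consultations AS c
        JOIN disease_lookup AS l ON c.diagnosis = l.icd_code       -- diseases of interest only
        JOIN patients AS p ON p.patient_number = c.patient_number  -- age, sex, etc.
        ORDER BY c.patient_number, c.consult_date
    """).fetchall()
```

Because both joins are inner joins, consultations with codes outside the lookup table drop out, and each remaining row carries the patient characteristics needed by a statistical package.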
From page 41...
... User-Friendly Linkage Software The MSGP4 practice software was originally written so that participating practices could gain access to the data collected from their own practice. The software was designed to be used easily by people with no knowledge of database technology and because the software runs directly under DOS or Windows, no specialised database software is needed.
From page 42...
... Firstly, database software creates large temporary files of cross products, which is time consuming and may lead to memory problems. Secondly, queries involving complex linkages are often difficult to formulate and may easily turn out to be incorrect.
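The source does not say how the MSGP4 software performs its linkages internally. One standard way to avoid building a cross product, shown here only as an illustration, is a hash join: index one file on the linking key, then look up each record of the other file.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner join of two lists of dicts on `key` without forming a cross product.

    Work grows with len(left) + len(right) + the number of matches,
    rather than with len(left) * len(right).
    """
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], ()):  # .get avoids adding empty entries
            joined.append({**row, **match})
    return joined
```

Linking three consultation records to a two-patient file this way touches each record once; a consultation whose patient number is absent from the patient file simply produces no output row.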
From page 43...
... Denominators can be consultations, patients consulting or patient years at risk. Figure 3.
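Of the three denominators, patient years at risk is the only one that needs computing from registration dates. The sketch assumes each patient has one registration period, clips it to the study year, and uses a 365.25-day year. All three are simplifying assumptions.

```python
from datetime import date


def patient_years_at_risk(registrations, start, end):
    """Sum each patient's registered time within [start, end), in years.

    registrations: (registered_from, registered_to) date pairs, one per patient.
    """
    days = 0
    for reg_from, reg_to in registrations:
        lo, hi = max(reg_from, start), min(reg_to, end)
        if hi > lo:
            days += (hi - lo).days
    return days / 365.25


def rate_per_1000(numerator, denominator):
    """Rate per 1,000 units of whichever denominator is chosen."""
    return 1000 * numerator / denominator


start, end = date(1991, 1, 1), date(1992, 1, 1)
regs = [(date(1985, 5, 1), date(1995, 1, 1)),   # registered all year: 365 days
        (date(1991, 1, 1), date(1993, 1, 1)),   # registered all year: 365 days
        (date(1991, 7, 1), date(1992, 6, 1))]   # joined mid-year: 184 days
pyar = patient_years_at_risk(regs, start, end)   # 914 / 365.25, about 2.50 years
```

The same numerator gives different rates depending on whether it is divided by consultations, patients consulting or `pyar`, so the denominator should always be stated alongside the rate.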
From page 44...
... They also facilitate linking in new data from other sources. However, most statistical analyses require simple rectangular files, and complex database queries may be required to obtain these.
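A rectangular file usually means one row per unit of analysis with a fixed set of columns. This sketch flattens hypothetical patient and consultation records into one row per patient with a count for each diagnosis group of interest. The field names and diagnosis labels are assumptions.

```python
from collections import Counter


def rectangular_file(patients, consultations, diagnoses):
    """One row per patient: patient attributes plus a count per diagnosis.

    Patients with no consultations still get a row (all counts zero),
    which is the equivalent of a left join from the patient file.
    """
    counts = Counter((c["patient_number"], c["diagnosis"]) for c in consultations)
    rows = []
    for p in patients:
        row = dict(p)
        for d in diagnoses:
            row[f"n_{d}"] = counts[(p["patient_number"], d)]
        rows.append(row)
    return rows
```

Each resulting row has the same columns, so the output can be written directly to a flat file for a conventional statistical package.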
From page 45...
... Tips and Techniques for Linking Multiple Data Systems: The Illinois Department of Human Services Consolidation Project. John Van Voorhis, David Koepke, and David Yu, University of Chicago


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.