
Currently Skimming:

Chapter 5 Contributed Session on Confidentiality and Strategies for Record Linkage
Pages 139-168

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 139...
... J Willenborg, Statistics Netherlands Lawrence H
From page 141...
... the "data terrorist" ... software ... conceptual level ...
From page 142...
... SDC for Microdata at Statistics Netherlands Re-identification The aim of statistical disclosure control (SDC) is to limit the risk that sensitive information of individual respondents can be disclosed from a data set (Willenborg and DeWaal, 1996).
From page 143...
... When a respondent appears to be rare in the population with respect to a key value, then disclosure control measures should be taken to protect this respondent against reidentification (DeWaal and Willenborg, 1995a)
From page 144...
... A balance between global recoding and local suppression has to be found in order to make the information loss due to SDC measures as low as possible. It is recommended to start by recoding some variables globally until the number of unsafe combinations that have to be protected by local suppression is sufficiently low.
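As a rough illustration of the balance described above, the following Python sketch counts how often each key-value combination occurs, applies one global recoding, and then locally suppresses whatever remains unsafe. The threshold, the variable names, the recoding map, and the toy records are all assumptions made for illustration. This is not how ARGUS itself is implemented, and the choice of which value to suppress is a simple placeholder rather than an optimal one.

```python
from collections import Counter

def unsafe_combinations(records, keys, threshold):
    """Return key-value combinations occurring fewer than `threshold` times."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo for combo, n in counts.items() if n < threshold}

def global_recode(records, variable, mapping):
    """Recode one variable in every record into coarser categories."""
    return [{**r, variable: mapping.get(r[variable], r[variable])} for r in records]

def local_suppress(records, keys, threshold, variable):
    """Set `variable` to missing only in records whose combination is unsafe."""
    unsafe = unsafe_combinations(records, keys, threshold)
    return [
        {**r, variable: None} if tuple(r[k] for k in keys) in unsafe else r
        for r in records
    ]

# Hypothetical toy microdata; "region" and "occupation" act as key variables.
records = [
    {"region": "North", "occupation": "baker"},
    {"region": "North", "occupation": "baker"},
    {"region": "North", "occupation": "butcher"},
    {"region": "South", "occupation": "baker"},
    {"region": "South", "occupation": "baker"},
    {"region": "South", "occupation": "mayor"},
]
keys = ["region", "occupation"]

before = unsafe_combinations(records, keys, threshold=2)

# Step 1: global recoding merges two detailed categories into one.
recoded = global_recode(
    records, "occupation", {"baker": "food trade", "butcher": "food trade"}
)
after = unsafe_combinations(recoded, keys, threshold=2)

# Step 2: local suppression handles the combinations that are still unsafe.
protected = local_suppress(recoded, keys, 2, "occupation")
```

In this toy case the recoding removes one of the two unsafe combinations, so only a single value has to be suppressed afterwards, which is the trade-off the text recommends exploring.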
From page 145...
... This latter aspect of μ-ARGUS, determining the necessary local suppressions automatically and optimally, makes it possible to protect a microdata set against disclosure quickly. The Development of μ-ARGUS As is explained above, a microdata file should be protected against disclosure in two steps.
From page 146...
... Global Recoding and Local Suppression If the number of unsafe combinations is fairly large, the user is advised to first globally recode some variables interactively. A large number of unsafe combinations is an indication that some variables in the microdata set are too detailed in view of the future release of the data set.
From page 147...
... In the event that the original data file resides on another computer, μ-ARGUS will generate the necessary recoding information that will be used by a module of μ-ARGUS that runs on that other machine. The Development of τ-ARGUS Besides the development of μ-ARGUS for microdata sets, the SDC-Project also plans development of τ-ARGUS. τ-ARGUS is aimed at the disclosure control of tabular data.
From page 148...
... . Global Recodings and Local Suppressions in Microdata Sets, Report, Voorburg: Statistics Netherlands.
From page 149...
... (1996). Set Covering Models for Statistical Disclosure Control in Microdata, paper presented at the 3rd International Seminar on Statistical Confidentiality, Bled.
From page 150...
... . The program, since its inception, provided that each motor vehicle operator's own insurance carrier would provide coverage for personal injury protection (PIP)
From page 151...
... Data The police crash report file is maintained by the Hawaii State Department of Transportation. The four county police departments in Hawaii are required to report every motor vehicle collision on a public road which involves an injury or death or estimated damages of $1,000 or more (in 1990)
From page 152...
... The log frequency distribution of the Fellegi-Sunter match weights for Pass I is shown in Figure 1. For Pass I, 2,565 record pairs were designated as matches, with match weights meeting or exceeding the cutoff value.
From page 153...
... The profiles for time distribution by hour, urban/rural location, and daytime and nighttime peak traffic periods showed significant, but low level differences (phi coefficients generally <.02). Gender, human factor, and police judgements of injury severity differed substantially across the two files, with those in the matched insurance file being more seriously injured (57% of insurance claimants denoted "not injured" versus 74% of the police report file)
From page 154...
... There are slight differences in police-reported seatbelt use for chiropractic and physician service users, with 97% of the chiropractic users reporting belt use, and 95% of MD-only users reporting having been belted during the crash, as shown in Table 3. The crash report belt use rate is higher than previous independently observed belt use rates in the 80% range.
From page 155...
... 3.90 96.10 100.00 The distribution of users of chiropractic versus MD-only care differs in a number of ways across types of crashes. Table 4 illustrates the distribution of care choices across crash types commonly considered "at fault." (The "at fault" drivers are identified as those striking another car, and those involved in rollovers.)
From page 156...
... Table 5.-Chiropractic and Physician Office Visits by Police Reported Injury Severity

Police Injury Severity    | Chiropractic Use |    MD Only     |     Totals
                          |     N        %   |   N        %   |     N        %
No injury                 |   853    63.70   | 447    46.13   | 1,300    56.33
Possible injury           |   252    18.82   | 231    23.84   |   483    20.93
Non-incapacitating injury |   211    15.76   | 243    25.08   |   454    19.67
Incapacitating injury     |    22     1.64   |  43     4.44   |    65     2.82
Fatality                  |     1     0.07   |   5     0.52   |     6     0.26
All                       | 1,339   100.00   | 969   100.00   | 2,308   100.00

Chi-square = 82.21, 4 df, p < .001

Crash type affects the distribution of care choices also, as shown in Table 6. Sixty-two percent of the chiropractic service users were involved in rear-end collisions, while only 54% of the MD-only users were involved in rear-end collisions.
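The chi-square statistic reported with Table 5 can be checked directly from the published counts. The sketch below is a plain-Python version of the standard Pearson chi-square test of independence. The function name and layout are illustrative choices, and nothing here is claimed to be the authors' own software.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Counts from Table 5: (chiropractic use, MD only) for each severity level.
table5 = [
    [853, 447],  # No injury
    [252, 231],  # Possible injury
    [211, 243],  # Non-incapacitating injury
    [22, 43],    # Incapacitating injury
    [1, 5],      # Fatality
]

stat, df = chi_square(table5)
print(f"chi-square = {stat:.2f}, df = {df}")
```

The fatality row has very small counts, so the usual large-sample approximation behind the p-value is least reliable there. The statistic itself is still computed the same way.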
From page 157...
... Table 7.-Chiropractic and Physician Office Visits by Human Factors

Human Factors | Chiropractic Use |     MD Only      |     Totals
              |     N        %   |     N        %   |     N        %
Inattention   |   331    20.92   |   233    21.47   |   564    21.15
Misjudgement  |   190    12.01   |   100     9.22   |   290    10.87
Fatigue       |    22     1.39   |    38     3.50   |    60     2.25
Alcohol       |    21     1.33   |    23     2.12   |    44     1.65
Other         |    70     4.42   |    61     5.62   |   131     4.91
None          |   948    59.92   |   630    58.06   | 1,578    59.17
All           | 1,582   100.00   | 1,085   100.00   | 2,667

Chi-square = 22.17, 5 df.
From page 158...
... use MD and chiropractic services more frequently than those "at-fault;" the use of chiropractic services is substantially higher than the use of MD-only services among occupants with low severity police reported injuries; those who commit what might be seen as the most serious driving errors in the course of a collision (driving on the wrong side, ignoring traffic controls, speeding) are less likely to use chiropractic care than those who commit no errors or more minor errors (e.g., following too closely, inattention, misjudgment)
From page 159...
... (1994). Analyzing the Relationship between Crash Types and Injuries in Motor Vehicle Collisions in Hawaii, Transportation Research Record, 1467, 9-13.
From page 160...
... Logistic regression was chosen as the method of estimating the proportion of true links in the linkage of the 1996 Reverse Record Check (RRC96) with the 1990 Revenue Canada files (RCT90)
From page 161...
... It is by means of these addresses that we can begin tracing the persons selected by the RRC. During operations to link the RRC sample with the 1990 Revenue Canada files, we determined, for each of the eight region-by-sex groups, a threshold linkage weight beyond which all links were considered definite or possible and were retained for the next stage.
From page 162...
... These two steps (choosing UPP and checking) were repeated until the rejection rate for the links checked seemed lower than 10% for links with a linkage weight close to UPP.
From page 163...
... Using this data set, we were then able to assess the reliability of the accepted links. Definition of the Reliability of a Link The probabilistic linkage procedure consists in calculating, for each pair of records, a weight W based on whether the fields compared match or do not match and on the probability of matching these fields given a linked pair or an unlinked pair.
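The weight W described above can be sketched in the usual Fellegi-Sunter form, where each compared field contributes a log likelihood ratio and the contributions are summed. The field names and the (m, u) probabilities below are hypothetical. The log base and any rescaling used in the actual linkage are not given in the excerpt, so base 2 is an assumption here.

```python
import math

def field_weight(agrees, m, u):
    """Log2 likelihood-ratio weight for one compared field.

    m: P(field agrees | records are a linked pair)
    u: P(field agrees | records are an unlinked pair)
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def pair_weight(comparison, probabilities):
    """Sum the field weights over all compared fields to get W for a record pair."""
    return sum(
        field_weight(agrees, *probabilities[field])
        for field, agrees in comparison.items()
    )

# Hypothetical (m, u) values; real ones would be estimated from the files.
probabilities = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "sex": (0.98, 0.50),
}

full_match = pair_weight(
    {"surname": True, "birth_year": True, "sex": True}, probabilities
)
partial = pair_weight(
    {"surname": True, "birth_year": False, "sex": True}, probabilities
)
```

Agreement on a field that rarely agrees by chance (small u) adds a large positive amount to W, while disagreement on a field that linked pairs almost always share (large m) subtracts from it.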
From page 164...
... We could estimate these two probabilities respectively by the proportion of linked pairs among the accepted links and the proportion of linked pairs among the rejected links. These estimates would require manual checking of two samples drawn respectively from the accepted links and the rejected links.
From page 165...
... = -2.82 + 0.0165 W Ontario Western logit(p) = -12.70 + 0.0665 W 86.6% 98.8% 224 For Eastern and Ontario regions, we didn't find enough unlinked pairs to do a logistic regression.
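A fitted equation of the form logit(p) = a + bW can be inverted to give the estimated probability that a link with weight W is a true link. The sketch below uses the coefficients that appear next to "Western" in the excerpt. The example weights are illustrative assumptions and are not taken from the source.

```python
import math

def link_probability(weight, intercept, slope):
    """Invert logit(p) = intercept + slope * W to obtain p."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * weight)))

# Coefficients quoted in the excerpt alongside the Western region.
WESTERN = (-12.70, 0.0665)

for w in (150, 200, 250):  # illustrative weights, not from the source
    print(w, round(link_probability(w, *WESTERN), 3))
```

With a positive slope the estimated probability rises with W, and it equals 0.5 at the weight where a + bW = 0.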
From page 166...
... In that situation, using the method described here could prove to be ineffective or even discouraging, since the reliability calculated by means of logistic regression is a lower bound for the reliability of the accepted links. In some cases, that bound could be very low although the overall rate of false links is acceptable.
From page 167...
... could also be used on a sample of links checked during the linkage procedure, so as to determine UPP and LOW points that result in both an acceptable level of reliability and a reasonable amount of manual checking, or even to choose to change the linkage rules if we suspect that it will not be possible to achieve these two objectives simultaneously.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.