Appendix B: Matching Records Across Databases
Pages 65-78

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 65...
... The election officials submit to the DMV a verification query, which may involve a driver's license number or an SSN4. If the query involves SSN4, the DMV passes the request to the SSA using the AAMVAnet, a private network ...
Footnote 1: For an overall background document that covers many elementary aspects of matching records (that is, record linkage)
From page 66...
... For example, if the voter registration database is being checked against a database of felons or dead people, a low rate of false positives is needed to reduce the likelihood that eligible voters are removed from the VRD. Just how low a rate is acceptable is a policy choice.
From page 67...
... Such information can easily be computed from either state-held databases (such as the department of motor vehicles (DMV) or voter registration databases, whichever is of higher quality as indicated by fewer ...
Footnote 5: An example of such a problem arose when a record-level match was conducted to identify felons in Florida's voter registration database before the 2000 election.
From page 68...
... [are] as follows: ChoicePoint will identify all matches on the comprehensive list resulting from the processing described in Paragraphs A.2-A.7 that do not match based on all of the following data fields:
• Validated 9 digit Social Security Number
• Non-normalized (i.e., as name appears in original source data)
From page 69...
... Once it is known that an application is not a duplicate, and not just a change of address or party, the application needs to be checked against the relevant databases. Table 12, "Verification of Applications," on page 72 in the EAC report (footnote 11) shows that each state has its own unique set of criteria for verifying the applications, ranging from states like Pennsylvania, which verifies only through the DMV and the SSA, to Montana, which verifies against the DMV, the SSA, Vital Records, "Match Against Voter Registration Databases," "Tracking Returned Voter ID Cards," "Tracking Returned Disposition Notices," and "Verify Through Other Agency." According to Table 13, "Data Fields for Comparison to Identify Duplications," in the EAC report, 15 states verify using the address; 48 verify the date of birth; 38 verify the driver's license number; 46 verify the names provided by the registrant; 40 verify "Social Security number" (although surely that is just the last four digits in most cases, since according to Table 11, pages 68-69, in the EAC report, only 7 states use the full SSN)
From page 70...
... Upon receipt of the applicable information, the SSA queries its database and returns one of five responses: no match found; one unique match, death indicator absent; one unique match, death indicator present; multiple matches found with at least one lacking a death indicator; or multiple matches found but all with death indicator. As noted above, the query is based on searching for exact matches on the applicable information.
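The five response categories listed above can be sketched as a small classifier. This is only an illustration under stated assumptions: the names `SsaResponse` and `classify_matches` are hypothetical, and the input is assumed to be one boolean per exactly matching record (True when a death indicator is present). It does not reflect the SSA's actual system or interface.

```python
from enum import Enum


class SsaResponse(Enum):
    """The five response categories named in the excerpt (labels are illustrative)."""
    NO_MATCH = "no match found"
    UNIQUE_NO_DEATH = "one unique match, death indicator absent"
    UNIQUE_DEATH = "one unique match, death indicator present"
    MULTIPLE_SOME_NO_DEATH = "multiple matches, at least one lacking a death indicator"
    MULTIPLE_ALL_DEATH = "multiple matches, all with death indicator"


def classify_matches(death_flags):
    """Map the exact matches for one query to a response category.

    death_flags: iterable of bools, one per exactly matching record;
    True means that record carries a death indicator (assumed encoding).
    """
    flags = list(death_flags)
    if not flags:
        return SsaResponse.NO_MATCH
    if len(flags) == 1:
        return SsaResponse.UNIQUE_DEATH if flags[0] else SsaResponse.UNIQUE_NO_DEATH
    if all(flags):
        return SsaResponse.MULTIPLE_ALL_DEATH
    return SsaResponse.MULTIPLE_SOME_NO_DEATH
```

For example, `classify_matches([True, False])` falls in the "multiple matches, at least one lacking a death indicator" category, while an empty list corresponds to "no match found."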
From page 71...
... :

New Registration Card      SSA Record
Tom T Bowden               Taylor T Bowden
3121 Escondido Way
11/04/77                   11/04/77
SSN 000001087              SSN 000001087

In this case, the SSA would return a response of "no match found." However, if the voter registrar could determine that either Tom has a middle name of Taylor or Taylor has a middle name of Tom or Thomas, then this registrar could associate these records with some degree of confidence if he or she concluded that the first and middle names have been transposed. But in the absence of other information, the registrar has no way to make such a determination.
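The Bowden example can be replayed in a few lines. This is a rough sketch, not the SSA's procedure: the `Person` record, the choice of compared fields (name, date of birth, SSN), and the `transposition_candidate` heuristic are all assumptions made for illustration. The heuristic only flags pairs whose initials are consistent with swapped first and middle names; as the text notes, that alone cannot confirm a transposition.

```python
from typing import NamedTuple


class Person(NamedTuple):
    # Hypothetical record layout; the excerpt does not list the fields compared.
    first: str
    middle: str
    last: str
    dob: str
    ssn: str


def exact_match(a: Person, b: Person) -> bool:
    """Exact matching: every compared field must be identical."""
    return a == b


def transposition_candidate(a: Person, b: Person) -> bool:
    """Flag pairs that agree on surname, date of birth, and SSN, and whose
    initials are consistent with swapped first and middle names.
    A flag is a prompt for clerical review, not proof of a match."""
    same_other_fields = (a.last, a.dob, a.ssn) == (b.last, b.dob, b.ssn)
    if not (same_other_fields and a.middle and b.middle and a.first != b.first):
        return False
    return (a.middle[0].upper() == b.first[0].upper()
            and b.middle[0].upper() == a.first[0].upper())


card = Person("Tom", "T", "Bowden", "11/04/77", "000001087")
ssa = Person("Taylor", "T", "Bowden", "11/04/77", "000001087")

print(exact_match(card, ssa))              # False
print(transposition_candidate(card, ssa))  # True
```

The exact comparison fails on the first name alone, which is why the query comes back as "no match found" even though every other field agrees.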
From page 72...
... Winkler, "Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp.
From page 73...
... Winkler, "Automatic Estimation of Record Linkage False Match Rates," Proceedings of the Section on Survey Research Methods, American Statistical Association, CD-ROM. Also available at http://www.census.gov/srd/papers/pdf/rrs2007-05.pdf.
From page 74...
... Winkler, "Automatic Estimation of Record Linkage False Match Rates," Proceedings of the Section on Survey Research Methods, American Statistical Association, CD-ROM, 2006, also at http://www.census.gov/srd/papers/pdf/rrs2007-05.pdf; Thomas R Belin and Donald B
From page 75...
... Winkler, "Improved Decision Rules in the Fellegi-Sunter Model of Record Linkage," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp.
From page 76...
... Because the first character of the surname typically is less likely to be in error (or is assumed to be so), this criterion is insensitive to some basic kinds of typographical error, e.g., "Smith" versus "Smoth." For each of these pairs, a matching score is computed using the rest of the information in the available data fields.
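A rough sketch of this two-step pattern (restrict comparisons to candidate pairs, then score each pair on the remaining fields) follows. The excerpt does not show the exact criterion, so blocking on the first character of the surname is an assumption here, as are the field names and the simple agreement count used as a score; a Fellegi-Sunter style weighting, as cited elsewhere in this appendix, would replace that count in practice.

```python
from collections import defaultdict


def block_key(record):
    # Assumed blocking criterion: first character of the surname.
    return record["last"][:1].upper()


def candidate_pairs(file_a, file_b):
    """Yield only those pairs whose records share a block key."""
    blocks = defaultdict(list)
    for rec_b in file_b:
        blocks[block_key(rec_b)].append(rec_b)
    for rec_a in file_a:
        for rec_b in blocks.get(block_key(rec_a), []):
            yield rec_a, rec_b


def agreement_score(a, b, fields=("first", "dob")):
    """Illustrative score: number of remaining fields that agree exactly."""
    return sum(1 for f in fields if a[f] == b[f])


file_a = [{"first": "Ann", "last": "Smith", "dob": "1970-01-01"}]
file_b = [
    {"first": "Ann", "last": "Smoth", "dob": "1970-01-01"},
    {"first": "Ann", "last": "Jones", "dob": "1970-01-01"},
]

pairs = list(candidate_pairs(file_a, file_b))
scores = [agreement_score(a, b) for a, b in pairs]
```

Here "Smith" and "Smoth" land in the same block despite the typographical error, while the "Jones" record is never compared. An error in the first character of the surname would defeat this key, which is the assumption the text calls out.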
From page 77...
... Cohen, Pradeep Ravikumar, and Stephen E Fienberg, "A Comparison of String Distance Metrics for Name-Matching Tasks," Proceedings of the ACM Workshop on Data Cleaning, Record Linkage and Object Identification, Washington D.C., August 2003.
From page 78...
... A more general strategy would be needed when there is a possibility of typographical error in every field. The matching strategy is to search the entire file and apply suitable proximity metrics that indicate that the UID, first name, last name, and date of birth are sufficiently close to the query record.
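A minimal sketch of such a whole-file search is shown below. The excerpt does not name the proximity metrics, so plain Levenshtein edit distance with a per-field threshold is used purely as a stand-in; the field names (`uid`, `first`, `last`, `dob`) and the threshold value are likewise assumptions.

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


FIELDS = ("uid", "first", "last", "dob")  # assumed field names


def is_close(query, record, max_edits_per_field=1):
    """True if every field is within the allowed number of edits."""
    return all(edit_distance(query[f], record[f]) <= max_edits_per_field
               for f in FIELDS)


def search_file(query, records, max_edits_per_field=1):
    """Scan the entire file; no blocking key is trusted to be error-free."""
    return [r for r in records if is_close(query, r, max_edits_per_field)]
```

Because every record is compared with the query, the cost grows linearly with file size per query, which is the price of tolerating an error in any field.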


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.