Integrated Data System Person Identification: Accuracy Requirements and Methods

24 Pages Posted: 22 Oct 2014


Ting Zhang

University of Baltimore

David W. Stevens

University of Baltimore

Date Written: February 20, 2012

Abstract

This report responds to a Workforce Data Quality Initiative (WDQI) challenge — the unreported quality of person identification (PI) features in many integrated data systems (IDS) that link confidential workforce, education and social services administrative records.

The importance of the PI topic reflects concern that many local K-12 education agencies do not collect student Social Security Numbers. Some conclude from this widespread omission that linking secondary student records with workforce data may be impossible. Others, however, have adopted ad hoc and commercial software solutions to bridge this gap. To date, no standard record linkage method has been endorsed.

Will performance dashboards and research findings based on IDS information be accepted as trustworthy by individuals making important decisions about the appropriation of funds, policy, and program-level resource allocation? Should IDS public-use releases be believed and acted upon?

A standard technical language is used in professional communication about PI topics. Record linkage can be pursued using exact matching or statistical matching. Within the exact matching portfolio are deterministic and probabilistic methods, and within the deterministic portfolio are direct and hierarchical methods.
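To make the deterministic distinction concrete, here is a minimal Python sketch under our own reading of the terms: a direct method links records in a single pass only when every chosen identifier agrees exactly, while a hierarchical method runs a sequence of passes from the strictest key to looser ones, setting aside records already linked. The field names (`ssn`, `last`, `dob`), the normalization (trim and uppercase), the rule of accepting only one-to-one key hits, and the records themselves are illustrative assumptions, not the authors' specification.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[str]]

def make_key(rec: Record, fields: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Normalized values of `fields`, or None if any identifier is missing."""
    values = []
    for f in fields:
        v = rec.get(f)
        if v is None or not v.strip():
            return None
        values.append(v.strip().upper())
    return tuple(values)

def direct_match(a: List[Record], b: List[Record],
                 fields: Sequence[str]) -> List[Tuple[int, int]]:
    """Single deterministic pass: link a[i] to b[j] when all `fields` agree
    and the combined key is unique on both sides (one-to-one links only)."""
    keys_a = [make_key(r, fields) for r in a]
    keys_b = [make_key(r, fields) for r in b]
    count_a = Counter(k for k in keys_a if k is not None)
    where_b: Dict[Tuple[str, ...], List[int]] = {}
    for j, k in enumerate(keys_b):
        if k is not None:
            where_b.setdefault(k, []).append(j)
    return [(i, where_b[k][0]) for i, k in enumerate(keys_a)
            if k is not None and count_a[k] == 1 and len(where_b.get(k, [])) == 1]

def hierarchical_match(a: List[Record], b: List[Record],
                       passes: Sequence[Sequence[str]]) -> List[Tuple[int, int, Tuple[str, ...]]]:
    """Ordered deterministic passes, strictest key first; records linked in an
    earlier pass are removed before the next, looser pass runs."""
    links: List[Tuple[int, int, Tuple[str, ...]]] = []
    used_a, used_b = set(), set()
    for fields in passes:
        rem_a = [i for i in range(len(a)) if i not in used_a]
        rem_b = [j for j in range(len(b)) if j not in used_b]
        for si, sj in direct_match([a[i] for i in rem_a], [b[j] for j in rem_b], fields):
            i, j = rem_a[si], rem_b[sj]
            links.append((i, j, tuple(fields)))
            used_a.add(i)
            used_b.add(j)
    return links

# Fictitious records: the second school record lacks an SSN.
school = [
    {"ssn": "212345678", "last": "Garcia", "dob": "2001-04-03"},
    {"ssn": None, "last": "Lee", "dob": "2000-01-15"},
]
wages = [
    {"ssn": "212345678", "last": "GARCIA", "dob": "2001-04-03"},
    {"ssn": "212349999", "last": "Lee", "dob": "2000-01-15"},
]

print(direct_match(school, wages, ["ssn", "last", "dob"]))
print(hierarchical_match(school, wages, [["ssn", "last", "dob"], ["last", "dob"]]))
```

The direct pass links only the first pair; the second hierarchical pass also links the record without an SSN on last name and birth date. Looser passes recover more links but also admit more chances of a false link, so each added pass is a trade-off rather than a free gain.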

A familiar first step among WDQI award teams is to apply exact matching when two or more administrative data files each contain an SSN field. In some record linkage efforts this first step is also the last, which introduces selection bias threats, singly or in various combinations; for example, records with a missing or mistyped SSN never enter the linked file. Confirmation that an SSN has been issued, and is therefore valid, does not mean that the valid nine-digit number was issued to the person associated with it in one or more administrative data files.
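The gap between a valid SSN and a correctly attributed one can be illustrated with a small Python sketch. It joins two files on SSN alone after a structural screen (numbers with area 000, 666 or 900 to 999, group 00, or serial 0000 are never issued), then checks whether last name and birth date corroborate each SSN match. The field names, the choice of corroborating fields, and the records are hypothetical. A structural screen is weaker than the issuance confirmation the text describes, and neither one establishes that the number belongs to the person on the record.

```python
import re
from typing import Dict, List, Optional, Tuple

def ssn_digits(ssn: Optional[str]) -> str:
    """Strip punctuation such as hyphens; None becomes an empty string."""
    return re.sub(r"\D", "", ssn or "")

def ssn_structurally_valid(ssn: Optional[str]) -> bool:
    """Reject numbers that can never be issued: area 000, 666 or 900-999,
    group 00, or serial 0000. Passing says nothing about who holds the number."""
    d = ssn_digits(ssn)
    if len(d) != 9:
        return False
    area, group, serial = d[:3], d[3:5], d[5:]
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"

def ssn_exact_join(a: List[Dict], b: List[Dict]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Exact match on SSN alone, then split the pairs by whether last name
    and date of birth also agree (hypothetical corroborating fields)."""
    by_ssn: Dict[str, List[int]] = {}
    for j, rec in enumerate(b):
        if ssn_structurally_valid(rec["ssn"]):
            by_ssn.setdefault(ssn_digits(rec["ssn"]), []).append(j)
    corroborated, conflicting = [], []
    for i, rec in enumerate(a):
        if not ssn_structurally_valid(rec["ssn"]):
            continue  # dropped from the linked file: a selection-bias risk
        for j in by_ssn.get(ssn_digits(rec["ssn"]), []):
            other = b[j]
            agrees = (rec["last"].upper() == other["last"].upper()
                      and rec["dob"] == other["dob"])
            if agrees:
                corroborated.append((i, j))
            else:
                conflicting.append((i, j))
    return corroborated, conflicting

# Fictitious records for illustration only.
education = [
    {"ssn": "212-34-5678", "last": "Garcia", "dob": "2001-04-03"},
    {"ssn": "212-34-9999", "last": "Lee", "dob": "2000-01-15"},
    {"ssn": "000-12-3456", "last": "Brown", "dob": "1999-07-30"},
]
wages = [
    {"ssn": "212345678", "last": "GARCIA", "dob": "2001-04-03"},
    {"ssn": "212349999", "last": "Okafor", "dob": "1968-09-02"},
]

print(ssn_exact_join(education, wages))  # ([(0, 0)], [(1, 1)])
```

Pair (1, 1) passes an SSN-only join even though the name and birth date point to a different person, and the third education record never enters the linked file at all.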

We completed a series of three record linkage steps: (1) determine what candidate identifiers are available in each administrative data set; (2) use Link Plus software to carry out multiple deterministic and probabilistic PI diagnostics; and (3) examine the potential matched pairs identified in step two, assigning each pair to one of three categories — match, non-match, or uncertain match. Our intent has been to illustrate typical PI accuracy challenges that are found in administrative data files. These challenges occur over time within a single administrative data source and among different administrative data files.
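Step two relied on Link Plus, whose internals we do not reproduce here. As a generic illustration of how probabilistic linkage can score a candidate pair and how step three's three categories can arise, the Python sketch below uses Fellegi-Sunter style weights: each field contributes log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees, where m and u are the probabilities of agreement among true matches and among non-matches. The m and u values, the field list, the records, and the upper and lower thresholds are all hypothetical; in practice m and u are estimated from the data, thresholds are set to balance error rates, and pairs falling between the thresholds go to manual review.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical parameters: m = P(agree | same person), u = P(agree | different people).
PARAMS: Dict[str, Tuple[float, float]] = {
    "last": (0.95, 0.01),
    "first": (0.90, 0.02),
    "dob": (0.97, 0.003),
    "zip": (0.80, 0.05),
}

def field_weight(agree: bool, m: float, u: float) -> float:
    """log2 likelihood ratio contributed by one field comparison."""
    return math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))

def pair_weight(a: Dict[str, Optional[str]], b: Dict[str, Optional[str]]) -> float:
    """Sum of field weights; a field missing on either side adds nothing."""
    total = 0.0
    for field, (m, u) in PARAMS.items():
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue
        total += field_weight(va.strip().upper() == vb.strip().upper(), m, u)
    return total

def classify(weight: float, upper: float = 12.0, lower: float = 0.0) -> str:
    """Map a composite weight to the three review categories."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "uncertain match"

base = {"last": "Garcia", "first": "Ana", "dob": "2001-04-03", "zip": "21201"}
candidates = {
    "same person": {"last": "GARCIA", "first": "Ana", "dob": "2001-04-03", "zip": "21201"},
    "typo in dob, moved": {"last": "Garcia", "first": "Ana", "dob": "2001-03-04", "zip": "21218"},
    "different person": {"last": "Lee", "first": "Min", "dob": "1999-07-30", "zip": "21201"},
}
for label, rec in candidates.items():
    w = pair_weight(base, rec)
    print(f"{label}: weight {w:.2f} -> {classify(w)}")
```

A transposed birth date combined with a move pulls an otherwise strong candidate into the uncertain band rather than discarding it, which is the kind of pair a manual review step is meant to resolve.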

Our diagnostic findings do not lend themselves to a brief summary. Sections 5 and 6 describe the steps we undertook and what we found. Given our diagnostic findings to date, so what? If left unresolved, can a PI of unreported and perhaps unknown quality translate into unacceptable deficiencies in the information, conclusions and recommendations released to stakeholders who make important decisions about appropriation of funds, policies and program-level priorities?

Accurate PI is a necessary first step for successfully integrating multiple administrative data sources. This universal requirement applies to any attempt to link person-specific, unit-record administrative data sources.

Avoiding stakeholder skepticism, or at worst outright rejection, is within our collective control, but we need to take positive steps now to retain this control. Lost confidence is difficult to recover. We need to get ahead of this potential threat to realizing the return on past, current and future IDS investments.

We are not aware of any ongoing, serious and sustained professional conversation about the criteria appropriate for defining PI accuracy tolerances for specific applications. This conversation is needed because the community of practitioners does not know whether we are over- or under-investing in PI technologies and applications.
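One way to make an accuracy tolerance operational, offered here only as an illustration, is to review a sample of linkage decisions by hand and compare the observed false match rate (linked pairs that are different people) and missed match rate (same-person pairs left unlinked) with limits set for each application. The Python sketch below does that; the counts and both sets of tolerances are hypothetical and are not criteria proposed in this report.

```python
from dataclasses import dataclass

@dataclass
class ReviewCounts:
    """Outcomes from manually reviewing a sample of linkage decisions."""
    true_links: int    # linked pairs confirmed to be the same person
    false_links: int   # linked pairs that turned out to be different people
    missed_links: int  # same-person pairs the procedure failed to link

    def false_match_rate(self) -> float:
        linked = self.true_links + self.false_links
        return self.false_links / linked if linked else 0.0

    def missed_match_rate(self) -> float:
        actual = self.true_links + self.missed_links
        return self.missed_links / actual if actual else 0.0

def within_tolerance(c: ReviewCounts, max_false: float, max_missed: float) -> bool:
    """True when both error rates are at or below the stated limits."""
    return c.false_match_rate() <= max_false and c.missed_match_rate() <= max_missed

# Hypothetical review results and tolerances for two different uses.
review = ReviewCounts(true_links=940, false_links=20, missed_links=60)
tolerances = {
    "statewide dashboard": (0.05, 0.10),
    "small-subgroup outcome study": (0.01, 0.05),
}
print(f"false match rate {review.false_match_rate():.3f}, "
      f"missed match rate {review.missed_match_rate():.3f}")
for use, (max_false, max_missed) in tolerances.items():
    print(use, "passes" if within_tolerance(review, max_false, max_missed) else "fails")
```

With these invented counts the false match rate is about 2.1 percent and the missed match rate is 6 percent, so the same linked file passes the looser hypothetical tolerance and fails the stricter one: whether a linkage is accurate enough depends on the use it is put to.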

We encourage the U.S. Department of Labor, Employment and Training Administration WDQI leadership team to propose an appropriate forum — perhaps through the technical assistance resources of Social Policy Research Associates — to ensure immediate attention to the PI accuracy topic.

Suggested Citation

Zhang, Ting and Stevens, David W., Integrated Data System Person Identification: Accuracy Requirements and Methods (February 20, 2012). Available at SSRN: https://ssrn.com/abstract=2512590 or http://dx.doi.org/10.2139/ssrn.2512590

Ting Zhang (Contact Author)

University of Baltimore

1420 N. Charles St.
Baltimore, MD 21201-5779
United States

HOME PAGE: https://sites.google.com/site/tzhangphd/

David W. Stevens

University of Baltimore

1420 N. Charles Street
Baltimore, MD 21201
United States
