Tribal Data Linkage Toolkit
 
Tribal Data Linkage Toolkit Introduction
Record linkage is the process of comparing records across data sets to identify individuals contained in both. In Indian Country, one common example involves taking a data source with accurate information about American Indian/Alaska Native ancestry and linking it with a second dataset to improve the quality of race coding in the second database.
  • How can I get started?
    There are several user-friendly linkage tools available. This step-by-step information makes it easy to get started on how to do linkages using the software we are most familiar with, Link Plus. The tools section of this toolkit has more information.
  • Why record linkage?
    "When we did the first linkages, we were floored: 59% of American Indians and Alaska Natives in our cancer registry had no indication of their native heritage in the database."
    - Richard Leman, Medical Epidemiologist, Oregon Public Health Division

    "Our work directly benefits both state partners and tribes by improving the accuracy of race data in the state surveillance data systems and providing more accurate and complete health status data to northwest tribal communities."
    - Megan Hoopes, Project Director, IDEA-NW/Registry, Northwest Portland Area Indian Health Board

    "Improving the accuracy of racial data is important to all disease programs because of the disparities and inequities related to race. These disparities have been persistent and, in order for programs to address them, accurate racial data is crucial."
    - Tracy Miller, State Epidemiologist, North Dakota Department of Health
Overview
  • What is record linkage?
    Record linkage is the process of comparing records across data sets to identify individuals contained in both. Linkages can supplement or validate data across datasets and can identify duplicate records on the same individual within one dataset. Common examples include:
    • Merging death information from a vital statistics file with cancer information from a central cancer registry
    • Linking data from death certificates, inpatient hospitalizations, and law enforcement citations to generate crash and injury reports as in NHTSA's CODES Program
    Likewise, the detection of duplicates is a fundamental requirement for accuracy and validity of event counts in any disease registry.

    Linkages fall into two main categories: deterministic and probabilistic. Deterministic linkage compares data fields and looks for exact matches between records. While this process is fairly straightforward, it may miss many matches if there are coding errors or missing data. Probabilistic linkage estimates the probability that two records are for the same person. This process allows for differences between the two files, such as the use of nicknames, middle initials versus full middle names, and transposed digits in a Social Security number.
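
    The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration only, not how Link Plus or any other tool is implemented. The field names, the sample records, and the m and u probabilities (the chance that a field agrees when two records do, or do not, belong to the same person) are all assumptions invented for the example. The scoring sums a log-ratio weight per field, in the style commonly attributed to Fellegi and Sunter.

```python
from math import log2

# Hypothetical per-field probabilities (assumed values, for illustration only):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "last_name": {"m": 0.95, "u": 0.01},
    "first_name": {"m": 0.90, "u": 0.02},
    "birth_date": {"m": 0.97, "u": 0.003},
    "ssn": {"m": 0.90, "u": 0.0001},
}


def deterministic_match(rec_a, rec_b, fields=FIELD_PARAMS):
    """Return True only if every field is present and identical in both records."""
    return all(
        bool(rec_a.get(f)) and rec_a.get(f) == rec_b.get(f) for f in fields
    )


def probabilistic_score(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement or disagreement weights across fields."""
    score = 0.0
    for field, p in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # missing value: counts neither for nor against a match
        if a == b:
            score += log2(p["m"] / p["u"])  # agreement adds a positive weight
        else:
            score += log2((1 - p["m"]) / (1 - p["u"]))  # disagreement subtracts
    return score


# Fictional records. The second has a nickname and two transposed SSN digits.
registry_rec = {"last_name": "SMITH", "first_name": "ROBERT",
                "birth_date": "1970-01-15", "ssn": "900123456"}
state_rec = {"last_name": "SMITH", "first_name": "BOB",
             "birth_date": "1970-01-15", "ssn": "900124356"}
stranger_rec = {"last_name": "JONES", "first_name": "ALICE",
                "birth_date": "1982-07-04", "ssn": "900987654"}

print(deterministic_match(registry_rec, state_rec))
print(probabilistic_score(registry_rec, state_rec))
print(probabilistic_score(registry_rec, stranger_rec))
```

    With these assumed values, the exact comparison rejects the nickname and transposed-digit pair, while the summed weights for that pair stay positive and the weights for the unrelated pair are negative. In practice a cutoff score has to be chosen for the data at hand, and borderline pairs are often reviewed by hand.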
  • Why pursue record linkages?
    • "At Oregon Public Health Division, we think of linkages as part of regular, sound public health practice. We’ve done linkages between the Northwest Portland Area Indian Health Board’s Northwest Tribal Registry and our state cancer registry annually since 1999 to improve the quality of data for American Indians and Alaska Natives in our state. When we did the first linkage, we were floored: 59% of American Indians and Alaska Natives in our cancer registry had no indication of their native heritage in the database. In our two most recent cancer registry linkages, we still found that an average of 28% of new cases among American Indians and Alaska Natives were not correctly identified in our database.
      We’ve also worked with the Health Board to pursue linkages of the state vital statistics and HIV/AIDS databases, among others. These linkages provide a more accurate picture of Indian health in Oregon. This helps both us and our partners in tribal and urban Indian health to recognize disparities and to better promote health among Oregon’s Native population.”
      - Richard Leman, Medical Epidemiologist, Oregon Public Health Division

    • "Tribal health leaders have long recognized the necessity of having complete and accurate race data as a first step to addressing health disparities experienced by American Indians/Alaska Natives (AI/AN) and other racial minority populations. Numerous studies have shown high prevalence of race misclassification for AI/AN in data sources such as vital statistics and cancer registries. This results in underestimated morbidity and mortality, hampering public health decision-making and the appropriate allocation of disease control resources.
      Using the most complete listing of AI/AN currently available – a roster of individuals who have registered at tribal, Indian Health Service, and urban Indian clinics in the northwest – we perform record linkages with health data systems in Idaho, Oregon, and Washington. The prevalence of misclassified and missing race data in this region can range from 30-60%, which if left uncorrected, would significantly underestimate the burden of health outcomes for this population. Our work directly benefits both state partners and tribes by improving the accuracy of race data in state surveillance data systems, and providing more accurate and complete health status data to northwest tribal communities. To date, linkages have been conducted with state cancer registries, death records, hospital discharge data, STD surveillance systems, and several tribe-specific projects. This work is widely supported by tribal health leaders and our state partners.”
      - Megan Hoopes, Project Director, IDEA-NW/Registry, Northwest Portland Area Indian Health Board

    • "From the state health department perspective where data linkages have not occurred, improved racial and tribal data would help further identify disparities associated with our tribal communities and their members. In North Dakota, the quality of the data in the race demographic field is very poor for some of the infectious diseases, while others have better race information. Last year, almost 87% of the data for influenza and 40% of pertussis data was of unknown race, while diseases such as HIV and TB had complete race data, where no cases were classified as unknown or left blank.
      The North Dakota Statewide Cancer Registry (NDSCR) uses Link Plus to link hospital, vital records and out-of-state central cancer registry data files to our system. We also use Link Plus to link with the state Breast and Cervical Cancer Program. As required by the National Program of Cancer Registries and the North American Association of Central Cancer Registries, we participate in annual linkages with national Indian Health Service data to update race misclassification of American Indian/Alaska Native people. We do not change race codes in our cancer database as a result of these linkages but we do retain linkage-identified race information in a specific field, which allows us to account for these matches when we run statistics requiring AI/AN data.
      In Chronic Disease, the health department relies heavily on vital records and BRFSS data. There is a concern that race may not be accurately reported in death records. This may be more of a training issue for the people who complete death certificates than anything else. Since race is self-reported in BRFSS, I have no concerns with that data source. We also work with health claims such as Medicare and Blue Cross Blue Shield of North Dakota (BCBSND). CMS has made efforts to improve accuracy in race reporting in the Medicare system. BCBSND does not collect race information. Improving the accuracy of racial data is important to all disease programs because of the disparities and inequities related to race. These disparities have been persistent and in order for programs to address them, accurate racial data is crucial.
      Indicators at the national level and in our surrounding states indicate disparities among tribal populations. However, because some divisions in the health department do not get quality race data, it is difficult to identify the true impact to our tribal populations. The goal of linkages would be to not only improve our infectious disease data but to look at data quality across the whole health department.”
      - Tracy Miller, State Epidemiologist, North Dakota Department of Health
Tools
  • Getting Started
    An important first step is deciding whether to do deterministic or probabilistic linkage. Deterministic linkage is more straightforward, but it may miss many real matches because of the strict matching requirements, decreasing the ability to detect and correct misidentification. Probabilistic matching has several advantages over exact matching methods, such as the ability to:
    • Account for both the likelihood that two records represent the same person (sensitivity), and the likelihood that they do not (specificity)
    • Account for coding errors, missing data, reporting variations, and duplicate records
    • Assign score weights depending on the frequency of a value (e.g., your dataset contains many "Smiths" but few "Hoopes", so a match on "Hoopes" would be weighted higher)
    • Allow for phonetic name matching (e.g., NYSIIS and Soundex)
    • To view a presentation with more detailed information about linkage concepts, click here.
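
    Two of the abilities listed above, phonetic name matching and frequency-based weights, can be sketched as follows. This is a hedged illustration under stated assumptions: the Soundex function follows the widely published American Soundex rules (NYSIIS is not shown), and the weight formula, log2 of total records divided by the count of a value, is one simple choice made for the example. Neither is taken from Link Plus or any other tool named in this toolkit, and the surname list is made up.

```python
from collections import Counter
from math import log2

# Consonant groups used by American Soundex; vowels, Y, H, and W have no code.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]  # pad with zeros, keep four characters


def frequency_weights(values):
    """Map each value to log2(total / count): rarer values get larger weights."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: log2(total / n) for value, n in counts.items()}


# Hypothetical surname column: many "SMITH" entries, one "HOOPES".
surnames = ["SMITH"] * 8 + ["HOOPES", "JONES"]
weights = frequency_weights(surnames)

print(soundex("Smith"), soundex("Smyth"))
print(weights["SMITH"], weights["HOOPES"])
```

    Spelling variants such as "Smith" and "Smyth" share a Soundex code, so they can still be compared as candidate matches, and agreement on the rare surname earns a larger weight than agreement on the common one.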
  • What linkage tools are available?
    There are several user-friendly software options available that require very little programming knowledge. This list is only a sample of the programs available and is not meant to be exhaustive.
    • The Link King is free public domain linkage and de-duplication software (user manual available).
    • Link Plus (a component of Registry Plus) is free, publicly available linkage and de-duplication software designed by CDC for use by central cancer registries (but usable with any fixed-width or delimited data). No user manual is available, but technical support is offered by phone and email, and an instructional PowerPoint on using Link Plus is available.
    • LinkSolv is commercial linkage software available for purchase from Strategic Matching. Training and technical support are available.
Sample Documents
The process of pursuing record linkages varies across states, departments, and institutions, but here are some tools to get you started. First, it is important to contact the manager of the data source with which you wish to link to discuss the project and determine specific approval processes that the organization may have.
  • A simple IRB protocol is often developed, which can qualify for expedited review, and a data-sharing agreement is negotiated with the linking agency. Confidentiality pledges can be used to specify data handling and disclosure protocols required of staff with access to confidential data.
  • This sample data sharing agreement provides a template covering data sharing and the use and disclosure of client information. There are several important areas to consider including in any such agreement. At a minimum, the agreement should specify the following:
    • Parties involved, including contact information
    • Purpose or need for the data sharing agreement
    • Nature of the data to be collected
    • Access and confidentiality of data
    • How the data are to be used
    • How and in what situations the agreement can be severed by either party
    • Relevant legal authorities (e.g., tribal, state, national)
  • This confidentiality pledge outlines the rules for internal access to a data set containing direct personal identifiers, such as a patient registration list or tribal enrollment list, which may be used for record linkages. Technical details of data exchange between multiple parties should be detailed separately in a data sharing/use/exchange agreement.
  • Link Methods Protocol is an example of an IRB protocol describing linkage methods using Link Plus software.
 
For more information about this toolkit, please contact Annie Tran. Click here to view other tribal epidemiology activities.
 
 