CHAPTER IV:
RECOMMENDED DATA EDITS AND SOFTWARE COORDINATION OF STANDARDS
Definitions
“Data edits” are computer software algorithms that check the content of data fields against an encoded set of acceptable codes and then provide feedback on the quality of the data. Data edits verify that only acceptable values are used for codes and, more importantly, enforce correct relationships between the codes recorded for related data items. Data edits can apply pass/fail criteria to data, so that a particular code or group of codes is determined to be either correct or incorrect. Identified errors are corrected, and the edits are re-run to ensure that each error was appropriately resolved.
Certain types of edits identify coding combinations that are so rare or unlikely that they are most likely errors. Cases flagged by these edits must be manually reviewed. If documentation is found to confirm that the case is rare or unusual and was originally coded correctly, an “over-ride flag” is set for the edit; i.e., a ‘1’ (or a ‘2’ or ‘3’) is entered into the over-ride flag field associated with the edit. Setting the over-ride flag prevents the case from generating an error when it is re-run through the edit. Over-ride flags should not be set unless the entire case has been reviewed and documentation confirms that the case is rare or unusual, because the majority of errors identified by such edits are in fact coding errors that end up being corrected, not over-ridden.
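The over-ride mechanism described above can be sketched in a few lines of Python. This is only an illustration, not the logic of any published edit: the rule (a prostate primary diagnosed before age 30), the field names, and the flag name `over_ride_age_site` are all hypothetical.

```python
# Hypothetical "unlikely combination" edit with an over-ride flag.
# Field names, the age threshold, and the flag name are assumptions made
# for illustration; they are not taken from a published standard.

OVER_RIDE_VALUES = {"1", "2", "3"}  # values that mark a reviewed case


def edit_age_site(record):
    """Return an error message, or None when the record passes."""
    unlikely = (
        record["primary_site"] == "C619"
        and record["age_at_diagnosis"] < 30
    )
    if not unlikely:
        return None
    # A case that was reviewed and documented carries a set over-ride flag,
    # so it no longer generates an error when the edit is re-run.
    if record.get("over_ride_age_site", "") in OVER_RIDE_VALUES:
        return None
    return "Age/site combination is unlikely; review the case"
```

Note that the flag only suppresses the message. Deciding whether to set it remains a manual step that follows review of the entire case.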
Generally, there are three types of edits:
- Single-field edits or item edits are those that verify only one data item at a time. For example, an edit of the item “Sex” would verify that only valid values are used in the field.
- Inter-field edits or multi-field edits are those edits that compare the codes recorded for one data item with codes recorded for related data items. For example, a common inter-field edit compares the code for “Sex” with the code for “Primary Site” and identifies female prostate cancer as an error.
- Inter-record edits or multi-record edits compare data recorded across more than one record, and are commonly applied across tumor records for a patient that has multiple tumors. These edits compare codes or groups of codes recorded in the same data item(s) between each of the tumor records for the patient. For example, one inter-record edit compares the sequence numbers of multiple tumors for the same patient with their dates of diagnosis to ensure that the sequence numbers have been assigned in the correct chronological order based on diagnosis date.
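The three edit types can be illustrated with a minimal Python sketch. The code sets, field names, and function names below are assumptions chosen for the example (only `C619`, the ICD-O-3 topography code for prostate, is a real code), and the logic is far simpler than that of any production edit.

```python
from datetime import date

# Illustrative values; a registry would take these from its data standard.
VALID_SEX = {"1", "2", "3", "4", "9"}  # assumed set of allowed Sex codes
FEMALE = "2"                           # assumed code for female
PROSTATE = "C619"                      # ICD-O-3 topography code for prostate


def edit_sex(record):
    """Single-field edit: Sex must hold an allowed code."""
    return record["sex"] in VALID_SEX


def edit_sex_primary_site(record):
    """Inter-field edit: a female patient cannot have a prostate primary."""
    return not (record["sex"] == FEMALE and record["primary_site"] == PROSTATE)


def edit_sequence_order(tumors):
    """Inter-record edit: sequence numbers must follow diagnosis-date order.

    `tumors` holds all tumor records for one patient.
    """
    by_sequence = sorted(tumors, key=lambda t: int(t["sequence_number"]))
    dates = [t["date_of_diagnosis"] for t in by_sequence]
    return dates == sorted(dates)


patient_tumors = [
    {"sequence_number": "01", "date_of_diagnosis": date(2001, 3, 14)},
    {"sequence_number": "02", "date_of_diagnosis": date(2005, 8, 2)},
]
print(edit_sequence_order(patient_tumors))
```

The first two functions look at a single record, while the third needs every tumor record for the patient at once, which is why inter-record edits are usually run on consolidated data.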
Challenges
There are at least six challenges to the standardization of data edits across central and hospital-based cancer registries. These include:
- Registry systems that encode an edit from standard specifications may be written in different computer languages, with possible differences in detail due to differing translations
- Each implementation of an agreed-upon standard specification may be programmed differently, despite intent to encode a standard meaning
- Complete edits are not always performed at the time of data entry
- Documentation of the edit algorithms can be difficult for data analysts and collectors to obtain, and may not be in a user-friendly format
- Consolidated data collected from different reporting sources and via different data entry tools may encourage the equating of “apples” and “oranges” without the users' knowledge
- When standards change, synchronized implementation of associated revised edits is difficult, due to the differing release schedules of cancer software providers and their limited ability to rapidly respond to changes at a given time
Uniform, standardized edits must be applied to all cancer registry data in order to generate data that are comparable across registries.
The EDITS Software
The EDITS Software Project began with an informal discussion about promoting and supporting data processing standards after a 1990 meeting of the NAACCR Data Evaluation and Publication Committee. A small group of registry operators, software producers, and data consumers identified the missing element of standard setting at that time: an executable version of a standard that could be applied directly to data in a variety of processing scenarios without reinterpretation by programmers. At that time, producers of cancer registry software wishing to adhere to a published standard had to write their own computer code to implement any edit-checking algorithms. The solution would need to be flexible in many dimensions to accommodate the many technical, operational, scientific, economic, and agency-related considerations that make up the cancer registry milieu.
EDITS is a set of software tools that can be used to improve data quality and standardize the way data items are checked for validity. The EDITS tools have been developed by CDC and NPCR and currently include three applications: EditWriter, GenEDITS Plus, and the Application Program Interface. These tools can be built into interactive data collection systems to achieve real-time field-by-field editing during data entry. They can also be used in batch-editing processes for data already collected. EDITS provides software to support three types of data activities: defining standards for data quality, standardizing data collection processes, and analyzing data quality. The EDITS tools were recently modernized and converted to the Windows operating environment, resulting in significantly improved function, efficiency, and user-friendliness of the software.
EDITS can be used to apply single-field and inter-field type edits routinely and interactively to cancer registry data, and is used extensively at all levels of cancer reporting: facilities, central registries, standard setters, and cancer software vendors.
EditWriter
EditWriter is used to create and modify EDITS Metafiles, which hold the edits and the supporting information needed to apply them to data.
EDITS Application Program Interface (API)
The EDITS API is used to incorporate EDITS into cancer data abstracting, reporting, or processing software. The EDITS API can be incorporated into programs of many descriptions, including programs for interactive data entry and after-the-fact verification of data. Any language product for Windows should be able to use the EDITS API. The EDITS API is distributed as a Windows Dynamic Link Library and as C source code, and is used by most cancer software vendors.
GenEDITS Plus
GenEDITS Plus is used to apply data quality edits to data files using the metafiles produced within EditWriter, and to generate error reports for error resolution. GenEDITS Plus is the fastest way to apply standard edits to data and obtain a report of data errors. GenEDITS Plus accepts NAACCR-formatted files and produces two reports: a detail report containing record-level error information, and a summary report containing error summary statistics. Because GenEDITS Plus already incorporates the EDITS API, no programming is required.
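The relationship between a detail report and a summary report can be shown with a short Python sketch of a batch-editing loop. This is not the GenEDITS Plus implementation and does not use the EDITS API; the function `run_batch`, the edit names, and the record fields are hypothetical.

```python
from collections import Counter

# Hypothetical edits: each maps a name to a function returning True on pass.
EDITS = {
    "Sex": lambda r: r["sex"] in {"1", "2", "3", "4", "9"},
    "Sex/Primary Site": lambda r: not (
        r["sex"] == "2" and r["primary_site"] == "C619"
    ),
}


def run_batch(records, edits):
    """Apply every edit to every record and return (detail, summary)."""
    detail = []          # one (record number, edit name) entry per failure
    summary = Counter()  # edit name -> number of records that failed it
    for number, record in enumerate(records, start=1):
        for name, check in edits.items():
            if not check(record):
                detail.append((number, name))
                summary[name] += 1
    return detail, summary
```

The detail list supports record-by-record error resolution, while the summary counts show which edits fail most often across the file.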
The EDITS Language
The algorithms that check data are specified using the EDITS language, a simplified programming language designed to validate data. The language includes a collection of powerful and specialized built-in functions that often reduce the complete validation of a data item to a single program statement. When complicated data relationships exist within a record, the EDITS language can express a complex validation scheme, including multiple fields, multiple table lookups, nested control statements, and functions.
The EDITS Metafiles
EDITS Metafiles (.smf files) contain everything needed to edit a data file, except the data. Metafiles provide portability of edits, in that the same edits can be applied to different data formats for different purposes. EDITS Metafiles are created and modified using EditWriter. The key components of an EDITS Metafile include: agencies, data dictionary, record layouts, edits, edit sets, error messages, and user look-up tables.
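One way to picture this portability is to keep the edit logic separate from the record layout, so that an edit refers to fields by name and a layout maps each name to a position in the record. The Python sketch below assumes that separation; the two fixed-width layouts, the field names, and the helper functions are hypothetical and do not reflect the actual metafile format.

```python
# Two hypothetical fixed-width layouts: field name -> (start, length),
# with 0-based start positions.
LAYOUT_A = {"sex": (0, 1), "primary_site": (1, 4)}
LAYOUT_B = {"primary_site": (0, 4), "sex": (4, 1)}


def parse(line, layout):
    """Extract named fields from one fixed-width record."""
    return {
        name: line[start:start + length]
        for name, (start, length) in layout.items()
    }


def edit_sex_primary_site(fields):
    """The edit sees only field names, never column positions."""
    return not (fields["sex"] == "2" and fields["primary_site"] == "C619")


# The same edit runs unchanged against records in either layout.
print(edit_sex_primary_site(parse("2C619", LAYOUT_A)))
print(edit_sex_primary_site(parse("C6192", LAYOUT_B)))
```

Supporting a new data format then means adding a layout, while the edit itself stays untouched.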
For additional information about EDITS or to download the EDITS software, see CDC's Division of Cancer Prevention and Control Website at: http://www.cdc.gov/cancer/npcr/.
NAACCR Standard Edits and the NAACCR Metafile
NAACCR has made increased standardization of data edits a priority, facilitated by the EDITS software, which provides a mechanism for standardized, transportable, and updateable edits to be provided through a “public library.” The goals are to help limit the proliferation of differing standards when there is no compelling need to be different, and to provide comprehensive public documentation in a current and readily accessible form in those instances where standards must differ.
The NAACCR Metafile is a comprehensive database of cancer registry standards and consists of a collection of tables that contain all the information needed to test data fields for validity and acceptability. The NAACCR Metafile specifically includes the following: standard setter list; current data dictionary of standard fields; sets of fields defining standard records; executable single- and multi-field validation logic; text descriptions of edits; look-up tables; and error messages.
NAACCR first made standard edits available in 1996. These edits corresponded to NAACCR's 1995 record layout and data dictionary, and were documented as Volume IV in its Standards series.28 Since that time, NAACCR has posted EDITS metafiles containing standard edits on the Internet that correspond to the annual NAACCR record layouts and data dictionaries. For example, “Revised Version 11 Metafile--NAACCR 11.1A” refers to the standard edits in the NAACCR Version 11.1 record layout. The “A” notation indicates the first revision to the Version 11.1 record layout standard edits. The hardcopy of Volume IV has been discontinued in favor of electronic publication of EDITS documentation using EditWriter. The EDITS Software, general instructions, and current and previous metafiles containing the most recent and historical public standards for cancer registry data are available on the NAACCR website at https://www.naaccr.org/standard-data-edits/.
NPCR Inter-Record Edits Utility
In mature central cancer registries, multiple primaries can account for up to 15-20% of the data. In order to validate coded values across multiple tumor records for a single patient, inter-record edits must be applied to the data. In the early 1980s, SEER developed an Inter-record Edits program for SEER registries. In 2000, NPCR began development of an Inter-record Edits Utility for use by NPCR registries; this software included similar logic, but had run-time differences from the SEER Inter-record Edits program. In 2003, NAACCR began using the NPCR Inter-record Edits Utility in its annual Calls for Data.
The NPCR Inter-record Edits Utility accepts NAACCR-formatted files, produces two reports (a detail report and a summary report), and currently contains 22 edits. NPCR Inter-record Edits are applied to consolidated tumor data, i.e., files containing one record per tumor per patient. Identified inter-record errors are corrected, and the inter-record edits are re-run to ensure that each error was appropriately resolved.
SEER*Edits
For many years, the SEER Program has maintained a library of standardized edits which it applies to data submissions from the participating SEER registries. Over the years as experience and expertise increased, SEER has fine-tuned and expanded the edits and has made these edits available to SEER and other registries. In addition, the logic of the SEER edits has been used as the foundation for the EDITS project where SEER is the source of standard for the item or items.
Over time, as more and more computer processing moved away from the mainframe platform, the SEER Program re-programmed its edits in C++ (SEER*Edits). This change has allowed the SEER edit engine to be ported to and compiled on a variety of hardware platforms. The edit engine includes the entire set of SEER field, inter-field, and inter-record edits. SEER*Edits can be used as a stand-alone package by the SEER areas before they submit data to SEER, or the edits can be incorporated individually by SEER registries into their data entry programs or routine editing of data. Data files used as input to the stand-alone version of SEER*Edits must be stored in NAACCR format. The SEER*Edits package also includes report-generating functions, including the display of errors to facilitate data corrections. Various follow-up, surveillance, and SEER registry requirement reports are also included. To keep them synchronized, any change made to the SEER*Edits package is also made to the SEER Data Management System (SEER*DMS) and to the corresponding edits in the NAACCR Metafile for the EDITS project, and vice versa.