Over time, the coding standards required by major standard-setting organizations have been inconsistent in the item sets required, the codes and coding instructions employed, and the timing of adoption of new or revised codes. These inconsistencies affect the use of data compiled over several years and from multiple sources, and they are described below. The standards for tumor inclusion, reportability, and multiple primary rules are addressed separately in Chapter III.
The Uniform Data Standards Work Group (UDSWG) will continue to seek consensus on unresolved issues. Before new standards can be agreed upon, all interested parties must be provided sufficient time to study the proposals. Once UDSWG approves new standards, there must be adequate time for implementation. All members are encouraged to present suggestions or comments on proposed changes to the standards to UDSWG. The NAACCR website, http://www.naaccr.org, provides the name of the Committee Chair and electronic forms for proposing additions or revisions.
This chapter describes coding issues affecting each of the following types of measures:
- occupation and industry
- sequence numbers
- staging descriptors
- treatment descriptors
- timing of first course treatment
- vital status codes
The descriptions in this chapter are intended to provide a summary of coding issues. The original manuals should be consulted when a particular data use requires more detail. This chapter does not track changes made in individual codes over time. Some changes are noted in the individual item dictionary descriptions, and further information can be obtained from historic versions of this volume and from the individual standard setters associated with the items.
County--Current and County at DX
NAACCR has adopted the Federal Information Processing Standards (FIPS) codes for county as the standard in this volume (see Appendix A for codes). However, standards for codes used vary somewhat by standard setter. For cancers diagnosed prior to 2002, the use of FIPS codes was not universally adopted. For this reason, users of data should determine which codes were used for coding County at DX in a particular file, since no field indicating “County at DX Coding System” is included in the NAACCR layout.
- The SEER Program requires the use of FIPS codes for counties in the United States, plus the special code 999 (unknown).
- CoC requires the use of FIPS county codes as their standard, plus the special codes 998 and 999. However, the STORE manual also provides for use of geocodes for countries of residence outside the United States and Canada to be used in this field.
- NPCR requires the use of FIPS codes for counties in the United States, plus the special code 999, starting with cancers diagnosed on or after January 1, 2002.
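Because the special codes accepted alongside FIPS county codes differ by standard setter, a check on a county value needs to know whose rules apply. The following is a minimal sketch, assuming three-digit string codes. The function name, the `SPECIAL_CODES` table, and the `known_fips` parameter are hypothetical, and the sketch does not model the STORE provision for geocodes of countries outside the United States and Canada.

```python
# Special (non-FIPS) county codes allowed by each standard setter,
# as listed in the text above.
SPECIAL_CODES = {
    "SEER": {"999"},
    "CoC": {"998", "999"},
    "NPCR": {"999"},
}


def county_code_allowed(code, standard_setter, known_fips):
    """Return True if `code` is a FIPS county code or an allowed special code.

    `known_fips` is the set of three-digit FIPS county codes for the
    relevant state (see Appendix A); the caller supplies it.
    """
    if len(code) != 3 or not code.isdigit():
        return False
    if code in known_fips:
        return True
    return code in SPECIAL_CODES[standard_setter]
```

A file coded under CoC rules may therefore contain 998, which the same check would reject for a SEER or NPCR file.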
Spanish/Hispanic Origin (Hispanic Ethnicity) [190-210]
Although agreement on standard codes for the data item “Spanish/Hispanic Origin” has been reached, substantial variation persists among registries in how Hispanic ethnicity or Spanish/Hispanic Origin is determined. Procedures for determining ethnicity include:
- Recording ethnicity from information found in the medical record.
- Recording ethnicity based on a combination of patient demographic information that may include last name, maiden name, birthplace, or a statement of ethnicity in the record.
- Recording ethnicity based on a manual or computer matching of a documented surname, either last name or maiden name, against one or more listings of Spanish surnames. Common Spanish surname listings include: the 1980 and 1990 Census Bureau lists, the University of New Mexico GUESS list, and regional listings of Spanish surnames common to a particular geographic region (for example, the Florida list).
- Recording the ethnicity based on the application of a computer algorithm to available data items that may include last name, maiden name, birthplace, race, or sex to assign ethnicity.
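The surname-matching procedure in the list above can be sketched as a simple lookup. This is an illustration only: it is not the NHIA or any registry's actual method, the function name is hypothetical, and the surname listing is supplied by the caller rather than taken from any of the lists named above.

```python
def matches_surname_list(last_name, maiden_name, surname_list):
    """Return True if either documented surname appears in `surname_list`.

    Names are compared case-insensitively after trimming whitespace;
    missing names (None or empty) are skipped.
    """
    listed = {name.strip().upper() for name in surname_list}
    for name in (last_name, maiden_name):
        if name and name.strip().upper() in listed:
            return True
    return False
```

Checking the maiden name as well as the last name matters here, because a match on either one counts under the procedure described.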
Population-based registries should attempt to categorize their cases using a method that best approximates the method used by the Census Bureau to determine ethnicity in the population denominators. A standard best method has not been determined. At this time, collection of ethnicity data is not a standard applied by the Canadian Cancer Registry or the provincial/territorial registries.
Attempts have been made to evaluate and improve numerator data based on various methodologic approaches to determining Spanish/Hispanic Origin. NAACCR sponsored a symposium in Atlanta, GA, in January 1996 to discuss methodologic issues faced when attempting to measure cancer among Hispanics. A report was prepared and is available on the NAACCR website (http://www.naaccr.org) under the heading “Epidemiologic Reports.” In 1999, a research group was formed from representatives of NAACCR to address issues of definition and to produce comparable data for Hispanic ethnicities across the United States. The group, operating under the auspices of the NAACCR Data Evaluation and Publications Committee, led to the creation of the NAACCR Hispanic Identification Algorithm (NHIA), an algorithm that uses a combination of NAACCR variables to directly or indirectly assign ethnicity.
Registries continue to use different methods to code Hispanic ethnicity. Users of the data must be able to determine how Hispanic ethnicity coding was assigned in a particular file. Based on historical and current discussions, NAACCR includes the field Spanish/Hispanic Origin for direct recording of ethnicity from the medical record, as well as fields for Computed Ethnicity, Computed Ethnicity Source, and NHIA Derived Hispanic Origin.
Occupation and Industry [270-330]
Most population-based registries have found the collection of usual occupation and industry data to be difficult and of limited utility, and for many years no consensus on data items and codes for occupation and industry had been achieved. In 1992, the Cancer Registries Amendment Act required central registries funded by NPCR to collect occupation or industry data to the extent available in the medical record.33
Data on usual occupation and industry are unavailable in an unknown, but significant, proportion of medical records. Even when available, the quality of the data in the medical record is generally untested and often limited to less useful information such as “retired.” By contrast, this information generally is available in text format on death certificates and, in some states, on the associated state mortality data files.
Some state mortality data files also contain the associated occupation and industry codes in addition to the text data. Much work remains to be done to improve the availability and capture of this potentially important information.
NAACCR will continue to discuss the quality and completeness of occupation and industry data and will reconsider the inclusion of occupation and industry in its recommended data sets.
Sequence Number [380 and 560]
As discussed in Chapter III, SEER, NPCR, and CoC have different standards for determining tumors that are reportable and are to be included in the registry. In addition to collecting these required tumors, some registries also collect and assign sequence numbers to other tumors such as cervix carcinoma in situ or PIN III.
Two sequence number data items are now in use: Sequence Number--Hospital, assigned by the reporting facility, and Sequence Number--Central, assigned by the central registry. The time period of both Sequence Number data items is a person’s lifetime, although with earlier definitions of Sequence Number--Central, central registries historically assigned the numbers from the reference date of the registry. When reportability of a particular tumor changes over time, both the type and the timing of tumors may affect the assignment of sequence numbers, so it is possible for two patients having similar cancer histories to be characterized by different sets of sequence numbers.
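To show how the counting period alone can change the numbers, the sketch below orders one patient's tumors two ways: over the whole lifetime, and only from a registry reference date. It is deliberately simplified. It uses plain ordinal positions (1, 2, 3) rather than the actual Sequence Number codes, it ignores reportability, and the function name and dates are hypothetical.

```python
from datetime import date


def ordinal_positions(diagnosis_dates, start=None):
    """Number tumors 1, 2, 3... in order of diagnosis date.

    Tumors diagnosed before `start` (for example, a registry reference
    date) are not counted and get None. With start=None the whole
    lifetime is counted. Assumes the diagnosis dates are distinct.
    """
    counted = sorted(d for d in diagnosis_dates if start is None or d >= start)
    position = {d: i + 1 for i, d in enumerate(counted)}
    return [position.get(d) for d in diagnosis_dates]


# One patient, two tumors, counted two different ways.
tumors = [date(1988, 5, 1), date(1999, 3, 15)]
lifetime = ordinal_positions(tumors)
from_reference = ordinal_positions(tumors, start=date(1995, 1, 1))
```

Under lifetime counting the 1999 tumor is the patient's second; counted from a 1995 reference date, the same tumor is the first.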
Numerous operational issues, such as storage of multiple facility-specific sequence numbers, appropriate linkage rules, and feedback of data to hospitals, have arisen because of policy differences from state to state. When attempting to use the Sequence Number--Central to identify individuals who have had only one lifetime cancer, it is important to realize the definitions used to make that determination vary and that sequencing may be handled differently in different systems.
AJCC TNM Stage, SEER EOD, SEER Historic Stage, SEER Summary Stage (1977, 2000 and 2018), and Collaborative Staging
Historically, four major staging schemes have been widely used in cancer registries in the United States. The schemes--AJCC TNM, SEER Extent of Disease, SEER Historic Stage, and SEER Summary Stage--differ in complexity, purpose, structure, rules, and definitions. AJCC TNM staging provides clinical utility. SEER EOD provides longitudinal stability for epidemiological studies. SEER Historic and SEER Summary Stage provide population surveillance staging capability. Several oncology subspecialties have developed staging systems applying to a limited number of cancer sites.
In January 2004, the Collaborative Staging System was introduced to reduce duplication of effort and provide a common staging schema from which the major staging categories could be electronically derived. All standard setters in the United States required the use of the Collaborative Staging System version 1 for cases diagnosed January 1, 2004, through December 31, 2009, but not every standard setter required every data element. In Canada, the Collaborative Staging System version 1 was adopted as the first national stage data collection standard for cases diagnosed on or after January 1, 2004; as in the U.S., not all data elements are required. Provincial/Territorial registries have been moving gradually since 2004 toward collecting stage data on all newly diagnosed cases but have not yet achieved that goal. CS version 2, based on AJCC 7th edition and renamed the Collaborative Stage (CS) Data Collection System, is effective for cases diagnosed January 1, 2010, through December 31, 2015.
The historic schemes were designed for different purposes at different times, and are not easily compared. Conversion among the seven editions of the AJCC TNM Cancer Staging Manual is often not possible. Minor differences exist between the SEER Summary Staging guides of 1977 and 2000. SEER published the Comparative Staging Guide for Cancer6 in 1993 as an attempt to present comprehensive, site-specific comparisons of the AJCC TNM, SEER EOD, and SEER Summary Staging schemes as an aid in data collection and interpretation. This guide covered the major cancer sites of colon and rectum, lung and bronchus, breast, female genital, prostate gland, and urinary bladder. According to the guide:
- Changes over time in methods of cancer screening, diagnosis, staging, and treatment have affected the distribution of disease stage.
- Changes over time in the classification schemes themselves can complicate data analysis and obscure the meaning of time trends. Various other staging schemes also are in use. Several oncology subspecialties have developed staging systems applying to a limited number of cancer sites.
For these reasons, comparing cancer registry data by stage over time or across registries, or using pooled data collected by different registries applying different staging schema, is problematic.6
For a discussion of staging issues that affect rules for case inclusion and reportability, see Chapter III, especially the paragraphs “In Situ/Invasive” and “Multiple Primary Rules.”
A summary of the major staging/coding schemes is provided below.
- The American Joint Committee on Cancer's TNM System (AJCC TNM)
The AJCC Cancer Staging Manual presents an anatomically oriented, site-specific staging system that consists of separate categories for the tumor, nodes, and metastases. The TNM categories then are grouped by stage, from 0 to IV. Beginning with the 7th edition, select prognostic factors are also published.
- SEER Extent of Disease (SEER EOD)
This site-specific 10-digit coding scheme8 was required for SEER registries until December 31, 2003. Other state and central registries also used it. EOD was designed to allow collapse of the codes into the stage groupings of several different staging systems, including AJCC stage group.
- SEER Summary Stage
SEER Summary Stage 1977 was a site-specific single-digit coding scheme required for NPCR registries until December 31, 2001, and it was either converted from EOD or directly coded by SEER registries for the years it was required for NAACCR data submission. In addition, until Collaborative Stage was implemented, CoC required the coding of SEER Summary Stage when a corresponding AJCC TNM site code scheme was not available. Cancers diagnosed before January 1, 2001, were assigned a summary stage according to Summary Stage Guide, Cancer Surveillance Epidemiology and End Results Reporting, SEER Program, April 1977,11 and the code was reported in the SEER Summary Stage 1977 data item. Cancers diagnosed on or after January 1, 2001, were assigned a summary stage according to the SEER Summary Staging Manual, 2000,12 and the code should be reported in the SEER Summary Stage 2000 data item. (See NAACCR Guidelines for Implementation of SEER Summary Stage 2000.) For cases diagnosed in 2004 or later, both SEER Summary Stages are derived from variables using the CS algorithm: Derived SS1977 for SEER Summary Stage 1977 and Derived SS2000 for SEER Summary Stage 2000.
- SEER Historic Stage
Sometimes SEER’s published stage data use the categories developed by an earlier program, the End Results Group. The Historic Stage variable has been defined consistently over time to facilitate long-term trend analyses, and the categories are not identical to those in the SEER Summary Stage.
- Collaborative Stage
The Collaborative Stage (CS) data set is a combination of data items (most of which have traditionally been collected as a part of regular cancer surveillance activities) that include tumor size, extension, lymph node status, metastatic status, evaluation fields describing the hierarchy of the data collected, and relevant site-specific information. This unified data set includes an algorithm which derives four different staging systems from the data collected and resolves subtle staging rule differences. The systems for which staging currently can be derived include AJCC TNM 6th Edition, AJCC TNM 7th Edition, SEER Summary Stage 1977, and SEER Summary Stage 2000. CSv2 incorporates the AJCC 7th edition prognostic factors as Site Specific Factors. Not all standard setters require all CS elements to be collected.
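For users pooling summary stage across diagnosis years, the data item that holds SEER Summary Stage depends on the diagnosis date, as described under SEER Summary Stage above. A minimal sketch of that lookup follows, assuming only the diagnosis year is needed. The function name is hypothetical, and years after 2015 are left unanswered because they fall outside the Collaborative Stage period described here.

```python
def summary_stage_items(diagnosis_year):
    """Return the data item name(s) holding SEER Summary Stage.

    Follows the diagnosis-date rules described in the text. Years after
    2015 fall outside the Collaborative Stage period covered here, so
    the function returns an empty list for them.
    """
    if diagnosis_year < 2001:
        return ["SEER Summary Stage 1977"]
    if diagnosis_year < 2004:
        return ["SEER Summary Stage 2000"]
    if diagnosis_year <= 2015:
        # Both stages are derived by the CS algorithm.
        return ["Derived SS1977", "Derived SS2000"]
    return []
```

An analysis spanning 1999 through 2006 would therefore need to read three different sets of fields, which is one practical reason stage comparisons over time require care.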
Tumor Size Rules 
Over the years, some of the rules for describing tumor size changed several times, and discrepancies existed between the CoC and SEER data. With the implementation of the CS coding system in 2004, all the differences between the two groups’ guidelines for tumor size have now been resolved.
The sites for which the tumor size guidelines differed are listed below. Users of registry data must be aware of possible discrepancies in the meaning of the information recorded in this variable before the diagnosis years indicated in parentheses.
- Melanomas (2002)
- Microscopic foci (2003)
- Most lesions smaller than 2 millimeters (2004)
- Breast and Lung lesions smaller than 3 millimeters (2004)
- Mycosis fungoides, Sezary disease, lymphomas, Kaposi sarcoma (2004)
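The list above amounts to a lookup from tumor category to the diagnosis year before which tumor size may have been recorded under differing guidelines. A minimal sketch, in which the dictionary keys are hypothetical labels for the listed categories and the function name is an assumption:

```python
# Diagnosis year before which CoC and SEER tumor size guidelines may
# have differed, keyed by hypothetical labels for the listed categories.
TUMOR_SIZE_CAVEAT_YEAR = {
    "melanoma": 2002,
    "microscopic foci": 2003,
    "lesion under 2 mm": 2004,
    "breast or lung lesion under 3 mm": 2004,
    "mycosis fungoides, Sezary disease, lymphoma, Kaposi sarcoma": 2004,
}


def tumor_size_needs_review(category, diagnosis_year):
    """Return True if the case predates the year listed for its category."""
    return diagnosis_year < TUMOR_SIZE_CAVEAT_YEAR[category]
```

A True result only signals that the recorded size may reflect either group's rules; it says nothing about which rules were actually applied.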
Historically, NPCR has recommended collecting the date and type of first course of definitive treatment when available.29 For the 1996-1997 diagnosis years, NPCR-funded registries were required to collect and process available treatment information using either the (1995 or 1996) SEER Program treatment data set or the (1995 or 1996) CoC treatment data set.
For 1998-2000, NPCR had a similar recommendation. NPCR-funded registries adopted either the SEER 1998 or the CoC 1998 treatment data set, and were encouraged to use the data item “RX Coding System--Current” to indicate how treatment was coded for a specific record.
Beginning with 2003 diagnoses, the CoC FORDS2 redefined some treatment fields and added others. Some new and redefined data fields along with dates of treatment are required by NPCR. For the 2003 and forward diagnosis years, NPCR requires the collection of first course of treatment data items when available and requires the submission of the NPCR-required surgery data items. NPCR uses the same codes as CoC FORDS, but does not collect all the data fields. See the list of data items (Chapter VIII) that NPCR registries collect.
SEER will use the same codes as the CoC FORDS but may not collect all of the fields. For example, SEER areas will not collect Rad--Treatment Volume. See the list of data items (Chapter VIII) that SEER areas collect and that SEER requires the SEER registries to transmit to NCI. SEER areas will use the fields Rad--Regional RX Modality and Rad--Boost Rx Modality (3200) from CoC hospitals to complete RX Summ--Radiation.
RX Summ--Rad to CNS 
This item is maintained in the transmission file for use with historic data. CoC discontinued collection of the item for cases diagnosed on or after January 1, 1996, and SEER discontinued collecting it for tumors diagnosed beginning in 1998. Both organizations instructed coders to record radiation to the central nervous system following those dates as radiation. SEER retains the codes for earlier cases and also converts the data into an appropriate radiation field. The item is no longer supported in any form by CoC.
Time Period for First Course of Treatment [1260, 1270, 1500]
SEER and CoC have historically defined first course treatment differently. The differences affect representation of the date first course treatment begins and the instructions for determining what constitutes first course treatment.
The NAACCR record layout provides two data items that indicate the date of the start of the first course of treatment: Date 1st CRS RX CoC as defined by CoC, and Date Initial RX SEER as defined by SEER. The two definitions differ when the physician decides not to treat the patient: CoC records the date of that decision as the date of initial treatment, while SEER considers such a decision to be no treatment, leaves the date field blank, and sets the corresponding date flag to ‘11’.
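The difference between the two date items can be sketched as follows. The function and parameter names are hypothetical, and two simplifications are assumed: the CoC and SEER dates coincide whenever treatment was actually given, and the SEER date flag is empty when a date is present.

```python
from datetime import date


def first_course_dates(treatment_start, no_treatment_decision):
    """Return (coc_date, seer_date, seer_date_flag) for one case.

    `treatment_start` is the date treatment began, or None.
    `no_treatment_decision` is the date the physician decided not to
    treat, or None. Only these two situations are modeled.
    """
    if treatment_start is not None:
        # Assumption: both items carry the same date when treatment is given.
        return treatment_start, treatment_start, ""
    if no_treatment_decision is not None:
        # CoC records the decision date; SEER leaves the date blank
        # and sets the date flag to '11'.
        return no_treatment_decision, None, "11"
    raise ValueError("case not covered by this sketch")
```

For an untreated patient the CoC item is populated while the SEER item is blank, so a count of records with a non-blank treatment date will differ depending on which item is read.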
The SEER and CoC definitions of treatment to be included as “first course” have become increasingly congruent, differing now primarily in their “fall-back” recommendations that apply when no treatment plan is recorded, no standard facility practice applies, no protocol applies, no physician is able to provide assistance, and no record of treatment failure or recurrence of disease is available. In that extreme instance, CoC recommends a 4-month cutoff for the beginning of first-course treatment, and SEER applies a 1-year cutoff for completion of first course of therapy.
Users of historical treatment data should be aware that the definitions of “first course” have changed over time and have been disjointed in the past. The applicable coding manuals and standard-setting organizations should be consulted for specifics.
Users of treatment data also should be aware that registries differ in the amount of treatment data collected in terms of the types of treatment included, non-hospital treatment locations surveyed, items covered (see the previous section), and the use of all codes provided for each item. Thus, treatment data are likely to be inconsistent among registries and to have varying levels of completeness, especially for treatment given in physicians’ offices or other non-hospital settings.
The NAACCR data standards adopted thus far do not adequately deal with data from places outside the United States. Changes have been made to accommodate postal codes, standard abbreviations for provinces/territories, and other fields in the Canadian data set. A CCCR column has been added to the Required Status Table and future versions of this document will review and increasingly incorporate standards established for Canadian cancer registries.