
Data File Sets

About Data File Sets:

A data file set is a collection of logically related files and is the term used to refer to collections of any type of CEDR data files. For the worker health and mortality studies, however, CEDR files are grouped into working data file sets and analytic data file sets.

Analytic Data File Sets

An analytic data file set contains files of data called analytic files on which the analyses or results of a study are based. Researchers select data from a number of potentially diverse working files to form their initial analytic data file set. Analytic files contain data upon which a researcher more directly bases a study's conclusions or reported findings. Typically, analytic files are composites of data (derived from working files) formulated to meet specific cohort descriptions and study needs that represent the best data available at that time. To create analytic files, researchers may subset data to form a cohort, merge data from multiple working files, validate and edit the working data, and convert the data as necessary to conduct analyses. Analytic files generally represent data that have been extensively reviewed, validated, and verified by a researcher.

Analytic data files may include variables generated from, but not identical to, one or more variables found in the working data files. Depending on the purpose of the analysis, an analytic data file may also contain additional variables, not found in the working files, that were collected, validated, or created by the individual investigators analyzing the data.
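
The steps described above (merging working files, validating records, deriving new variables, and subsetting to a cohort) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the sample rows, and the cohort rule of at least five years of employment are hypothetical and do not come from any CEDR data file set or study.

```python
# Made-up working-file rows; keys and values are illustrative only.
demographics = [
    {"id": 1, "birth_year": 1930, "sex": "M"},
    {"id": 2, "birth_year": 1942, "sex": "F"},
    {"id": 3, "birth_year": None, "sex": "M"},
]
work_history = [
    {"id": 1, "hire_year": 1952, "term_year": 1980},
    {"id": 2, "hire_year": 1965, "term_year": 1966},
    {"id": 3, "hire_year": 1950, "term_year": 1975},
]


def build_analytic(demographics, work_history, min_years=5):
    """Merge two working files and return rows for a hypothetical cohort."""
    # Merge: index the work history by worker number for lookup.
    history_by_id = {row["id"]: row for row in work_history}
    analytic = []
    for person in demographics:
        job = history_by_id.get(person["id"])
        # Validate: skip workers missing from either file or lacking a birth year.
        if job is None or person["birth_year"] is None:
            continue
        # Derive: a variable computed from working-file variables.
        years_employed = job["term_year"] - job["hire_year"]
        # Subset: hypothetical cohort rule (assumption, not a CEDR rule).
        if years_employed < min_years:
            continue
        analytic.append({
            "id": person["id"],
            "sex": person["sex"],
            "age_at_hire": job["hire_year"] - person["birth_year"],
            "years_employed": years_employed,
        })
    return analytic


analytic_file = build_analytic(demographics, work_history)
print(analytic_file)
```

In this sketch, age_at_hire and years_employed are examples of variables that are generated from working-file variables but are not identical to any of them.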

Analytic files are static and are retained permanently in CEDR. If similar files with updated data are used for additional analyses, a new data file set is defined and stored in addition to the original analytic data file set.

Working Data File Sets

A working data file set contains files of data called working files that were collected and updated during decades of follow-up of worker cohorts.

Working files may contain an assortment of data, including demographic, work history, industrial hygiene, vital status, internal dosimetry, and external dosimetry data, gathered from a variety of sources. Working files are more dynamic, with new, updated, and corrected information being added to the files. The levels of verification and validation applied to working data will vary by site and by study. For some studies, working files have been periodically updated in CEDR; older versions of these working files have been archived.

The working data file sets mainly include the data collected and updated by the three epidemiologic research centers. A researcher selects or generates variables from these more dynamic data files to form analytic data files.

Individuals using these data file sets should note that the working data file sets may include variables not carried forward to the analytic data file or associated studies. There may be data entry and other errors in these files that have never been discovered or resolved; therefore, users should take appropriate precautions when working with these data.

How Data Are Collected:

As researchers near completion of their study, they compile their electronic files, including working files, analytic files, and structured documentation, into data file sets that are submitted to CEDR. Because the information used by various researchers in the past was originally generated from different DOE facilities or other sources for different purposes and needs, there is considerable variation in the format and content of data files. The accompanying structured documentation, however, is processed into a standard format. This standard format ensures that the structured documentation explains the data as consistently and completely as possible.

Quality assurance procedures used in CEDR processing ensure that the data, as presented in CEDR, are consistent with the data submitted by the data provider. When data are received, the data files are examined and the structured documentation is reviewed for completeness. Data are not accepted until structured documentation requirements are satisfied. After acceptance, the data file sets and structured documentation are loaded into the CEDR system. The researcher who provided the data then reviews the data and structured documentation for final verification and approval.

Most DOE epidemiologic studies examined the mortality rate of the study population and ascertained the cause of death of deceased workers using information obtained from death certificates. Each State has its own requirements and rules for releasing death certificate information. Users should note that, in spite of repeated requests by DOE, the New York City Department of Health will not permit release of cause-of-death information for deaths occurring in New York City.

Data Privacy:

CEDR data are protected and are fully compliant with the Federal Information Security Management Act (FISMA) requirements for security certification and accreditation; i.e., all CEDR networks maintaining Federal data are certified to be secure and rigorously protected.

CEDR is password protected; a password is required to access the data. Although personal identifiers (such as name, date of birth, and specific identification that could be used to identify a worker) have been removed from the data, a promise of confidentiality is still required.

Confidentiality Protection:

CEDR's goal is to protect the privacy of individuals while maximizing the usefulness of the data.

CEDR's release policy de-identifies the data for use by interested individuals.

CEDR complies with requirements of the Privacy Act of 1974 and the agreements between DOE and State health departments. Therefore, to protect the identities of individual workers, the following data release policy was implemented:

  1. Individuals are identified only by a number. Other identifying information, such as names and social security numbers, has been omitted.
  2. The specification of race is limited to white, black, and other.
  3. The dates of birth and death are truncated to year only.
  4. Other personal dates, such as date of employment, are truncated to month and year.
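
The four rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not CEDR's actual processing: the function name, the field names, and the sample record are all hypothetical.

```python
from datetime import date

# Hypothetical field names; actual CEDR variable names are not given here.
ALLOWED_RACE = {"white", "black"}


def deidentify(record, study_id):
    """Return a reduced copy of `record` following the four release rules."""
    race = str(record.get("race", "")).strip().lower()
    death = record.get("death_date")
    hire = record.get("hire_date")
    return {
        # Rule 1: keep only a number; name and SSN are never copied over.
        "id": study_id,
        # Rule 2: limit race to white, black, or other.
        "race": race if race in ALLOWED_RACE else "other",
        # Rule 3: dates of birth and death keep the year only.
        "birth_year": record["birth_date"].year,
        "death_year": death.year if death else None,
        # Rule 4: other personal dates keep month and year.
        "hire_month": hire.strftime("%Y-%m") if hire else None,
    }


# Made-up record for illustration only.
raw = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "race": "Asian",
    "birth_date": date(1931, 4, 17),
    "death_date": None,
    "hire_date": date(1955, 9, 3),
}
released = deidentify(raw, study_id=1001)
print(released)
```

Building the output as a new dictionary, rather than deleting fields from the input, means that any identifying field not explicitly listed is left out by default.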


The Oak Ridge Institute for Science and Education (ORISE) is managed by Oak Ridge Associated Universities (ORAU) for the U.S. Department of Energy (DOE).
Files are built in .CSV and .XLSX formats for use in Excel. Bibliographies are in Adobe Reader .PDF format.