SDTM
SDTM (Study Data Tabulation Model) defines a standard structure for human clinical trial data tabulations and for nonclinical study data tabulations that are to be submitted as part of a product application to a regulatory authority such as the United States Food and Drug Administration (FDA). The Submission Data Standards team of the Clinical Data Interchange Standards Consortium (CDISC) defines SDTM.
On July 21, 2004, SDTM was selected as the standard specification for submitting tabulation data to the FDA for clinical trials and on July 5, 2011 for nonclinical studies. Eventually, all data submissions will be expected to conform to this format. As a result, clinical and nonclinical Data Managers will need to become proficient in the SDTM to prepare submissions and apply the SDTM structures, where appropriate, for operational data management.
Background
SDTM is built around the concept of observations collected about subjects who participated in a clinical study. Each observation can be described by a series of variables, corresponding to a row in a dataset or table. Each variable can be classified according to its Role. A Role determines the type of information conveyed by the variable about each distinct observation and how it can be used. Variables can be classified into five major roles:
- Identifier variables, which identify the study, subject of the observation, the domain, and the sequence number of the record
- Topic variables, which specify the focus of the observation
- Timing variables, which describe the timing of the observation
- Qualifier variables, which include additional illustrative text, or numeric values that describe the results or additional traits of the observation
- Rule variables, which express an algorithm or executable method to define start, end, or looping conditions in the Trial Design model.
The set of Qualifier variables can be further categorized into five sub-classes:
- Grouping Qualifiers are used to group together a collection of observations within the same domain. Examples include --CAT and --SCAT.
- Result Qualifiers describe the specific results associated with the topic variable for a finding. They are the answer to the question raised by the topic variable. Examples include --ORRES, --STRESC, and --STRESN. Many of the values in the DM domain are also classified as Result Qualifiers.
- Synonym Qualifiers specify an alternative name for a particular variable in an observation. Examples include --MODIFY and --DECOD, which are equivalent terms for a --TRT or --TERM topic variable, and --TEST and --LOINC, which are equivalent terms for a --TESTCD.
- Record Qualifiers define additional attributes of the observation record as a whole. Examples include --REASND, AESLIFE, and all other SAE flag variables in the AE domain; and --BLFL, --POS and --LOC, --SPEC, --LOT, --NAM.
- Variable Qualifiers are used to further modify or describe a specific variable within an observation and are only meaningful in the context of the variable they qualify. Examples include --ORRESU, --ORNRHI, and --ORNRLO, all of which are variable qualifiers of --ORRES, and --DOSU and --DOSFRM, both of which are variable qualifiers of --DOSE.
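The role classification can be illustrated with a small sketch. The record below is hypothetical (a subject with mild nausea starting on study day 6), and the role table covers only a handful of Adverse Events variables chosen for illustration; the study identifier `STUDY01` and the helper `group_by_role` are assumptions of this sketch, not part of the standard.

```python
# Illustrative role assignments for a few AE-domain variables.
# This is a small subset picked for the sketch, not the full SDTM model.
ROLE_BY_VARIABLE = {
    "STUDYID": "Identifier",
    "DOMAIN": "Identifier",
    "USUBJID": "Identifier",
    "AESEQ": "Identifier",
    "AETERM": "Topic",
    "AEDECOD": "Synonym Qualifier",
    "AESEV": "Record Qualifier",
    "AESTDY": "Timing",
}

# Hypothetical observation: subject 101 had mild nausea starting on study day 6.
observation = {
    "STUDYID": "STUDY01",  # made-up study identifier
    "DOMAIN": "AE",
    "USUBJID": "101",
    "AESEQ": 1,
    "AETERM": "NAUSEA",
    "AEDECOD": "Nausea",
    "AESEV": "MILD",
    "AESTDY": 6,
}


def group_by_role(record, roles):
    """Return the record's variables grouped under their role names."""
    grouped = {}
    for name, value in record.items():
        grouped.setdefault(roles[name], {})[name] = value
    return grouped


grouped = group_by_role(observation, ROLE_BY_VARIABLE)
for role, variables in grouped.items():
    print(role, variables)
```

Grouping by role makes the structure of the observation visible: one Topic variable states what was observed, the Identifiers say for whom, and the Timing and Qualifier variables add the detail.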
Additional Timing and Qualifier variables could be included to provide the necessary detail to adequately describe an observation.
The SDTM addition to PROC CDISC does not convert existing SDS 2.x content to SDTM 3.x representations.
Datasets and domains
Observations are normally collected for all subjects in a series of domains. A domain is defined as a collection of logically related observations with a topic-specific commonality about the subjects in the trial. The logic of the relationship may relate to the scientific matter of the data, or to its role in the trial. Typically, each domain is represented by a dataset, but it is possible to have information relevant to the same topic spread among multiple datasets. Each dataset is distinguished by a unique, two-character DOMAIN code that should be used consistently throughout the submission. This DOMAIN code is used in the dataset name, the value of the DOMAIN variable within that dataset, and as a prefix for most variable names in the dataset.
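As a minimal sketch of how the two-character DOMAIN code acts as a variable-name prefix, the function below replaces the `--` placeholder used in generic variable names with a domain code. The function name `expand_variable` and its validation rules are assumptions made for this illustration.

```python
def expand_variable(generic_name: str, domain: str) -> str:
    """Replace a leading "--" placeholder with the two-character domain code.

    Names without the placeholder (for example USUBJID) are returned unchanged.
    """
    if len(domain) != 2 or not domain.isalnum():
        raise ValueError(f"expected a two-character domain code, got {domain!r}")
    if generic_name.startswith("--"):
        return domain.upper() + generic_name[2:]
    return generic_name


# Generic Findings variables expanded for the Laboratory Test Results domain.
lb_variables = [expand_variable(name, "LB") for name in ("--TESTCD", "--ORRES", "--ORRESU")]
print(lb_variables)
```

For the code `LB` this yields names such as `LBTESTCD` and `LBORRES`, while identifier variables shared by all domains keep their names.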
The dataset structure for observations is a flat file representing a table with one or more rows and columns. Normally, one dataset is submitted for each domain. Each row of the dataset represents a single observation and each column represents one of the variables. Each dataset or table is accompanied by metadata definitions that provide information about the variables used in the dataset. The metadata are described in a data definition document named 'Define' that is submitted along with the data to regulatory authorities.
The Submission Metadata Model uses seven distinct metadata attributes, which are defined for each dataset variable in the metadata definition document:
- The Variable Name
- A descriptive Variable Label, using up to 40 characters, which should be unique for each variable in the dataset
- The data Type
- The set of controlled terminology for the value or the presentation format of the variable
- The Origin or source of each variable
- The Role of the variable, which determines how the variable is used in the dataset. Roles are used to represent the categories of variables as Identifier, Topic, Timing, or the five types of Qualifiers. Since these roles are predefined for all domains that follow the general classes, they do not need to be specified by sponsors in their Define data definition document. Actual submission metadata may use additional role designations, and more than one role may be assigned per variable to meet different needs.
- Comments or other relevant information about the variable or its data.
Comments are included as necessary according to the needs of individual studies.
The presence of an asterisk in the 'Controlled Terms or Format' column indicates that a discrete set of values is expected to be made available for this variable. This set of values may be sponsor-defined in cases where standard vocabularies have not yet been defined or from an external published source such as MedDRA.
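The seven attributes can be pictured as one record per dataset variable. The sketch below is an assumption-laden illustration: the class `VariableMetadata`, the function `check_labels`, and the example values (including the origin `CRF`) are hypothetical and are not taken from any Define document. It checks the two label constraints mentioned above, a maximum of 40 characters and uniqueness within the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMetadata:
    """One row of variable-level metadata with the seven attributes."""
    name: str
    label: str
    data_type: str
    controlled_terms_or_format: str  # "*" marks an expected discrete value set
    origin: str
    role: str
    comments: str = ""


def check_labels(variables):
    """Return a list of problems: labels over 40 characters or duplicated."""
    problems = []
    seen = {}
    for var in variables:
        if len(var.label) > 40:
            problems.append(f"{var.name}: label longer than 40 characters")
        if var.label in seen:
            problems.append(f"{var.name}: label duplicates {seen[var.label]}")
        else:
            seen[var.label] = var.name
    return problems


# Hypothetical metadata for two Adverse Events variables.
ae_metadata = [
    VariableMetadata("AETERM", "Reported Term for the Adverse Event",
                     "Char", "", "CRF", "Topic"),
    VariableMetadata("AESEV", "Severity/Intensity",
                     "Char", "*", "CRF", "Record Qualifier"),
]
problems = check_labels(ae_metadata)
print(problems)
```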
Special-purpose domains
The CDISC Version 3.x Submission Data Domain Models include special-purpose domains that have a specific structure and cannot be extended with any additional qualifier or timing variables other than those specified.
- Demographics includes a set of standard variables that describe each subject in a clinical study
- Comments describes a fixed structure for recording free-text comments on a subject, or comments related to records or groups of records in other domains.
The general domain classes
Most observations collected during the study should be divided among three general observation classes: Interventions, Events, or Findings:
- The Interventions class captures investigational treatments, therapeutic treatments, and surgical procedures that are intentionally administered to the subject either as specified by the study protocol, coincident with the study assessment period, or other substances self-administered by the subject
- The Events class captures occurrences or incidents independent of planned study evaluations occurring during the trial or prior to the trial.
- The Findings class captures the observations resulting from planned evaluations to address specific questions such as observations made during a physical examination, laboratory tests, ECG testing, and sets of individual questions listed on questionnaires.
All datasets based on any of the general observation classes share a set of common Identifier variables and Timing variables. Three general rules apply when determining which variables to include in a domain:
- The same set of Identifier variables applies to all domains based on the general observation classes. An optional identifier can be used wherever appropriate.
- Any valid Timing variable is permissible for use in any submission dataset, but it should be used consistently where applicable for all domains.
- Any additional Qualifier variables from the same general class may be added to a domain model.
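The third rule can be sketched as a lookup: a domain belongs to one general observation class, and a Qualifier may be added only if it comes from that class. Both tables below are deliberately small, illustrative subsets assumed for this sketch, and `may_add_qualifier` is a hypothetical helper; the authoritative variable lists are in the SDTM specification.

```python
# Illustrative subset: general observation class for a few domain codes.
GENERAL_CLASS_BY_DOMAIN = {
    "CM": "Interventions",
    "EX": "Interventions",
    "AE": "Events",
    "MH": "Events",
    "LB": "Findings",
    "VS": "Findings",
}

# Illustrative subset of generic Qualifier variables per class (not complete).
QUALIFIERS_BY_CLASS = {
    "Interventions": {"--DOSE", "--DOSU", "--DOSFRM"},
    "Events": {"--SER"},
    "Findings": {"--ORRES", "--ORRESU", "--STRESC", "--STRESN"},
}


def may_add_qualifier(domain: str, generic_qualifier: str) -> bool:
    """True if the qualifier belongs to the domain's general observation class."""
    observation_class = GENERAL_CLASS_BY_DOMAIN[domain]
    return generic_qualifier in QUALIFIERS_BY_CLASS[observation_class]


print(may_add_qualifier("LB", "--ORRESU"))  # Findings qualifier in a Findings domain
print(may_add_qualifier("AE", "--DOSE"))    # Interventions qualifier in an Events domain
```

Keeping the class membership in data rather than in code makes it straightforward to extend the tables to the full set of domains and variables.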
The CDISC standard domain models (SDTMIG 3.2)
- Comments
- Demographics
- Subject Elements
- Subject Visits
- Concomitant Medications
- Exposure as Collected
- Exposure
- Substance Use
- Procedures
- Adverse Events
- Clinical Events
- Disposition
- Protocol Deviations
- Medical History
- Healthcare Encounters
- Drug Accountability
- Death Details
- ECG Test Results
- Inclusion/Exclusion Criterion Not Met
- Immunogenicity Specimen Assessments
- Laboratory Test Results
- Microbiology Specimen
- Microscopic Findings
- Morphology
- Microbiology Susceptibility Test
- PK Concentrations
- PK Parameters
- Physical Examination
- Questionnaires
- Reproductive System Findings
- Disease Response
- Subject Characteristics
- Subject Status
- Tumor Identification
- Tumor Results
- Vital Signs
- Findings About Events or Interventions
- Skin Response
- Trial Arms
- Trial Disease Assessment
- Trial Elements
- Trial Visits
- Trial Inclusion/Exclusion Criteria
- Trial Summary
- Supplemental Qualifiers - SUPPQUAL
- Relate Records - RELREC
Limitations and criticism of standards