The initial release of {sdtm.oak} provides a framework for modular programming of SDTM in R and sets the stage for potential automation of SDTM creation following the standardized SDTM specification. In the future, the automation workflow could involve preparing specifications and then making automated function calls to generate SDTM domains.
The future workflow for automation could look like:
- Prepare the SDTM specification: Users define the raw data source, target SDTM domain, target SDTM variables, and the algorithms used for automation. A template is still under development; a draft of its columns is described later in this article.
- Prepare the SDTM controlled terminology: Users define the SDTM controlled terms applicable to the study. A template is still under development.
- Run an automated process: The process reads the specification and makes {sdtm.oak} function calls, producing either the code required to generate SDTM datasets or the datasets themselves.
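As an illustration of the last step, here is a minimal, language-neutral sketch in Python of how an automated process might read specification rows and turn them into function calls. It is not part of {sdtm.oak}: the column names follow the specification table in this article, while the row values, the raw variable `SYSBP_RAW`, the helper `generate_calls`, and the argument names in the generated text are assumptions made for this example.

```python
from collections import defaultdict

# Illustrative specification rows. Column names follow the specification
# table in this article; the values (including SYSBP_RAW) are made up.
spec_rows = [
    {"raw_dataset": "VTLS1", "raw_variable": "SYSBP_RAW", "target_domain": "VS",
     "target_sdtm_variable": "VSORRES", "target_sdtm_variable_ordinal": 2,
     "mapping_algorithm": "assign_no_ct"},
    {"raw_dataset": "VTLS1", "raw_variable": "SYSBP_RAW", "target_domain": "VS",
     "target_sdtm_variable": "VSTESTCD", "target_sdtm_variable_ordinal": 1,
     "mapping_algorithm": "hardcode_ct"},
    {"raw_dataset": "DEM", "raw_variable": "SEX_001", "target_domain": "DM",
     "target_sdtm_variable": "SEX", "target_sdtm_variable_ordinal": 1,
     "mapping_algorithm": "assign_ct"},
]


def generate_calls(rows):
    """Return {target_domain: [call text, ...]} ordered by target variable ordinal."""
    calls = defaultdict(list)
    ordered = sorted(
        rows,
        key=lambda r: (r["target_domain"], r["target_sdtm_variable_ordinal"]),
    )
    for row in ordered:
        # Placeholder call text: the argument names are assumptions for this
        # sketch and are not taken from the {sdtm.oak} reference.
        text = '{}(raw_dat = {}, raw_var = "{}", tgt_var = "{}")'.format(
            row["mapping_algorithm"],
            row["raw_dataset"],
            row["raw_variable"],
            row["target_sdtm_variable"],
        )
        calls[row["target_domain"]].append(text)
    return dict(calls)


generated = generate_calls(spec_rows)
for domain, lines in generated.items():
    print(domain)
    for line in lines:
        print("  " + line)
```

Grouping by target domain and ordering by target variable ordinal keeps the generated code for each domain together, which is one plausible way to go from a source-oriented specification to per-domain programs.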
This article provides an overview of metadata and a draft version of the standard SDTM specification. We plan to demonstrate the creation of standard SDTM specs from the CDISC Library in collaboration with CDISC COSA. Sponsors may need to establish the necessary tools to generate this SDTM specification from their MDR in order to use the automation features of {sdtm.oak}. This concept draws inspiration from Roche’s existing implementation of the SDTM automation process using OAK, and it requires further development.
Throughout this article, the term “metadata” is used several times. In this context, “metadata” refers to the specific metadata used by {sdtm.oak}. This article aims to provide users with a more detailed understanding of the {sdtm.oak} metadata.
In general, metadata can be defined as “data about data.” It does not include any patient-level data. Instead, the metadata provides a blueprint of the data that needs to be collected during a study.
Standards Metadata
The standards metadata used in {sdtm.oak} is sourced from the CDISC Library, a sponsor MDR, or any other form of documentation where standards are maintained. This metadata provides information on the following:
- The relationship between Data Collection Standards (eCRF & eDT), SDTM mapping, and Controlled Terminology
- Machine-readable standard SDTM mappings
- Algorithms and associated metadata required for the SDTM automation of standards in the study.
In upcoming releases of {sdtm.oak}, we plan to use the standards metadata and customize it to meet the study requirements.
Study Definition Metadata
Study Definition Metadata is also referred to as Study Metadata. Study Definition Metadata provides information about the eCRF and eDT data collected in the study.
eCRF Metadata
The eCRF Design Metadata is fetched from the EDC system. This metadata includes:
- Forms Metadata: Identifier, eCRF label, repeating format, and other properties of the eCRF.
- Fields Metadata: Identifier, question label, datatype, and other properties of data collection fields in the study.
- Data Dictionaries: Identifier and the controlled terms collected at the source.
- Visits: Names of the visits as defined in the EDC.
eDT Metadata
eDT Metadata is the blueprint metadata that describes the data collected as part of an external data transfer (from clinical sites to the sponsor). This includes:
- Dataset name, label, repeating properties, etc.
- Variable name, datatype, label, associated codelist, etc.
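To make the shape of the study definition metadata more concrete, the following is a minimal sketch in Python of how forms, fields, and data dictionaries could be represented. The class names, attribute names, the dictionary identifier `SEX_DD`, and all values are hypothetical; neither {sdtm.oak} nor any EDC system prescribes this layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FieldMeta:
    identifier: str
    question_label: str
    datatype: str
    dictionary_id: Optional[str] = None  # link to a data dictionary, if any


@dataclass
class FormMeta:
    identifier: str
    label: str
    repeating: bool
    fields: List[FieldMeta] = field(default_factory=list)


# Hypothetical data dictionary: identifier -> terms collected at the source.
dictionaries: Dict[str, List[str]] = {"SEX_DD": ["Male", "Female"]}

# Hypothetical form with two fields; only one field uses a dictionary.
dem = FormMeta(
    identifier="DEM",
    label="Demographics",
    repeating=False,
    fields=[
        FieldMeta("SEX_001", "Sex", "Text", dictionary_id="SEX_DD"),
        FieldMeta("BRTHDD", "Birth Day", "Date"),
    ],
)


def collected_terms(form, dictionaries):
    """Map each field identifier to the terms collected for it (empty if none)."""
    return {f.identifier: dictionaries.get(f.dictionary_id, []) for f in form.fields}


print(collected_terms(dem, dictionaries))
```

Keeping the dictionary link on the field, rather than copying the terms into it, mirrors the way the list above treats data dictionaries as their own piece of metadata.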
Study SDTM Mappings Metadata (specifications)
Study SDTM mappings metadata is the study SDTM specification. To develop the SDTM domains, {sdtm.oak} requires the user to prepare the Study SDTM mappings metadata. A conventional SDTM specification has one tab per domain and maps each target (SDTM domain, variables) back to its source (raw dataset, raw variables) together with the SDTM mappings. The SDTM spec for {sdtm.oak} is instead organized source-to-target: for each source, the SDTM mappings, algorithms, and associated metadata are defined. The table below presents the columns in the SDTM mapping specification and their explanations.
Variable Name | Description | Example Values | Association with Mapping Algorithms |
---|---|---|---|
study_number | Study Number | test_study | Generic Use |
raw_source_model | Data Collection model | e-CRF or eDT | Generic Use |
raw_dataset | Name of the raw or source dataset | VTLS1, DEM | Required for all mapping algorithms |
raw_dataset_ordinal | Ordinal of the raw dataset as defined in EDC or eDT specification | 1, 2, 3, etc. | Generic Use |
raw_dataset_label | Label of the raw or source dataset | Vital Signs, Demographics | Generic Use |
raw_variable | Name of the raw variable | SEX_001, BRTHDD | Generic Use |
raw_variable_label | Label of the raw variable | Systolic Blood Pressure, Birth Day | Generic Use |
raw_variable_ordinal | Ordinal of the variable as defined in the eCRF or eDT specification | 1, 2, 3, etc. | Generic Use |
raw_variable_type | Type of the raw variable | Text Box, Date control | Required for all mapping algorithms |
raw_data_format | Data format of the raw variable | $200, dd MON YYYY | Required for all mapping algorithms |
study_specific | TRUE indicates that the source is study specific. FALSE indicates that the raw variable is part of data standards. | TRUE, FALSE | Generic Use |
annotation_ordinal | Ordinal of the SDTM mappings for the particular raw source | 1, 2, 3, etc. | Required for all mapping algorithms |
mapping_is_dataset | Indicates if the SDTM mapping is at the dataset level. TRUE indicates that it is a dataset-level mapping. | TRUE, FALSE | Required for all mapping algorithms |
annotation_text | SDTM mapping text or annotation text | VS.VSORRES when VSTESTCD = ‘SYSBP’ | Generic Use |
target_domain | Name of the target domain | VS, MH | Required for all mapping algorithms |
target_sdtm_variable | Name of the target SDTM variable | VSORRES, MHSTDTC | Required for all mapping algorithms |
target_sdtm_variable_role | CDISC Role for the SDTM target variable defined in the annotation | Topic Variable, Grouping Qualifier, Identifier Variable | Required for all mapping algorithms |
target_sdtm_variable_codelist_code | NCI or sponsor code of the codelist assigned to the SDTM target variable defined in the annotation | C66742, C66790 | Required for all mapping algorithms |
target_sdtm_variable_controlled_terms_or_format | Controlled terms or format for the target variable defined in the annotation (as defined per CDISC). target_sdtm_variable_controlled_terms_or_format is required for SDTM Define.xml. | (AGEU), ISO 8601, (SEX) | Generic Use |
target_sdtm_variable_ordinal | Ordinal of the target SDTM variable | 1, 2, 3 | Required for all mapping algorithms |
origin | Origin of metadata source; values are subject to controlled terminology | Derived, Assigned, Collected, Predecessor | Used for define.xml |
mapping_algorithm | Mapping Algorithm | condition_add, assign_ct, ae_aerel, hardcode_ct | Required for all mapping algorithms |
sub_algorithm | The sub-algorithm (scenario) of the source-to-target mapping | assign_no_ct, hardcode_ct | Only when Mapping Algorithm is condition_add or dataset_level |
target_hardcoded_value | Text (hardcoded value) that applies to the target | ALZHEIMER’S DISEASE HISTORY | assign_no_ct, hardcode_no_ct |
target_term_value | CDISC Submission value or sponsor value which represents a hardcoded text | Y, beats/min, INFORMED CONSENT OBTAINED | hardcode_ct |
condition_add_raw_dat | Condition that has to be applied at a raw dataset before applying a mapping. Can be a valid R filter statement. | Map qualifier CMSTRTPT. Annotation text is: If MDPRIOR == 1 then CM.CMSTRTPT = ‘BEFORE’. raw_dat parameter as condition_add(cm_raw, MDPRIOR == 1) | condition_add |
condition_add_tgt_dat | Condition that has to be applied at a target dataset before applying a mapping. Can be a valid R filter statement. | Map qualifier CMDOSFRQ. Annotation text is: If CMTRT is not null then map the collected value in raw dataset cm_raw and raw variable MDFRQ to CMDOSFRQ. tgt_dat parameter as condition_add(., !is.na(CMTRT)) | condition_add |
merge_type | Specifies the type of join | left_join, right_join, full_join, visit_join, subject_join | MERGE |
merge_left | Specifies the left component of the merge | VTLS1 | MERGE |
merge_right | Specifies the right component of the merge | VACREC | MERGE |
merge_condition | Specifies the condition of the join (e.g. a specific variable that should match in the components of the merge) | VTLS1.SUBJECT = VACREC.SUBJECT, MD1.MDNUM = VACREC.MDNUM | MERGE |
unduplicate_keys | Raw variables that should be used to determine whether an observation in the source data is a duplicate record and subject to being removed | VTLS1.SUBJECT, VTLS1.DATAPAGEID | REMOVE_DUP |
groupby_keys | Raw variables or aggregation functions (e.g. earliest, latest) to group source data records before mapping to SDTM | TXINF1.DATAPGID, Earliest | GROUP_BY |
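To illustrate how two of these columns might be interpreted, here is a small Python sketch of duplicate removal driven by unduplicate_keys followed by a conditional hardcode in the spirit of the condition_add_raw_dat example (MDPRIOR == 1 maps to CMSTRTPT = ‘BEFORE’). The helpers `remove_duplicates` and `hardcode_when`, the records in `cm_raw`, and the choice to leave the target empty when the condition is not met are assumptions of this sketch, not a description of how {sdtm.oak} implements these steps.

```python
def remove_duplicates(records, keys):
    """Keep the first record for each combination of the key variables."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def hardcode_when(records, condition, tgt_var, tgt_val):
    """Set tgt_var to tgt_val where condition holds; otherwise leave it empty (None)."""
    return [
        {**rec, tgt_var: tgt_val if condition(rec) else None}
        for rec in records
    ]


# Hypothetical raw records; the second record duplicates the first.
cm_raw = [
    {"SUBJECT": "001", "DATAPAGEID": 10, "MDPRIOR": 1},
    {"SUBJECT": "001", "DATAPAGEID": 10, "MDPRIOR": 1},
    {"SUBJECT": "002", "DATAPAGEID": 11, "MDPRIOR": 0},
]

# unduplicate_keys: SUBJECT, DATAPAGEID
deduped = remove_duplicates(cm_raw, ["SUBJECT", "DATAPAGEID"])

# condition_add_raw_dat: MDPRIOR == 1, target CMSTRTPT hardcoded to BEFORE
mapped = hardcode_when(deduped, lambda r: r["MDPRIOR"] == 1, "CMSTRTPT", "BEFORE")

for rec in mapped:
    print(rec["SUBJECT"], rec["CMSTRTPT"])
```

Applying the duplicate check before the condition keeps the two concerns separate, so each specification column drives exactly one step.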