Skip to contents

Core Concept

SDTM mappings are defined as algorithms that transform the collected (eCRF, eDT) source data into the target SDTM data model. Mapping algorithms are the backbone of the {sdtm.oak} - SDTM data transformation engine.

Key Points:

  • Algorithms can be re-used across multiple SDTM domains.

  • Algorithms are pre-specified for data collection standards in MDR (if applicable) to facilitate automation.

  • Programming language agnostic - this concept does not rely on a specific programming language for implementation. The {sdtm.oak} team implemented them as R functions.

Here is an example of reusing an algorithm across multiple domains, variables, and also to a non-standard

List of Algorithms

This release of {sdtm.oak} supports the following algorithms: assign_no_ct, assign_ct, hardcode_no_ct, hardcode_ct, assign_datetime, condition_add. Rest of the algorithms will be developed in the subsequent releases.

The following table provides a brief description of each algorithm.

Algorithm.Name Description Example
assign_no_ct One-to-one mapping between the raw source and a target SDTM variable that has no controlled terminology restrictions. Just a simple assignment statement. MH.MHTERM
AE.AETERM
assign_ct One-to-one mapping between the raw source and a target SDTM variable that is subject to controlled terminology restrictions. A simple assign statement and applying controlled terminology. This will be used only if the SDTM variable has an associated controlled terminology. VS.VSPOS
VS.VSLAT
assign_datetime One-to-one mapping between the raw source and a target that involves mapping a Date or time or datetime component. This mapping algorithm also takes care of handling unknown dates and converting them into. ISO8601 format. MH.MHSTDTC
AE.AEENDTC
hardcode_ct Mapping a hardcoded value to a target SDTM variable that is subject to terminology restrictions. This will be used only if the SDTM variable has an associated controlled terminology. MH.MHPRESP = ‘Y’

VS.VSTEST = ‘Systolic Blood Pressure’

VS.VSORRESU = ‘mmHg’
hardcode_no_ct Mapping a hardcoded value to a target SDTM variable that has no terminology restrictions. FA.FASCAT = ‘COVID-19 PROBABLE CASE’

CM.CMTRT = ‘FLUIDS’
condition_add Algorithm that is used to filter the source data and/or target domain based on a condition. The mapping will be applied only if the condition is met. The filter can be applied either at the source dataset or at target dataset or both. This algorithm has to be used in conjunction with other algorithms, that is if the condition is met perform the mapping using algorithms like assign_ct, assign_no_ct, hardcode_ct, hardcode_no_ct, assign_datetime. If If MDPRIOR == 1 then CM.CMSTRTPT = ‘BEFORE’.

VS.VSMETHOD when VSTESTCD = ‘TEMP’

If collected value in raw variable DOS is numeric then CM.CMDOSE

If collected value in raw variable MOD is different to CMTRT then map to CM.CMMODIFY
ae_aerel Algorithm that is currently unique to AE.AEREL, particularly when more than one drug is used in the study.
If any collected study drug causalities are ‘Yes’ then AE.AEREL is Y.
If all collected study drug causalities are ‘NA’ then AE.AEREL is NA.
If no study drug causalities are ‘Yes’ but there is at least one causality of ‘No’ then AE.AEREL is N.
Individual study drug causality responses are stored in AERELn in SUPPAE.
For AE.AEREL and AERELn in SUPPAE
dataset_level Indicates a dataset-level mapping. These mappings will be applied to all SDTM records created from that source. Also called an eCRF-level mappings in eCRF and dataset-level mappings in eDT VS = ‘Vital Signs’

MH.MHCAT = ‘PROSTATE CANCER HISTORY’
not_submitted Instruction that sdtm.oak should not map the collected item to SDTM at all.
relrec Associate two domains based on the variables in each domain and how those are related. Specifies the name of two domains that are related via RELREC. BE record related to BS record via RELREC
multiple_responses Consolidate the responses from more than one source variable into one target variable. Used when multiple responses may be given for a single SDTM column. sdtm.oak will populate all target variable(s) after determining the number of responses provided. AE.AERELNST/ AERELNSn IN SUPPAE


DM.RACE, if only one value is selected.
DM.RACE = MULTIPLE, if more than one value is selected.
RACEn in SUPPDM where n = 1 to N selected values
split_to_suppqual Consolidates the responses from more than one source variable into more than one target variable (always a suppqual/non-standard variable). There is no ‘parent’ target variable that is populated with ‘MULTIPLE’. If both Filipino and Samoan are checked, CRACE1 will be ‘FILIPINO’ and CRACE2 will be ‘SAMOAN’.
If only Chinese is checked, CRACE1 will be ‘CHINESE’.
remove_dup Sub-algorithm at the domain level that indicates some source records may be removed during the sdtm.oak mapping process if determined to be duplicate records. Remove duplicates on the Vital signs raw dataset based on subject number
group_by Sub-algorithm used at the domain level to group source records before mapping to SDTM. This is used in the event we need to collapse data collected across multiple rows into one row in SDTM but it is not a simple un-duplication effort. For example, the way infusion study drug administration data requires us to create 1 SDTM record in EC from 1 or more sources records. When there is more than one source record, we need to take the earliest collected infusion start date (for ECSTDTC) and the latest collected infusion end date within an eCRF instance. EC = ‘Exposure as Collected’
merge_datasets To indicate a join condition with a secondary source or multiple sources. Merges are expressed at the domain level only (not at data point or variable level). This is a sub-algorithm and can only be used with algorithm DATASET_LEVEL. Merge AE raw dataset with SAE based on Subject number.

Sub-algorithms

{sdtm.oak} supports two levels for defining algorithms. For example, there are some SDTM mappings where a certain action has to be taken only when a condition is met. In such cases, the primary algorithm checks for the condition, and the sub-algorithm executes the mappings when the condition is met.

Currently, sub-algorithms must be provided for this main algorithms.

  • condition_add
  • dataset_level

Some algorithms can be interchangeably used as algorithms and as sub-algorithms as seen below (not an exhaustive list)

The permutation & combination of algorithms & sub-algorithms creates endless possibilities to accommodate different types of mappings.