DM
Introduction
This article describes how to create a demographics (DM) domain using the {sdtm.oak} package.
Before reading this article, we recommend reviewing two articles in the {sdtm.oak} package documentation that cover the key concepts: Algorithms & Sub-Algorithms and Creating an Interventions Domain. The latter explains concepts such as oak_id_vars and condition_add in detail, offers guidance on which mapping algorithms or functions to use for different mappings, and describes how those algorithms or functions work.
In this article, we will dive directly into programming and provide further explanation only where it is required.
Programming workflow
- Read in data
- Create oak_id_vars
- Read in CT
- Create reference dates configuration file
- Map Topic Variable
- Map Rest of the Variables
- Map Reference Date Variables
- Create SDTM derived variables
- Add Labels and Attributes
Read in data
Read all the raw datasets into the environment. In this example, the raw datasets needed are ec_raw, ds_raw and dm_raw. Users can read them from the {pharmaverseraw} package.
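A minimal sketch of this step, assuming {pharmaverseraw} is installed and exports the three datasets under the names used in this article:

```r
# Assumption: {pharmaverseraw} exports dm_raw, ds_raw and ec_raw as datasets.
dm_raw <- pharmaverseraw::dm_raw
ds_raw <- pharmaverseraw::ds_raw
ec_raw <- pharmaverseraw::ec_raw

# Quick sanity check that all three objects are data frames
stopifnot(is.data.frame(dm_raw), is.data.frame(ds_raw), is.data.frame(ec_raw))
```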
The raw datasets are the Demographics raw dataset (dm_raw), the Disposition raw dataset (ds_raw) and the Study Drug Administration raw dataset (ec_raw).
SDTM aCRF
The SDTM annotated CRFs (aCRFs) for the raw datasets are shown below:
Create oak_id_vars
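Each raw dataset needs the three key variables returned by oak_id_vars() (oak_id, raw_source and patient_number) before any mapping is done. A minimal sketch using sdtm.oak::generate_oak_id_vars() on a toy data frame; the toy data and its values are invented for illustration. In the workflow the same call would be applied to dm_raw, ds_raw and ec_raw, assuming PATNUM identifies the patient in each of them:

```r
library(sdtm.oak)

# Toy stand-in for a raw dataset (values are illustrative only)
dm_toy <- data.frame(
  PATNUM = c("101-001", "101-002"),
  IT.AGE = c(64, 71),
  stringsAsFactors = FALSE
)

# Add oak_id, raw_source and patient_number to the raw data
dm_toy <- generate_oak_id_vars(
  raw_dat = dm_toy,
  pat_var = "PATNUM",
  raw_src = "dm_raw"
)

# Check that the key variables named by oak_id_vars() are now present
all(oak_id_vars() %in% names(dm_toy))
```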
For example, the Demographics raw dataset with oak_id_vars added.
Read in CT
Controlled Terminology is part of the SDTM specification and is prepared by the user. In this example, the study controlled terminology file is named sdtm_ct.csv. Users can read it from the package.
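A minimal sketch of reading the terminology file into study_ct, the object name used by the mapping code in this article. The location of sdtm_ct.csv inside the installed {sdtm.oak} package (the raw_data folder) is an assumption; point ct_path at your own study file if it differs:

```r
# Assumption: the example file ships in the "raw_data" folder of {sdtm.oak}.
# system.file() returns "" when the package or the file cannot be found.
ct_path <- system.file("raw_data/sdtm_ct.csv", package = "sdtm.oak")

if (nzchar(ct_path)) {
  study_ct <- read.csv(ct_path)
  str(study_ct)
} else {
  message("sdtm_ct.csv not found; set ct_path to your study's controlled terminology file.")
}
```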
Create reference dates configuration file
Create the reference dates configuration file: a data frame that holds the details of the variables used to calculate the reference dates. The data frame should have the columns listed below:
- raw_dataset_name: Name of the raw dataset.
- date_var: Date variable name from the raw dataset.
- time_var: Time variable name from the raw dataset.
- dformat: Format of the date collected in raw data.
- tformat: Format of the time collected in raw data.
- sdtm_var_name: Reference date variable name in the DM domain where the raw variable is used.
ref_date_conf_df <- tibble::tribble(
  ~raw_dataset_name, ~date_var, ~time_var, ~dformat, ~tformat, ~sdtm_var_name,
  "ec_raw", "IT.ECSTDAT", NA_character_, "dd-mmm-yyyy", NA_character_, "RFXSTDTC",
  "ec_raw", "IT.ECENDAT", NA_character_, "dd-mmm-yyyy", NA_character_, "RFXENDTC",
  "ec_raw", "IT.ECSTDAT", NA_character_, "dd-mmm-yyyy", NA_character_, "RFSTDTC",
  "ec_raw", "IT.ECENDAT", NA_character_, "dd-mmm-yyyy", NA_character_, "RFENDTC",
  "dm_raw", "IC_DT", NA_character_, "mm/dd/yyyy", NA_character_, "RFICDTC",
  "ds_raw", "DSDTCOL", "DSTMCOL", "mm-dd-yyyy", "H:M", "RFPENDTC",
  "ds_raw", "DEATHDT", NA_character_, "mm/dd/yyyy", NA_character_, "DTHDTC"
)
Map Topic Variable
In the DM domain, SUBJID is the topic variable. It can be mapped from PATNUM using a simple dplyr::mutate() statement.
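A minimal sketch of this mapping on a toy data frame. Copying PATNUM unchanged is an assumption made for illustration; a study may instead need only part of the patient number:

```r
library(dplyr)

# Toy raw data that already carries the oak_id_vars (values are illustrative)
dm_raw_toy <- data.frame(
  oak_id = 1:2,
  raw_source = "dm_raw",
  patient_number = c("101-001", "101-002"),
  PATNUM = c("101-001", "101-002"),
  stringsAsFactors = FALSE
)

dm_toy <- dm_raw_toy %>%
  # Topic variable: SUBJID taken from PATNUM (assumed to be a 1:1 copy)
  mutate(SUBJID = PATNUM) %>%
  # Keep only the key variables and the new topic variable
  select(oak_id, raw_source, patient_number, SUBJID)
```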
Map Rest of the Variables
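Map the rest of the variables in the DM domain using either sdtm.oak::assign_no_ct() or sdtm.oak::assign_ct(), depending on whether the variable has associated controlled terminology. A value that is fixed rather than collected, such as AGEU here, is set with sdtm.oak::hardcode_ct().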
library(sdtm.oak)
library(dplyr)

dm <- dm %>%
  # Map AGE using assign_no_ct
  assign_no_ct(
    raw_dat = dm_raw,
    raw_var = "IT.AGE",
    tgt_var = "AGE",
    id_vars = oak_id_vars()
  ) %>%
  # Map AGEU using hardcode_ct
  hardcode_ct(
    raw_dat = dm_raw,
    raw_var = "IT.AGE",
    tgt_var = "AGEU",
    tgt_val = "Year",
    ct_spec = study_ct,
    ct_clst = "C66781",
    id_vars = oak_id_vars()
  ) %>%
  # Map SEX using assign_ct
  assign_ct(
    raw_dat = dm_raw,
    raw_var = "IT.SEX",
    tgt_var = "SEX",
    ct_spec = study_ct,
    ct_clst = "C66731",
    id_vars = oak_id_vars()
  ) %>%
  # Map ETHNIC using assign_ct
  assign_ct(
    raw_dat = dm_raw,
    raw_var = "IT.ETHNIC",
    tgt_var = "ETHNIC",
    ct_spec = study_ct,
    ct_clst = "C66790",
    id_vars = oak_id_vars()
  ) %>%
  # Map RACE using assign_ct
  assign_ct(
    raw_dat = dm_raw,
    raw_var = "IT.RACE",
    tgt_var = "RACE",
    ct_spec = study_ct,
    ct_clst = "C74457",
    id_vars = oak_id_vars()
  ) %>%
  # Map ARM using assign_ct
  assign_ct(
    raw_dat = dm_raw,
    raw_var = "PLANNED_ARM",
    tgt_var = "ARM",
    ct_spec = study_ct,
    ct_clst = "ARM",
    id_vars = oak_id_vars()
  ) %>%
  # Map ARMCD using assign_no_ct
  assign_no_ct(
    raw_dat = dm_raw,
    raw_var = "PLANNED_ARMCD",
    tgt_var = "ARMCD",
    id_vars = oak_id_vars()
  ) %>%
  # Map ACTARM using assign_ct
  assign_ct(
    raw_dat = dm_raw,
    raw_var = "ACTUAL_ARM",
    tgt_var = "ACTARM",
    ct_spec = study_ct,
    ct_clst = "ARM",
    id_vars = oak_id_vars()
  ) %>%
  # Map ACTARMCD using assign_no_ct
  assign_no_ct(
    raw_dat = dm_raw,
    raw_var = "ACTUAL_ARMCD",
    tgt_var = "ACTARMCD",
    id_vars = oak_id_vars()
  )
ℹ These terms could not be mapped per the controlled terminology: "Placebo" and "Screen Failure".
ℹ These terms could not be mapped per the controlled terminology: "Placebo" and "Screen Failure".
Map Reference Date Variables
Use sdtm.oak::oak_cal_ref_dates() to calculate the reference date variables in ISO 8601 format. The function takes the raw variable names from the reference dates configuration file and calculates the minimum or maximum date for each subject, depending on the min_max parameter.
Variable RFSTDTC is the reference start date/time for the subject in ISO 8601 character format. It is usually equivalent to the date/time when the subject was first exposed to study treatment. As specified in the reference dates configuration file, we therefore need the minimum IT.ECSTDAT date for each subject from the ec_raw dataset, so "min" is selected for the min_max parameter.
Variable RFENDTC is the reference end date/time for the subject in ISO 8601 character format. It is usually equivalent to the date/time when the subject was determined to have ended the trial, and often equivalent to the date/time of last exposure to study treatment. As specified in the reference dates configuration file, we need the maximum IT.ECENDAT date for each subject from the ec_raw dataset, so "max" is selected for the min_max parameter.
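To make the min/max logic concrete, the sketch below reproduces the idea in base R on invented exposure records. It is not how oak_cal_ref_dates() is implemented; the column names and dates are assumptions, and numeric months are used so that parsing does not depend on the locale:

```r
# Toy exposure records: two dosing intervals per subject (illustrative values)
ec_toy <- data.frame(
  patient_number = c("001", "001", "002", "002"),
  start = c("03/15/2020", "01/10/2020", "02/01/2020", "02/20/2020"),
  end = c("03/20/2020", "01/12/2020", "02/05/2020", "02/25/2020"),
  stringsAsFactors = FALSE
)

# Convert collected mm/dd/yyyy dates to ISO 8601 (YYYY-MM-DD) character values
to_iso <- function(x) format(as.Date(x, format = "%m/%d/%Y"), "%Y-%m-%d")

# ISO 8601 dates sort chronologically as text, so min()/max() work on them
rfstdtc <- tapply(to_iso(ec_toy$start), ec_toy$patient_number, min)
rfendtc <- tapply(to_iso(ec_toy$end), ec_toy$patient_number, max)

rfstdtc[["001"]] # "2020-01-10": earliest start for subject 001
rfendtc[["001"]] # "2020-03-20": latest end for subject 001
```

In the workflow itself, the equivalent step is a call to oak_cal_ref_dates() with der_var = "RFSTDTC" and min_max = "min", or with der_var = "RFENDTC" and min_max = "max".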
The same derivation logic is applicable to other reference date/time variables.
dm <- dm %>%
  # Derive RFXSTDTC using oak_cal_ref_dates
  oak_cal_ref_dates(
    ds_in = .,
    der_var = "RFXSTDTC",
    min_max = "min",
    ref_date_config_df = ref_date_conf_df,
    raw_source = list(
      ec_raw = ec_raw,
      ds_raw = ds_raw,
      dm_raw = dm_raw
    )
  ) %>%
  # Derive RFXENDTC using oak_cal_ref_dates
  oak_cal_ref_dates(
    ds_in = .,
    der_var = "RFXENDTC",
    min_max = "max",
    ref_date_config_df = ref_date_conf_df,
    raw_source = list(
      ec_raw = ec_raw,
      ds_raw = ds_raw,
      dm_raw = dm_raw
    )
  ) %>%
  # Derive RFICDTC using oak_cal_ref_dates
  oak_cal_ref_dates(
    ds_in = .,
    der_var = "RFICDTC",
    min_max = "min",
    ref_date_config_df = ref_date_conf_df,
    raw_source = list(
      ec_raw = ec_raw,
      ds_raw = ds_raw,
      dm_raw = dm_raw
    )
  ) %>%
  # Derive RFPENDTC using oak_cal_ref_dates
  oak_cal_ref_dates(
    ds_in = .,
    der_var = "RFPENDTC",
    min_max = "max",
    ref_date_config_df = ref_date_conf_df,
    raw_source = list(
      ec_raw = ec_raw,
      ds_raw = ds_raw,
      dm_raw = dm_raw
    )
  ) %>%
  # Map DTHDTC using oak_cal_ref_dates
  oak_cal_ref_dates(
    ds_in = .,
    der_var = "DTHDTC",
    min_max = "min",
    ref_date_config_df = ref_date_conf_df,
    raw_source = list(
      ec_raw = ec_raw,
      ds_raw = ds_raw,
      dm_raw = dm_raw
    )
  )
Create SDTM derived variables
dm <- dm %>%
  mutate(
    STUDYID = dm_raw$STUDY,
    DOMAIN = "DM",
    USUBJID = paste0("01-", dm_raw$PATNUM),
    COUNTRY = dm_raw$COUNTRY,
    DTHFL = dplyr::if_else(is.na(DTHDTC), NA_character_, "Y")
  ) %>%
  # Map DMDTC using assign_datetime
  assign_datetime(
    raw_dat = dm_raw,
    raw_var = "COL_DT",
    tgt_var = "DMDTC",
    raw_fmt = c("m/d/y"),
    id_vars = oak_id_vars()
  ) %>%
  # Derive study day
  derive_study_day(
    sdtm_in = .,
    dm_domain = .,
    tgdt = "DMDTC",
    refdt = "RFXSTDTC",
    study_day_var = "DMDY"
  )
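The study day derived in the last step follows the usual SDTM convention: there is no day 0, so dates on or after the reference date count up from 1 and earlier dates count down from -1. A base R sketch of that rule, written for illustration with a hypothetical helper study_day() that is not part of {sdtm.oak}:

```r
# Hypothetical helper illustrating the SDTM study day rule (no day 0)
study_day <- function(date, ref_date) {
  diff_days <- as.integer(as.Date(date) - as.Date(ref_date))
  # On or after the reference date: add 1; before it: keep the negative difference
  ifelse(diff_days >= 0L, diff_days + 1L, diff_days)
}

study_day("2020-01-10", "2020-01-10") # 1: the reference date itself is day 1
study_day("2020-01-09", "2020-01-10") # -1: the day before the reference date
study_day("2020-01-15", "2020-01-10") # 6
```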
Add Labels and Attributes
Yet to be developed. Please refer to the {metatools} package to investigate options.
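In the meantime, one simple stopgap is to set the "label" attribute of each column directly in base R, which is the attribute that tools such as {haven} treat as the variable label. A minimal sketch on a toy data frame; the data values are invented, and this is an illustration rather than the approach {sdtm.oak} or {metatools} prescribes:

```r
# Toy DM-like data frame (values are illustrative only)
dm_toy <- data.frame(
  USUBJID = c("01-001", "01-002"),
  AGE = c(64, 71),
  stringsAsFactors = FALSE
)

# Named vector of variable labels, keyed by column name
dm_labels <- c(USUBJID = "Unique Subject Identifier", AGE = "Age")

# Attach each label as the "label" attribute of its column
for (var in names(dm_labels)) {
  attr(dm_toy[[var]], "label") <- dm_labels[[var]]
}

attr(dm_toy$AGE, "label") # "Age"
```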