Main Idea
The main idea of admiral is that an ADaM dataset is built by a sequence of derivations. Each derivation adds one or more variables or parameters to the processed dataset. This modular approach makes it easy to adjust code by adding, removing, or modifying derivations. Each derivation is a function call. Consider for example the following script which creates a (very simple) ADSL dataset.
Load Packages and Example Datasets
First, we will load our packages and example datasets to help with our ADSL creation. The dplyr and lubridate packages are tidyverse packages and are used heavily throughout this script. The admiral package also leverages the {admiral.test} package for example SDTM datasets, which are from the CDISC Pilot Study.
Derive Treatment Variables (TRT0xP, TRT0xA)
The mapping of the treatment variables is left to the ADaM programmer. An example mapping may be:
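As a minimal sketch of such a mapping, the planned and actual treatment could be taken directly from the DM arm variables with a single dplyr mutate() call. The toy dm data frame and the choice of ARM and ACTARM as sources are assumptions for illustration only; the right mapping depends on the study.

```r
library(dplyr)

# Toy stand-in for the SDTM DM domain (hypothetical values, illustration only)
dm <- data.frame(
  STUDYID = "STUDY01",
  USUBJID = c("STUDY01-001", "STUDY01-002"),
  ARM = c("Placebo", "Drug A"),
  ACTARM = c("Placebo", "Drug A"),
  stringsAsFactors = FALSE
)

# Assumed mapping: planned treatment from ARM, actual treatment from ACTARM
adsl <- dm %>%
  mutate(TRT01P = ARM, TRT01A = ACTARM)
```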
Derive/Impute Numeric Treatment Date/Time and Flag Variables (TRTSDTM, TRTEDTM, TRTSTMF, TRTETMF)
The function derive_vars_dtm() can be used to convert the DTC variables from EX to numeric datetime variables and impute missing components. The function call returns the original data frame with the columns EXSTDTM and EXENDTM and the corresponding time imputation flag variables EXSTTMF and EXENTMF added to the end of the data frame. Exposure observations with an incomplete date are ignored. For the end datetime, we impute a missing time to be 23:59:59 using time_imputation = "last"; the call for the start datetime relies on the function's default. The required imputation flags are determined automatically by the function. Here only time imputation flags are derived because time is imputed but date is not imputed.
Don’t be intimidated by the number of arguments! We try to make our arguments self-explanatory, e.g. new_vars_prefix = "EXST" places EXST at the start of the --DTM variable name and time_imputation = "last" sets a missing time to 23:59:59. However, it is not always possible to make every argument self-explanatory. If you click on the function, derive_vars_dtm(), you can bring up the reference documentation and learn more about each argument.
# Derive treatment variables
## Impute time of exposure dates (creates numeric datetime and time imputation flag variables)
ex_ext <- ex %>%
  derive_vars_dtm(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dtm(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN",
    time_imputation = "last"
  )

## Derive variables for first/last treatment date and time imputation flags
adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = !is.na(EXSTDTM),
    new_vars = vars(TRTSDTM = EXSTDTM, TRTSTMF = EXSTTMF),
    order = vars(EXSTDTM, EXSEQ),
    mode = "first",
    by_vars = vars(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = !is.na(EXENDTM),
    new_vars = vars(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
    order = vars(EXENDTM, EXSEQ),
    mode = "last",
    by_vars = vars(STUDYID, USUBJID)
  )
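To make the effect of time_imputation = "last" more concrete, the following base R sketch mimics it for date-only values. The helper impute_time_last() is hypothetical and is not how admiral implements the imputation; it ignores partial times and partial dates, and it assumes that a fully imputed time is flagged with "H".

```r
# Hypothetical helper (not part of admiral): append 23:59:59 to date-only
# --DTC strings and leave all other values unchanged
impute_time_last <- function(dtc) {
  date_only <- !is.na(dtc) & grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dtc)
  dtc[date_only] <- paste0(dtc[date_only], "T23:59:59")
  dtc
}

exendtc <- c("2014-01-02", "2014-01-05T10:30:00", NA)
imputed <- impute_time_last(exendtc)

# Numeric datetime, analogous to EXENDTM
exendtm <- as.POSIXct(imputed, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# Time imputation flag, analogous to EXENTMF: set only where time was imputed
exentmf <- ifelse(imputed != exendtc, "H", NA_character_)
```

Only the first value receives an imputed time and a flag; the complete datetime is kept as is and the missing value stays missing.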
Derive Date Variables from Date/Time Variables (TRTSDT, TRTEDT)
The datetime variables returned can be converted to dates using the derive_vars_dtm_to_dt() function.
adsl <- adsl %>%
  derive_vars_dtm_to_dt(source_vars = vars(TRTSDTM, TRTEDTM))
Derive Treatment Duration (TRTDURD)
Now that TRTSDT and TRTEDT are derived, the function derive_var_trtdurd() can be used to calculate the treatment duration (TRTDURD). Notice the lack of inputs. The function defaults are set to TRTSDT and TRTEDT. Clicking on derive_var_trtdurd() will bring up the reference documentation where you can see the default arguments.
adsl <- adsl %>%
  derive_var_trtdurd()
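The underlying arithmetic can be sketched in base R. This assumes the usual convention that treatment duration is counted in days and includes both the start and the end day, i.e. TRTEDT - TRTSDT + 1; the dates are made up for illustration.

```r
# Made-up treatment start and end dates for two subjects
trtsdt <- as.Date(c("2014-01-02", "2014-03-10"))
trtedt <- as.Date(c("2014-01-02", "2014-03-16"))

# Duration in days, counting both the first and the last day of treatment
trtdurd <- as.numeric(difftime(trtedt, trtsdt, units = "days")) + 1
trtdurd
#> [1] 1 7
```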
Amazing! With one dplyr function and four admiral functions we successfully created nine new variables for our ADSL dataset. Let’s take a look at all our newly derived variables.
Note: We only display variables that were derived. A user running this code will have additional adsl variables displayed.
Derivations
The most important functions in admiral are the derivations. These functions start with derive_. The first argument of these functions expects the input dataset. This allows us to string together derivations using the %>% operator.
Functions which derive a dedicated variable start with derive_var_ followed by the variable name, e.g., derive_var_trtdurd() derives the TRTDURD variable.
Functions which can derive multiple variables start with derive_vars_ followed by the variable name, e.g., derive_vars_dtm() can derive both the TRTSDTM and TRTSTMF variables.
Functions which derive a dedicated parameter start with derive_param_ followed by the parameter name, e.g., derive_param_bmi() derives the BMI parameter.
Input and Output
It is expected that the input dataset is not grouped. Otherwise an error is issued.
The input dataset should not include variables starting with temp_. These variable names are reserved for temporary variables used within the derivation and are removed from the output dataset. If the input dataset contains such variables, an error is issued.
It is expected that all variable names in the input dataset are uppercase; new variables are returned in uppercase.
The output dataset is ungrouped. The observations are not ordered in a dedicated way. In particular, the order of the observations of the input dataset may not be preserved.
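Before passing a dataset to a derivation, it can help to check its variable names against these expectations. The following base R sketch uses a hypothetical helper, find_name_issues(), which is not part of admiral; it only reports offending names and does not reproduce admiral's own checks or error messages.

```r
# Hypothetical helper (not part of admiral): report variable names that
# conflict with the expectations described above
find_name_issues <- function(dataset) {
  var_names <- names(dataset)
  list(
    reserved_temp = var_names[startsWith(var_names, "temp_")],
    not_uppercase = var_names[var_names != toupper(var_names)]
  )
}

# Toy dataset with one reserved name and one mixed-case name
adsl_check <- data.frame(USUBJID = "001", temp_flag = TRUE, Age = 54)
issues <- find_name_issues(adsl_check)
issues$reserved_temp
#> [1] "temp_flag"
issues$not_uppercase
#> [1] "temp_flag" "Age"
```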
Computations
Computations expect vectors as input and return a vector. Usually these computation functions cannot be used with %>%. These functions can be used in expressions like convert_dtc_to_dt() in the derivation of FINLABDT in the example below:
# Derive final lab visit date
ds_final_lab_visit <- ds %>%
  filter(DSDECOD == "FINAL LAB VISIT") %>%
  transmute(USUBJID, FINLABDT = convert_dtc_to_dt(DSSTDTC))

# Derive treatment variables
adsl <- dm %>%
  # Merge on final lab visit date
  derive_vars_merged(
    dataset_add = ds_final_lab_visit,
    by_vars = vars(USUBJID)
  )
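The vector-in, vector-out contract of a computation can be illustrated with a small base R stand-in. The function dtc_to_date() below is hypothetical and much simpler than convert_dtc_to_dt(): it only handles values that start with a complete date and returns NA otherwise.

```r
# Hypothetical computation (not part of admiral): character vector in,
# Date vector out; partial or missing dates become NA
dtc_to_date <- function(dtc) {
  as.Date(substr(dtc, 1, 10), format = "%Y-%m-%d")
}

# Toy DS records (made-up values)
ds_toy <- data.frame(
  USUBJID = c("001", "002", "003"),
  DSSTDTC = c("2014-07-02", "2014-09-15T11:45", "2014-10"),
  stringsAsFactors = FALSE
)

# Because the computation works on vectors, it is used inside an
# expression that creates the new column
ds_toy$FINLABDT <- dtc_to_date(ds_toy$DSSTDTC)
```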
Arguments
For arguments which expect variable names or expressions of variable names, symbols or expressions must be specified rather than strings.
For arguments which expect a single variable name, the name can be specified as a bare symbol, without quotes or quoting functions, e.g.
new_var = TEMPBL
For arguments which expect one or more variable names, a list of symbols is expected, e.g.
by_vars = vars(PARAMCD, AVISIT)
For arguments which expect a single expression, the expression needs to be passed “as is”, e.g.
filter = PARAMCD == "TEMP"
For arguments which expect one or more expressions, a list of expressions is expected, e.g.
order = vars(AVISIT, desc(AESEV))
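To see why an expression can be passed “as is”, the base R sketch below captures an unevaluated filter condition and evaluates it inside the dataset. The function keep_rows() is hypothetical and this is not how admiral is implemented; it only illustrates the general idea of working with expressions instead of strings.

```r
# Hypothetical function (not part of admiral): keep the rows of a dataset
# for which an unquoted condition is TRUE
keep_rows <- function(dataset, filter) {
  condition <- substitute(filter) # capture the expression without evaluating it
  keep <- eval(condition, envir = dataset, enclos = parent.frame())
  dataset[keep %in% TRUE, , drop = FALSE] # rows where the condition is NA are dropped
}

# Toy vital signs data (made-up values)
advs <- data.frame(
  PARAMCD = c("TEMP", "WEIGHT", "TEMP", NA),
  AVAL = c(36.8, 70.2, 37.1, 120),
  stringsAsFactors = FALSE
)

temp_only <- keep_rows(advs, PARAMCD == "TEMP")
```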
Handling of Missing Values
When using the haven package to read SAS datasets into R, SAS-style character missing values, i.e. "", are not converted into proper R NA values. Rather they are kept as is. This is problematic for any downstream data processing as R handles "" just as any other string. Thus, before any data manipulation is performed, SAS blanks should be converted to R NAs using admiral’s convert_blanks_to_na() function, e.g.
dm <- haven::read_sas("dm.sas7bdat") %>%
  convert_blanks_to_na()
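For a single character vector, the effect of such a conversion can be sketched in base R. The helper blanks_to_na() is hypothetical and only shows the idea; admiral's convert_blanks_to_na() should be used in practice.

```r
# Hypothetical helper (not part of admiral): replace empty strings by NA
blanks_to_na <- function(x) {
  x[!is.na(x) & x == ""] <- NA_character_
  x
}

race <- c("WHITE", "", "ASIAN", "")

# Before the conversion, blanks are treated like any other string
is.na(race)
#> [1] FALSE FALSE FALSE FALSE

race_clean <- blanks_to_na(race)
is.na(race_clean)
#> [1] FALSE  TRUE FALSE  TRUE
```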
Note that a comparison involving an NA value returns NA rather than TRUE or FALSE.
visits <- c("Baseline", NA, "Screening", "Week 1 Day 7")
visits != "Baseline"
#> [1] FALSE NA TRUE TRUE
The exception is is.na(), which returns TRUE if the input is NA.
is.na(visits)
#> [1] FALSE TRUE FALSE FALSE
Thus, to select all visits which are not "Baseline", including missing ones, the following condition would need to be used.
visits != "Baseline" | is.na(visits)
#> [1] FALSE TRUE TRUE TRUE
Also note that most aggregation functions, like mean() or max(), return NA if any element of the input vector is missing.
To avoid this behavior one has to explicitly set na.rm = TRUE.
This is very important to keep in mind when using admiral’s aggregation functions such as derive_summary_records().
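A short base R illustration of this behavior, using made-up values:

```r
aval <- c(10, NA, 30)

# A single missing value makes the result missing
mean(aval)
#> [1] NA

# With na.rm = TRUE, missing values are dropped before aggregating
mean(aval, na.rm = TRUE)
#> [1] 20
max(aval, na.rm = TRUE)
#> [1] 30
```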
Validation
All functions are reviewed and tested to ensure that they work as described in the documentation. They are not validated yet.
Although admiral follows CDISC standards, it does not claim that the dataset resulting from calling admiral functions is ADaM compliant. This has to be ensured by the user.
Starting a Script
For the ADaM data structures, an overview of the flow and example function calls for the most common steps are provided by the following vignettes:
admiral also provides template R scripts as a starting point. They can be created by calling use_ad_template(), e.g.,
use_ad_template(
  adam_name = "adsl",
  save_path = "./ad_adsl.R"
)
A list of all available templates can be obtained by list_all_templates():
list_all_templates()
#> Existing ADaM templates in package 'admiral':
#> • ADAE
#> • ADCM
#> • ADEG
#> • ADEX
#> • ADLB
#> • ADMH
#> • ADPP
#> • ADSL
#> • ADVS
Support
Support is provided via the admiral Slack channel.