
Main Idea

The main idea of admiral is that an ADaM dataset is built by a sequence of derivations. Each derivation adds one or more variables or parameters to the processed dataset. This modular approach makes it easy to adjust code by adding, removing, or modifying derivations. Each derivation is a function call. Consider, for example, the following script, which creates a (very simple) ADSL dataset.

Load Packages and Example Datasets

First, we load the packages and example datasets needed to create our ADSL. The dplyr and lubridate packages are tidyverse packages and are used heavily throughout this script. The admiral package also leverages the pharmaversesdtm package for example SDTM datasets, which come from the CDISC Pilot Study.

library(dplyr, warn.conflicts = FALSE)
library(lubridate)
library(admiral)
library(pharmaversesdtm)

# Read in SDTM datasets
data("dm")
data("ds")
data("ex")

dm <- convert_blanks_to_na(dm)
ds <- convert_blanks_to_na(ds)
ex <- convert_blanks_to_na(ex)

Derive Treatment Variables (TRT0xP, TRT0xA)

The mapping of the treatment variables is left to the ADaM programmer. An example mapping may be:

adsl <- dm %>%
  mutate(TRT01P = ARM, TRT01A = ACTARM)

Derive/Impute Numeric Treatment Date/Time and Flag Variables (TRTSDTM, TRTEDTM, TRTSTMF, TRTETMF)

The function derive_vars_dtm() can be used to convert the DTC variables from EX to numeric datetime variables and to impute missing components. Each call returns the input data frame with two variables added at the end: the new datetime variable (EXSTDTM or EXENDTM) and the corresponding time imputation flag variable (EXSTTMF or EXENTMF). Exposure observations with an incomplete date are ignored. Only the time is imputed, so their datetime variable is set to NA. By default, missing times are imputed as 00:00:00. For the end datetime, missing times are imputed as 23:59:59 using time_imputation = "last". The function determines the required imputation flags automatically. Here only time imputation flags are derived, because time is imputed but date is not.
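The base R sketch below makes the imputation rules concrete. It shows what imputing missing time components as 23:59:59 and deriving a time imputation flag amounts to. The function impute_time_demo() is hypothetical and not part of admiral. It assumes ISO 8601 character input, leaves incomplete dates missing, and uses the flag values "H", "M" and "S" for the highest imputed component (hours, minutes, seconds).

```r
# Hypothetical helper, NOT part of admiral: imputes missing time components of
# ISO 8601 date/time strings and derives a time imputation flag.
impute_time_demo <- function(dtc, time_imputation = "23:59:59") {
  fill <- strsplit(time_imputation, ":", fixed = TRUE)[[1]]
  res <- lapply(dtc, function(x) {
    # Missing or incomplete dates are not imputed and stay missing
    if (is.na(x) || !grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", x)) {
      return(list(dtm = NA_character_, tmf = NA_character_))
    }
    # Time components that are present, e.g. "10:30" -> c("10", "30")
    time <- if (nchar(x) > 11) {
      strsplit(substr(x, 12, nchar(x)), ":", fixed = TRUE)[[1]]
    } else {
      character(0)
    }
    n <- length(time)
    imputed <- if (n < 3) fill[(n + 1):3] else character(0)
    list(
      dtm = paste0(substr(x, 1, 10), "T", paste(c(time, imputed), collapse = ":")),
      # Flag the highest imputed component: "H" (hours), "M" (minutes), "S" (seconds)
      tmf = c("H", "M", "S", NA_character_)[n + 1]
    )
  })
  data.frame(
    DTM = as.POSIXct(vapply(res, `[[`, character(1), "dtm"),
                     format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
    TMF = vapply(res, `[[`, character(1), "tmf"),
    stringsAsFactors = FALSE
  )
}

impute_time_demo(c("2014-01-02", "2014-01-02T10:30", "2014-01-02T10:30:15", "2014-01"))
```

The last input has an incomplete date, so both its datetime and its flag are NA. The complete datetime is kept unchanged and gets no flag.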

Don’t be intimidated by the number of arguments! We try to make our arguments self-explanatory. For example, new_vars_prefix = "EXST" places EXST at the start of the --DTM variable name, and time_imputation = "last" imputes missing time components as 23:59:59. However, it is not always possible to make every argument self-explanatory. The reference documentation of derive_vars_dtm() describes each argument in detail.

# Derive treatment variables
## Impute time of exposure dates (creates numeric datetime and time imputation flag variables)
ex_ext <- ex %>%
  derive_vars_dtm(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dtm(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN",
    time_imputation = "last"
  )

## Derive variables for first/last treatment date and time imputation flags
adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = !is.na(EXSTDTM),
    new_vars = exprs(TRTSDTM = EXSTDTM, TRTSTMF = EXSTTMF),
    order = exprs(EXSTDTM, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = !is.na(EXENDTM),
    new_vars = exprs(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
    order = exprs(EXENDTM, EXSEQ),
    mode = "last",
    by_vars = exprs(STUDYID, USUBJID)
  )
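Conceptually, each derive_vars_merged() call above works per subject. It ignores records with a missing datetime, picks one exposure record (the first or last according to the order variables), and attaches the selected values to ADSL. The base R sketch below illustrates this idea for mode = "first". The data and the helper first_by_subject_demo() are hypothetical, and the sketch does not reproduce all features of derive_vars_merged().

```r
# Hypothetical exposure records; S2 only has a record with a missing datetime
ex_demo <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S2"),
  EXSEQ = c(1, 2, 3, 1),
  EXSTDTM = as.POSIXct(
    c("2014-01-10 00:00:00", "2014-01-02 00:00:00", NA, NA),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  )
)
adsl_demo <- data.frame(USUBJID = c("S1", "S2", "S3"))

# Hypothetical helper, NOT part of admiral: drop records with a missing order
# variable, sort by subject, order variable and EXSEQ, keep the first per subject
first_by_subject_demo <- function(dataset_add, order_var) {
  kept <- dataset_add[!is.na(dataset_add[[order_var]]), , drop = FALSE]
  kept <- kept[order(kept$USUBJID, kept[[order_var]], kept$EXSEQ), , drop = FALSE]
  kept[!duplicated(kept$USUBJID), , drop = FALSE]
}

first_ex_demo <- first_by_subject_demo(ex_demo, "EXSTDTM")
# Left join: subjects without a valid record (S2, S3) get TRTSDTM = NA
adsl_demo <- merge(
  adsl_demo,
  data.frame(USUBJID = first_ex_demo$USUBJID, TRTSDTM = first_ex_demo$EXSTDTM),
  by = "USUBJID", all.x = TRUE
)
adsl_demo
```

For mode = "last", the same idea applies with the last record per subject. For example, you could use duplicated(..., fromLast = TRUE) on the sorted records.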

Derive Date Variables from Date/Time Variables (TRTSDT, TRTEDT)

The datetime variables returned can be converted to dates using the derive_vars_dtm_to_dt() function.

adsl <- adsl %>%
  derive_vars_dtm_to_dt(source_vars = exprs(TRTSDTM, TRTEDTM))
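In base R, the date part of a datetime can be extracted with as.Date(). The snippet below is only an illustration with hypothetical values. It states the time zone explicitly because converting a datetime in a different time zone can shift the resulting date by a day.

```r
# Hypothetical treatment start/end datetimes (after time imputation)
adsl_demo <- data.frame(
  USUBJID = c("S1", "S2"),
  TRTSDTM = as.POSIXct(c("2014-01-02 00:00:00", "2014-02-01 00:00:00"), tz = "UTC"),
  TRTEDTM = as.POSIXct(c("2014-01-11 23:59:59", "2014-02-01 23:59:59"), tz = "UTC")
)
# Keep only the date part; tz = "UTC" matches the time zone of the datetimes
adsl_demo$TRTSDT <- as.Date(adsl_demo$TRTSDTM, tz = "UTC")
adsl_demo$TRTEDT <- as.Date(adsl_demo$TRTEDTM, tz = "UTC")
adsl_demo
```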

Derive Treatment Duration (TRTDURD)

Now that TRTSDT and TRTEDT are derived, the function derive_var_trtdurd() can be used to calculate the treatment duration (TRTDURD). Notice that no arguments are passed: the function's start and end dates default to TRTSDT and TRTEDT. The reference documentation of derive_var_trtdurd() lists the default arguments.

adsl <- adsl %>%
  derive_var_trtdurd()
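TRTDURD is the number of days from treatment start to treatment end plus one, i.e. TRTDURD = TRTEDT - TRTSDT + 1. A subject treated on a single day therefore has a duration of 1. The following base R equivalent uses hypothetical dates and is only a sketch of the arithmetic.

```r
# Hypothetical treatment start and end dates; S3 has a missing end date
adsl_demo <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  TRTSDT = as.Date(c("2014-01-02", "2014-02-01", "2014-03-01")),
  TRTEDT = as.Date(c("2014-01-11", "2014-02-01", NA))
)
# Days from start to end plus one; a missing date gives a missing duration
adsl_demo$TRTDURD <- as.numeric(adsl_demo$TRTEDT - adsl_demo$TRTSDT) + 1
adsl_demo$TRTDURD
```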

Amazing! With one dplyr function and four admiral functions, we created nine new variables for our ADSL dataset: TRT01P, TRT01A, TRTSDTM, TRTSTMF, TRTEDTM, TRTETMF, TRTSDT, TRTEDT, and TRTDURD.


Derivations

The most important functions in admiral are the derivations. Derivations add variables or observations to the input dataset. Existing variables and observations of the input dataset are not changed. Derivation functions start with derive_. The first argument of these functions expects the input dataset. This allows us to string together derivations using the %>% operator.
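The design can be illustrated without admiral. The two functions below are hypothetical and only mimic the conventions described above. Each takes the input dataset as its first argument, adds one variable without changing existing ones, and returns the dataset, so the calls can be chained. The sketch uses R's native pipe |> (R 4.1 or later), which chains calls the same way %>% does. The AGEGR1 cut-off is an arbitrary example.

```r
# Hypothetical derivations following admiral's conventions (NOT admiral functions):
# dataset first, existing variables untouched, one new variable added.
derive_var_trt01p_demo <- function(dataset) {
  dataset$TRT01P <- dataset$ARM
  dataset
}

derive_var_agegr1_demo <- function(dataset) {
  dataset$AGEGR1 <- ifelse(dataset$AGE < 65, "<65", ">=65")
  dataset
}

# Hypothetical DM-like input
dm_demo <- data.frame(USUBJID = c("S1", "S2"), ARM = c("Placebo", "Drug A"), AGE = c(63, 71))

adsl_demo <- dm_demo |>
  derive_var_trt01p_demo() |>
  derive_var_agegr1_demo()
adsl_demo
```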

Functions which derive a dedicated variable start with derive_var_ followed by the variable name, e.g., derive_var_trtdurd() derives the TRTDURD variable.

Functions which can derive multiple variables start with derive_vars_ followed by a short description of the variables, e.g., derive_vars_dtm() can derive both the TRTSDTM and TRTSTMF variables.

Functions which derive a dedicated parameter start with derive_param_ followed by the parameter name, e.g., derive_param_bmi() derives the BMI parameter.

Input and Output

The input dataset is expected to be ungrouped; otherwise, an error is issued.

The output dataset is ungrouped. Its observations are not sorted in any particular order. In particular, the order of the observations in the input dataset may not be preserved.
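Derivations do not guarantee a particular row order. A script that relies on an order (for example, before printing or exporting) should therefore sort explicitly at that point. In base R this could look as follows, with hypothetical data.

```r
# Hypothetical output of a derivation with rows in arbitrary order
adsl_demo <- data.frame(USUBJID = c("S3", "S1", "S2"), TRTDURD = c(5, 10, 1))

# Sort explicitly by the key variable(s) instead of relying on input order
adsl_demo <- adsl_demo[order(adsl_demo$USUBJID), , drop = FALSE]
rownames(adsl_demo) <- NULL
adsl_demo
```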

Computations

Computations expect vectors as input and return a vector. Usually these computation functions cannot be used with %>%. Instead, they are used within expressions, like convert_dtc_to_dt() in the derivation of FINLABDT in the example below:

# Add the date of the final lab visit to ADSL
adsl <- dm %>%
  derive_vars_merged(
    dataset_add = ds,
    by_vars = exprs(USUBJID),
    new_vars = exprs(FINLABDT = convert_dtc_to_dt(DSSTDTC)),
    filter_add = DSDECOD == "FINAL LAB VISIT"
  )
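A computation is an ordinary vectorized function: it takes a vector and returns a vector of the same length. This is why it can be used inside an expression such as new_vars = exprs(FINLABDT = ...). The hypothetical dtc_to_date_demo() below sketches such a function in base R. It performs no imputation and simply returns NA for partial dates.

```r
# Hypothetical computation (NOT admiral's convert_dtc_to_dt()):
# vector of ISO 8601 strings in, vector of Dates of the same length out.
dtc_to_date_demo <- function(dtc) {
  # Complete dates occupy the first 10 characters ("YYYY-MM-DD");
  # partial dates such as "2014-01" do not match the format and become NA
  as.Date(substr(dtc, 1, 10), format = "%Y-%m-%d")
}

dtc_to_date_demo(c("2014-01-02T10:30", "2014-01", NA))

# Used inside an expression, e.g. when adding a variable to a data frame
ds_demo <- data.frame(USUBJID = c("S1", "S2"), DSSTDTC = c("2014-07-02", "2014-08"))
ds_demo$FINLABDT <- dtc_to_date_demo(ds_demo$DSSTDTC)
ds_demo
```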

Arguments

For arguments which expect variable names or expressions of variable names, symbols or expressions must be specified rather than strings.

  • For arguments which expect a single variable name, the name is specified as a bare name without quotation marks, e.g. new_var = TEMPBL

  • For arguments which expect one or more variable names, a list of symbols is expected, e.g. by_vars = exprs(PARAMCD, AVISIT)

  • For arguments which expect a single expression, the expression needs to be passed “as is”, e.g. filter = PARAMCD == "TEMP"

  • For arguments which expect one or more expressions, a list of expressions is expected, e.g. order = exprs(AVISIT, desc(AESEV))
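Passing a variable name or an expression without quotes works because R evaluates function arguments lazily. A function can capture the unevaluated expression and evaluate it later with the dataset's columns in scope. The hypothetical filter_demo() below sketches this mechanism in base R with substitute() and eval(). admiral's own implementation builds on rlang, so this is only an illustration of the idea.

```r
# Hypothetical function (NOT an admiral function) taking an unquoted condition
filter_demo <- function(dataset, filter) {
  cond <- substitute(filter)                    # capture, e.g. PARAMCD == "TEMP"
  keep <- eval(cond, dataset, parent.frame())   # evaluate with columns in scope
  dataset[!is.na(keep) & keep, , drop = FALSE]  # treat NA as FALSE
}

advs_demo <- data.frame(PARAMCD = c("TEMP", "PULSE", "TEMP"), AVAL = c(36.8, 72, 37.1))
filter_demo(advs_demo, filter = PARAMCD == "TEMP")
```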

What is exprs()?

exprs() is a function from the rlang package. It is used to create a list of expressions. The expressions are not evaluated. Rather, they are passed on to the derivation function, which evaluates them in its own environment. This allows the derivation function to evaluate the expressions in the context of the input dataset. Specifically, the exprs() function allows users to pass variable names of datasets to the function without wrapping them in quotation marks.
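The same idea extends to a list of expressions. The sketch below uses base R only. The hypothetical exprs_demo() captures its arguments unevaluated, roughly like rlang's exprs(). The hypothetical add_vars_demo() evaluates each named expression in the context of the dataset, similar in spirit to the new_vars argument of derive_vars_merged().

```r
# Hypothetical stand-in for rlang::exprs(): returns the unevaluated arguments
exprs_demo <- function(...) as.list(substitute(list(...)))[-1]

# Hypothetical function: evaluates each named expression with the dataset's
# columns in scope and stores the result under that name
add_vars_demo <- function(dataset, new_vars) {
  for (name in names(new_vars)) {
    dataset[[name]] <- eval(new_vars[[name]], dataset, parent.frame())
  }
  dataset
}

new_vars_demo <- exprs_demo(TRTSDT = as.Date(TRTSDTM, tz = "UTC"), TRT01A = ACTARM)
new_vars_demo$TRT01A  # the symbol ACTARM: neither a string nor evaluated yet

dm_demo <- data.frame(
  USUBJID = "S1", ACTARM = "Placebo",
  TRTSDTM = as.POSIXct("2014-01-02 10:30:00", tz = "UTC")
)
add_vars_demo(dm_demo, new_vars_demo)
```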

Handling of Missing Values

When you read SAS datasets into R with the haven package, SAS-style character missing values, i.e. "", are not converted into proper R NA values. Rather, they are kept as is. This is problematic for any downstream data processing, as R handles "" like any other string. Thus, before any data manipulation is performed, SAS blanks should be converted to R NAs using admiral’s convert_blanks_to_na() function, e.g.

dm <- haven::read_sas("dm.sas7bdat") %>% 
  convert_blanks_to_na()
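The hypothetical blanks_to_na_demo() below shows what this conversion does. It replaces empty strings in all character columns of a data frame with NA using base R. It is a simplified sketch; in practice, use admiral’s convert_blanks_to_na() as shown above.

```r
# Hypothetical, simplified sketch (NOT admiral's convert_blanks_to_na())
blanks_to_na_demo <- function(dataset) {
  for (name in names(dataset)) {
    if (is.character(dataset[[name]])) {
      # Empty strings "" (SAS character missings) become proper R NAs
      dataset[[name]][which(dataset[[name]] == "")] <- NA
    }
  }
  dataset
}

dm_demo <- data.frame(USUBJID = c("S1", "S2"), ARMCD = c("PBO", ""), AGE = c(63, 71))
dm_demo <- blanks_to_na_demo(dm_demo)
is.na(dm_demo$ARMCD)
```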

Note that any comparison operator (such as ==, != or <) applied to an NA value returns NA rather than TRUE or FALSE.

visits <- c("Baseline", NA, "Screening", "Week 1 Day 7")
visits != "Baseline"
#> [1] FALSE    NA  TRUE  TRUE

To test for missing values, use is.na(), which returns TRUE if the input is NA.

is.na(visits)
#> [1] FALSE  TRUE FALSE FALSE

Thus, to select all visits which are not "Baseline", including missing visits, the following condition is needed:

visits != "Baseline" | is.na(visits)
#> [1] FALSE  TRUE  TRUE  TRUE

Also note that most aggregation functions, like mean() or max(), return NA if any element of the input vector is missing.

mean(c(1, NA, 2))
#> [1] NA

To avoid this behavior, set na.rm = TRUE explicitly.

mean(c(1, NA, 2), na.rm = TRUE)
#> [1] 1.5

This is very important to keep in mind when using admiral’s aggregation functions such as derive_summary_records().
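The same applies when summarizing by group. In this base R illustration with hypothetical values, the per-subject mean of S1 is NA unless na.rm = TRUE is passed to mean(). The same consideration applies to the summary functions used with admiral’s aggregation functions.

```r
# Hypothetical lab values; subject S1 has one missing value
adlb_demo <- data.frame(USUBJID = c("S1", "S1", "S2"), AVAL = c(1, NA, 4))

# Per-subject means without and with removal of missing values
tapply(adlb_demo$AVAL, adlb_demo$USUBJID, mean)
tapply(adlb_demo$AVAL, adlb_demo$USUBJID, mean, na.rm = TRUE)
```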

For handling of NAs in sorting variables see Sort Order.

Validation

All functions are reviewed and tested to ensure that they work as described in the documentation. They are not validated yet.

Although admiral follows CDISC standards, it does not claim that the dataset resulting from calling admiral functions is ADaM compliant. This has to be ensured by the user.

Starting a Script

For the ADaM data structures, the admiral vignettes for the individual datasets provide an overview of the flow and example function calls for the most common steps.

admiral also provides template R scripts as a starting point. They can be created by calling use_ad_template(), e.g.,

use_ad_template(
  adam_name = "adsl",
  save_path = "./ad_adsl.R"
)

A list of all available templates can be obtained by list_all_templates():

list_all_templates()
#> Existing ADaM templates in package 'admiral':
#> • ADAE
#> • ADCM
#> • ADEG
#> • ADEX
#> • ADLB
#> • ADLBHY
#> • ADMH
#> • ADPC
#> • ADPP
#> • ADPPK
#> • ADSL
#> • ADVS

Support

Support is provided via the admiral Slack channel.