Introduction
This article describes creating a BDS finding ADaM. Examples are currently presented and tested in the context of ADVS. However, the examples could be applied to other BDS Finding ADaMs such as ADEG, ADLB, etc. where a single result is captured in an SDTM Finding domain on a single date and/or time.
Note: All examples assume CDISC SDTM and/or ADaM format as input unless otherwise specified.
Programming Workflow
- Read in Data
- Derive/Impute Numeric Date/Time and Analysis Day (ADT, ADTM, ADY, ADTF, ATMF)
- Assign PARAMCD, PARAM, PARAMN, PARCAT1
- Derive Results (AVAL, AVALC)
- Derive Additional Parameters (e.g. BSA, BMI, or MAP for ADVS)
- Derive Timing Variables (e.g. APHASE, AVISIT, APERIOD)
- Timing Flag Variables (e.g. ONTRTFL)
- Assign Reference Range Indicator (ANRIND)
- Derive Baseline (BASETYPE, ABLFL, BASE, BASEC, BNRIND)
- Derive Change from Baseline (CHG, PCHG)
- Derive Shift (e.g. SHIFT1)
- Derive Analysis Ratio (e.g. R2BASE)
- Derive Analysis Flags (e.g. ANL01FL)
- Assign Treatment (TRTA, TRTP)
- Assign ASEQ
- Derive Categorization Variables (AVALCATx)
- Add ADSL variables
- Derive New Rows
- Add Labels and Attributes
Read in Data
To start, all data frames needed for the creation of ADVS should be read into the environment. This will be a company-specific process. Some of the data frames needed may be VS and ADSL.
For example purposes, the CDISC Pilot SDTM and ADaM datasets, which are included in pharmaversesdtm, are used.
library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(lubridate)
library(stringr)
library(tibble)
data("admiral_adsl")
data("vs")
adsl <- admiral_adsl
vs <- convert_blanks_to_na(vs)
At this step, it may be useful to join ADSL to your VS domain. Only the ADSL variables used for derivations are selected at this step. The rest of the relevant ADSL variables would be added later.
adsl_vars <- exprs(TRTSDT, TRTEDT, TRT01A, TRT01P)
advs <- derive_vars_merged(
vs,
dataset_add = adsl,
new_vars = adsl_vars,
by_vars = exprs(STUDYID, USUBJID)
)
Derive/Impute Numeric Date/Time and Analysis Day (ADT, ADTM, ADY, ADTF, ATMF)
The function derive_vars_dt() can be used to derive ADT. This function also allows the user to impute the date.
Example call:
advs <- derive_vars_dt(advs, new_vars_prefix = "A", dtc = VSDTC)
If imputation is needed and the date is to be imputed to the first of the month, the call would be:
advs <- derive_vars_dt(
advs,
new_vars_prefix = "A",
dtc = VSDTC,
highest_imputation = "M"
)
Similarly, ADTM may be created using the function derive_vars_dtm(). Imputation may be done on both the date and time components of ADTM.
# CDISC Pilot data does not contain times and the output of the derivation
# ADTM is not presented.
advs <- derive_vars_dtm(
advs,
new_vars_prefix = "A",
dtc = VSDTC,
highest_imputation = "M"
)
By default, the variable ADTF for derive_vars_dt(), or ADTF and ATMF for derive_vars_dtm(), will be created and populated with the controlled terminology outlined in the ADaM IG for date imputations.
See also Date and Time Imputation.
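To make imputation to the first of the month concrete, the following is a minimal base R sketch of the idea, not admiral's implementation. The helper impute_first() is hypothetical and assumes date-only ISO 8601 strings (YYYY, YYYY-MM, or YYYY-MM-DD). The flag values follow the ADaM convention of "D" when the day is imputed and "M" when both month and day are imputed.

```r
# Hypothetical helper: impute missing date parts to the first day/month
# and record which part was imputed (assumes date-only ISO 8601 strings).
impute_first <- function(dtc) {
  n <- nchar(dtc)
  full <- ifelse(
    n == 4, paste0(dtc, "-01-01"),          # only year known
    ifelse(n == 7, paste0(dtc, "-01"), dtc) # year and month known
  )
  data.frame(
    DTC = dtc,
    ADT = as.Date(full),
    ADTF = ifelse(n == 4, "M", ifelse(n == 7, "D", NA_character_)),
    stringsAsFactors = FALSE
  )
}

res <- impute_first(c("2019-07-18", "2019-07", "2019"))
print(res)
```

Complete dates keep a missing flag, which is how the imputed records can be told apart later.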
Once ADT is derived, the function derive_vars_dy() can be used to derive ADY. This example assumes both ADT and TRTSDT exist on the data frame.
advs <-
derive_vars_dy(advs, reference_date = TRTSDT, source_vars = exprs(ADT))
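The relative day convention behind ADY can be sketched in base R as follows. This is an illustration with made-up dates rather than the function's source: days on or after the reference date are counted from 1, days before it are negative, and there is no day 0.

```r
# Illustrative dates only; trtsdt plays the role of TRTSDT
adt <- as.Date(c("2020-01-01", "2020-01-10", "2019-12-31"))
trtsdt <- as.Date("2020-01-01")

diff_days <- as.integer(adt - trtsdt)

# Add one for dates on or after the reference date so that day 0 never occurs
ady <- ifelse(diff_days >= 0, diff_days + 1L, diff_days)
print(ady)
```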
Assign PARAMCD, PARAM, PARAMN, PARCAT1
To assign parameter level values such as PARAMCD, PARAM, PARAMN, PARCAT1, etc., a lookup can be created to join to the source data.
For example, when creating ADVS, a lookup based on the SDTM --TESTCD value may be created:
VSTESTCD | PARAMCD | PARAM | PARAMN | PARCAT1 | PARCAT1N |
---|---|---|---|---|---|
HEIGHT | HEIGHT | Height (cm) | 1 | Subject Characteristic | 1 |
WEIGHT | WEIGHT | Weight (kg) | 2 | Subject Characteristic | 1 |
DIABP | DIABP | Diastolic Blood Pressure (mmHg) | 3 | Vital Sign | 2 |
MAP | MAP | Mean Arterial Pressure | 4 | Vital Sign | 2 |
PULSE | PULSE | Pulse Rate (beats/min) | 5 | Vital Sign | 2 |
SYSBP | SYSBP | Systolic Blood Pressure (mmHg) | 6 | Vital Sign | 2 |
TEMP | TEMP | Temperature (C) | 7 | Vital Sign | 2 |
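As a minimal sketch, the lookup above could be held in a data frame named param_lookup, which is the name the join code in this section refers to. Base R is used here only to keep the sketch free of dependencies; tibble::tribble() would work equally well.

```r
# Parameter lookup transcribed from the table above
codes <- c("HEIGHT", "WEIGHT", "DIABP", "MAP", "PULSE", "SYSBP", "TEMP")

param_lookup <- data.frame(
  VSTESTCD = codes,
  PARAMCD = codes,
  PARAM = c(
    "Height (cm)", "Weight (kg)", "Diastolic Blood Pressure (mmHg)",
    "Mean Arterial Pressure", "Pulse Rate (beats/min)",
    "Systolic Blood Pressure (mmHg)", "Temperature (C)"
  ),
  PARAMN = 1:7,
  PARCAT1 = c(rep("Subject Characteristic", 2), rep("Vital Sign", 5)),
  PARCAT1N = c(1, 1, 2, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)
print(param_lookup)
```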
At this stage, only PARAMCD is required to perform the derivations. Additional derived parameters may be added later, so only PARAMCD is joined to the dataset at this point. All other variables related to PARAMCD (e.g. PARAM, PARCAT1, …) will be added once all PARAMCD values are derived. The lookup may now be joined to the source data:
advs <- derive_vars_merged_lookup(
advs,
dataset_add = param_lookup,
new_vars = exprs(PARAMCD),
by_vars = exprs(VSTESTCD)
)
#> All `VSTESTCD` are mapped.
Please note that it may be necessary to include other variables in the join. For example, if PARAMCD is based on both VSTESTCD and VSPOS, it may be necessary to expand this lookup or create a separate lookup for PARAMCD.
If more than one lookup table is available, e.g., company parameter mappings and project parameter mappings, consolidate_metadata() can be used to consolidate these into a single lookup table.
Derive Results (AVAL, AVALC)
The mapping of AVAL and AVALC is left to the ADaM programmer. An example mapping may be:
advs <- mutate(
advs,
AVAL = VSSTRESN
)
In this example, as is often the case for ADVS, all AVAL values are numeric without any corresponding non-redundant text value for AVALC. Per the recommendation in ADaMIG v1.3, we do not map AVALC.
Derive Additional Parameters (e.g. BSA, BMI or MAP for ADVS)
Optionally derive new parameters creating PARAMCD and AVAL. Note that only variables specified in the by_vars argument will be populated in the newly created records. This is relevant to the functions derive_param_map(), derive_param_bsa(), derive_param_bmi(), and derive_param_qtc().
Below is an example of creating Mean Arterial Pressure for ADVS. See also Example 3 in the section Derive New Rows below for an alternative way of creating new parameters.
advs <- derive_param_map(
advs,
by_vars = exprs(STUDYID, USUBJID, !!!adsl_vars, VISIT, VISITNUM, ADT, ADY, VSTPT, VSTPTNUM),
set_values_to = exprs(PARAMCD = "MAP"),
get_unit_expr = VSSTRESU,
filter = VSSTAT != "NOT DONE" | is.na(VSSTAT)
)
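For orientation, the arithmetic behind a mean arterial pressure parameter can be sketched in base R. The sketch assumes the common formula based on diastolic and systolic pressure only, with no heart rate term, and uses made-up values.

```r
# Illustrative blood pressure values in mmHg
sysbp <- c(120, 135)
diabp <- c(80, 85)

# Mean arterial pressure from diastolic and systolic pressure
map <- (2 * diabp + sysbp) / 3
print(round(map, 1))
```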
Likewise, the function calls below create the parameters Body Surface Area (BSA) and Body Mass Index (BMI) for ADVS. Note that if height is collected only once, constant_by_vars should be used to specify the subject-level variable to merge on. Otherwise, BSA and BMI are only calculated for visits where both height and weight are collected.
advs <- derive_param_bsa(
advs,
by_vars = exprs(STUDYID, USUBJID, !!!adsl_vars, VISIT, VISITNUM, ADT, ADY, VSTPT, VSTPTNUM),
method = "Mosteller",
set_values_to = exprs(PARAMCD = "BSA"),
get_unit_expr = VSSTRESU,
filter = VSSTAT != "NOT DONE" | is.na(VSSTAT),
constant_by_vars = exprs(USUBJID)
)
advs <- derive_param_bmi(
advs,
by_vars = exprs(STUDYID, USUBJID, !!!adsl_vars, VISIT, VISITNUM, ADT, ADY, VSTPT, VSTPTNUM),
set_values_to = exprs(PARAMCD = "BMI"),
get_unit_expr = VSSTRESU,
filter = VSSTAT != "NOT DONE" | is.na(VSSTAT),
constant_by_vars = exprs(USUBJID)
)
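The two formulas involved can be sketched in base R as follows. The values are made up, height is assumed to be in cm and weight in kg, and the BSA line uses the Mosteller formula named in the call above.

```r
# Illustrative values: height in cm, weight in kg
height <- 170
weight <- 75

# Mosteller: square root of (height [cm] * weight [kg] / 3600), result in m^2
bsa <- sqrt(height * weight / 3600)

# BMI: weight [kg] divided by squared height [m], result in kg/m^2
bmi <- weight / (height / 100)^2

print(round(c(BSA = bsa, BMI = bmi), 2))
```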
Similarly, for ADEG, the parameters QTCBF, QTCBS, and QTCL can be created with a function call. The example below creates PARAMCD = QTCFR using the Fridericia method.
adeg <- tibble::tribble(
~USUBJID, ~EGSTRESU, ~PARAMCD, ~AVAL, ~VISIT,
"P01", "msec", "QT", 350, "CYCLE 1 DAY 1",
"P01", "msec", "QT", 370, "CYCLE 2 DAY 1",
"P01", "msec", "RR", 842, "CYCLE 1 DAY 1",
"P01", "msec", "RR", 710, "CYCLE 2 DAY 1"
)
adeg <- derive_param_qtc(
adeg,
by_vars = exprs(USUBJID, VISIT),
method = "Fridericia",
set_values_to = exprs(PARAMCD = "QTCFR"),
get_unit_expr = EGSTRESU
)
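The Fridericia correction itself is a one-line formula. The base R sketch below applies it to the QT and RR values from the example data and assumes both are recorded in msec, so RR is converted to seconds first.

```r
# QT and RR intervals in msec, taken from the example data above
qt <- c(350, 370)
rr <- c(842, 710)

# Fridericia: QT divided by the cube root of RR expressed in seconds
qtcf <- qt / (rr / 1000)^(1 / 3)
print(round(qtcf, 1))
```

Because both RR intervals are shorter than one second, the corrected values come out larger than the uncorrected QT values.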
Similarly, for ADLB, the function derive_param_wbc_abs() can be used to create a new parameter for lab differentials converted to absolute values. See the example below:
adlb <- tibble::tribble(
~USUBJID, ~PARAMCD, ~AVAL, ~PARAM, ~VISIT,
"P01", "WBC", 33, "Leukocyte Count (10^9/L)", "CYCLE 1 DAY 1",
"P01", "WBC", 38, "Leukocyte Count (10^9/L)", "CYCLE 2 DAY 1",
"P01", "LYMLE", 0.90, "Lymphocytes (fraction of 1)", "CYCLE 1 DAY 1",
"P01", "LYMLE", 0.70, "Lymphocytes (fraction of 1)", "CYCLE 2 DAY 1"
)
derive_param_wbc_abs(
dataset = adlb,
by_vars = exprs(USUBJID, VISIT),
set_values_to = exprs(
PARAMCD = "LYMPH",
PARAM = "Lymphocytes Abs (10^9/L)",
DTYPE = "CALCULATION"
),
get_unit_expr = extract_unit(PARAM),
wbc_code = "WBC",
diff_code = "LYMLE",
diff_type = "fraction"
)
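For a differential reported as a fraction, the conversion amounts to multiplying the differential by the white blood cell count. A minimal base R sketch using the values from the example data:

```r
# WBC counts (10^9/L) and lymphocyte fractions from the example data
wbc <- c(33, 38)
lymle <- c(0.90, 0.70)

# Absolute lymphocyte count (10^9/L) = WBC count * fraction
lymph_abs <- wbc * lymle
print(lymph_abs)
```

A differential reported as a percentage would need dividing by 100 before the multiplication.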
When all PARAMCD values have been derived and added to the dataset, the other information from the lookup table (PARAM, PARCAT1, …) should be added.
# Derive PARAM and PARAMN
advs <- derive_vars_merged(
advs,
dataset_add = select(param_lookup, -VSTESTCD),
by_vars = exprs(PARAMCD)
)
Derive Timing Variables (e.g. APHASE, AVISIT, APERIOD)
Categorical timing variables are protocol and analysis dependent. Below is a simple example.
advs <- advs %>%
mutate(
AVISIT = case_when(
str_detect(VISIT, "SCREEN") ~ NA_character_,
str_detect(VISIT, "UNSCHED") ~ NA_character_,
str_detect(VISIT, "RETRIEVAL") ~ NA_character_,
str_detect(VISIT, "AMBUL") ~ NA_character_,
!is.na(VISIT) ~ str_to_title(VISIT)
),
AVISITN = as.numeric(case_when(
VISIT == "BASELINE" ~ "0",
str_detect(VISIT, "WEEK") ~ str_trim(str_replace(VISIT, "WEEK", ""))
)),
ATPT = VSTPT,
ATPTN = VSTPTNUM
)
count(advs, VISITNUM, VISIT, AVISITN, AVISIT)
#> # A tibble: 15 × 5
#> VISITNUM VISIT AVISITN AVISIT n
#> <dbl> <chr> <dbl> <chr> <int>
#> 1 1 SCREENING 1 NA NA 102
#> 2 2 SCREENING 2 NA NA 78
#> 3 3 BASELINE 0 Baseline 96
#> 4 3.5 AMBUL ECG PLACEMENT NA NA 65
#> 5 4 WEEK 2 2 Week 2 96
#> 6 5 WEEK 4 4 Week 4 80
#> 7 6 AMBUL ECG REMOVAL NA NA 52
#> 8 7 WEEK 6 6 Week 6 48
#> 9 8 WEEK 8 8 Week 8 48
#> 10 9 WEEK 12 12 Week 12 48
#> 11 10 WEEK 16 16 Week 16 48
#> 12 11 WEEK 20 20 Week 20 32
#> 13 12 WEEK 24 24 Week 24 32
#> 14 13 WEEK 26 26 Week 26 32
#> 15 201 RETRIEVAL NA NA 26
count(advs, VSTPTNUM, VSTPT, ATPTN, ATPT)
#> # A tibble: 4 × 5
#> VSTPTNUM VSTPT ATPTN ATPT n
#> <dbl> <chr> <dbl> <chr> <int>
#> 1 815 AFTER LYING DOWN FOR 5 MINUTES 815 AFTER LYING DOWN FOR 5 MI… 232
#> 2 816 AFTER STANDING FOR 1 MINUTE 816 AFTER STANDING FOR 1 MINU… 232
#> 3 817 AFTER STANDING FOR 3 MINUTES 817 AFTER STANDING FOR 3 MINU… 232
#> 4 NA NA NA NA 187
For assigning visits based on time windows and deriving periods, subperiods, and phase variables see the “Visit and Period Variables” vignette.
Timing Flag Variables (e.g. ONTRTFL)
In some analyses, it may be necessary to flag an observation as on-treatment. The admiral function derive_var_ontrtfl() can be used.
For example, if on-treatment is defined as any observation between treatment start and treatment end, the flag may be derived as:
advs <- derive_var_ontrtfl(
advs,
start_date = ADT,
ref_start_date = TRTSDT,
ref_end_date = TRTEDT
)
This function returns the original data frame with the column ONTRTFL added. The function can also apply a window to ref_end_date. For example, if on-treatment is defined as between treatment start and treatment end plus 60 days, the call would be:
advs <- derive_var_ontrtfl(
advs,
start_date = ADT,
ref_start_date = TRTSDT,
ref_end_date = TRTEDT,
ref_end_window = 60
)
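The windowing logic can be illustrated with a small base R sketch. The helper flag_ontrt() is hypothetical, the dates are made up, and the sketch only covers the simple case of a single observation date compared against treatment start and end.

```r
# Hypothetical helper: "Y" if the date lies within [start, end + window], else NA
flag_ontrt <- function(adt, start, end, window = 0) {
  ifelse(!is.na(adt) & adt >= start & adt <= end + window, "Y", NA_character_)
}

adt <- as.Date(c("2020-01-01", "2020-02-15", "2020-04-20", "2019-12-31"))
trtsdt <- as.Date("2020-01-01")
trtedt <- as.Date("2020-02-01")

no_window <- flag_ontrt(adt, trtsdt, trtedt)
with_window <- flag_ontrt(adt, trtsdt, trtedt, window = 60)
print(data.frame(ADT = adt, no_window, with_window))
```

The second observation falls after treatment end but within the 60-day window, so it is flagged only in the windowed version.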
In addition, the function allows you to filter out pre-treatment observations that occurred on the start date. For example, if observations with VSTPT == PRE should not be considered on-treatment when the observation date equals the treatment start date, the user may specify this using the filter_pre_timepoint argument. The call below illustrates the syntax with a timepoint value present in the example data:
advs <- derive_var_ontrtfl(
advs,
start_date = ADT,
ref_start_date = TRTSDT,
ref_end_date = TRTEDT,
filter_pre_timepoint = ATPT == "AFTER LYING DOWN FOR 5 MINUTES"
)
Lastly, the function allows you to create any on-treatment flag based on the analysis needs. For example, if variable ONTR01FL is needed, showing the on-treatment flag during Period 01, set new_var = ONTR01FL. In addition, for Period 01 Start Date and Period 01 End Date, set ref_start_date = AP01SDT and ref_end_date = AP01EDT.
advs <- derive_var_ontrtfl(
advs,
new_var = ONTR01FL,
start_date = ASTDT,
end_date = AENDT,
ref_start_date = AP01SDT,
ref_end_date = AP01EDT,
span_period = TRUE
)
Assign Reference Range Indicator (ANRIND)
The admiral function derive_var_anrind() may be used to derive the reference range indicator ANRIND.
This function requires the reference range boundaries to exist on the data frame (ANRLO, ANRHI) and also accommodates the additional boundaries A1LO and A1HI.
The function is called as:
advs <- derive_var_anrind(advs)
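The basic comparison behind a reference range indicator can be sketched in base R. This simplified illustration assumes both ANRLO and ANRHI are present and ignores the additional A1LO and A1HI boundaries; the values are made up.

```r
# Illustrative analysis values with a reference range of 60 to 100
aval <- c(55, 80, 130)
anrlo <- 60
anrhi <- 100

# Below the lower bound is LOW, above the upper bound is HIGH, otherwise NORMAL
anrind <- ifelse(aval < anrlo, "LOW", ifelse(aval > anrhi, "HIGH", "NORMAL"))
print(anrind)
```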
Derive Baseline (BASETYPE, ABLFL, BASE, BNRIND)
The BASETYPE should be derived using the function derive_basetype_records(). The basetypes argument of this function requires a named list of expressions detailing how the BASETYPE should be assigned. Note that if a record satisfies multiple expressions within basetypes, a row will be produced for each BASETYPE.
advs <- derive_basetype_records(
dataset = advs,
basetypes = exprs(
"LAST: AFTER LYING DOWN FOR 5 MINUTES" = ATPTN == 815,
"LAST: AFTER STANDING FOR 1 MINUTE" = ATPTN == 816,
"LAST: AFTER STANDING FOR 3 MINUTES" = ATPTN == 817,
"LAST" = is.na(ATPTN)
)
)
count(advs, ATPT, ATPTN, BASETYPE)
#> # A tibble: 4 × 4
#> ATPT ATPTN BASETYPE n
#> <chr> <dbl> <chr> <int>
#> 1 AFTER LYING DOWN FOR 5 MINUTES 815 LAST: AFTER LYING DOWN FOR 5 MINUT… 232
#> 2 AFTER STANDING FOR 1 MINUTE 816 LAST: AFTER STANDING FOR 1 MINUTE 232
#> 3 AFTER STANDING FOR 3 MINUTES 817 LAST: AFTER STANDING FOR 3 MINUTES 232
#> 4 NA NA LAST 187
It is important to derive BASETYPE first so that it can be utilized in subsequent derivations. This will be important if the data frame contains multiple values for BASETYPE.
Next, the analysis baseline flag ABLFL can be derived using the admiral function derive_var_extreme_flag(). For example, if baseline is defined as the last non-missing AVAL prior to or on TRTSDT, the function call for ABLFL would be:
advs <- restrict_derivation(
advs,
derivation = derive_var_extreme_flag,
args = params(
by_vars = exprs(STUDYID, USUBJID, BASETYPE, PARAMCD),
order = exprs(ADT, ATPTN, VISITNUM),
new_var = ABLFL,
mode = "last"
),
filter = (!is.na(AVAL) & ADT <= TRTSDT & !is.na(BASETYPE))
)
Note: Additional examples of the derive_var_extreme_flag() function can be found in the section Derive Analysis Flags below.
Lastly, the BASE and BNRIND columns can be derived using the admiral function derive_var_base(). Example calls are:
advs <- derive_var_base(
advs,
by_vars = exprs(STUDYID, USUBJID, PARAMCD, BASETYPE),
source_var = AVAL,
new_var = BASE
)
advs <- derive_var_base(
advs,
by_vars = exprs(STUDYID, USUBJID, PARAMCD, BASETYPE),
source_var = ANRIND,
new_var = BNRIND
)
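To illustrate how the baseline flag and BASE relate, here is a minimal base R sketch for a single subject and parameter with made-up data. It mirrors the rule described above (last non-missing AVAL on or before treatment start) and is not how admiral implements it.

```r
# One subject, one parameter; the second record is missing its result
df <- data.frame(
  USUBJID = "P01",
  PARAMCD = "WEIGHT",
  ADT = as.Date(c("2020-01-01", "2020-01-04", "2020-01-05", "2020-01-20")),
  AVAL = c(80, NA, 81, 79)
)
trtsdt <- as.Date("2020-01-05")

# Candidate baseline records: non-missing result on or before treatment start
cand <- which(!is.na(df$AVAL) & df$ADT <= trtsdt)

# Flag the latest candidate as baseline
df$ABLFL <- NA_character_
df$ABLFL[cand[which.max(as.numeric(df$ADT[cand]))]] <- "Y"

# Copy the flagged value to every record of the group as BASE
df$BASE <- df$AVAL[which(df$ABLFL == "Y")]
print(df)
```

With real data the same steps would be applied within each combination of subject, parameter, and BASETYPE.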
Derive Change from Baseline (CHG, PCHG)
Change and percent change from baseline can be derived using the admiral functions derive_var_chg() and derive_var_pchg(). These functions expect AVAL and BASE to exist in the data frame. CHG is simply AVAL - BASE and PCHG is (AVAL - BASE) / abs(BASE) * 100. Example calls are:
advs <- derive_var_chg(advs)
advs <- derive_var_pchg(advs)
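The two formulas can be checked with a few made-up numbers in base R. The negative baseline in the third record shows why the absolute value appears in the denominator. Returning NA for a zero baseline is a choice made in this sketch to avoid dividing by zero.

```r
# Illustrative analysis and baseline values
aval <- c(85, 72, -3, 5)
base <- c(80, 80, -4, 0)

chg <- aval - base

# Percent change relative to the absolute baseline; NA when the baseline is 0
pchg <- ifelse(base == 0, NA_real_, chg / abs(base) * 100)

print(data.frame(AVAL = aval, BASE = base, CHG = chg, PCHG = pchg))
```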
If the variables should not be derived for all records, e.g., for post-baseline records only, restrict_derivation() can be used.
Derive Shift (e.g. SHIFT1)
Shift variables can be derived using the admiral function derive_var_shift(). This function derives a character shift variable by concatenating the values of a user-defined pair of variables, e.g., the shift from baseline reference range indicator BNRIND to analysis reference range indicator ANRIND. An example call is:
advs <- derive_var_shift(advs,
new_var = SHIFT1,
from_var = BNRIND,
to_var = ANRIND
)
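Conceptually, the shift value is the two indicators joined into one string. The base R sketch below assumes " to " as the separator and uses made-up, non-missing indicator values; handling of missing values is left out.

```r
# Illustrative baseline and analysis reference range indicators
bnrind <- c("NORMAL", "HIGH", "NORMAL")
anrind <- c("HIGH", "NORMAL", "NORMAL")

# Concatenate "from" and "to" values into a single shift string
shift1 <- paste(bnrind, "to", anrind)
print(shift1)
```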
If the variables should not be derived for all records, e.g., for post-baseline records only, restrict_derivation() can be used.
Derive Analysis Ratio (e.g. R2BASE)
Analysis ratio variables can be derived using the admiral function derive_var_analysis_ratio(). This function derives a ratio variable based on a user-specified pair of variables. For example, Ratio to Baseline is calculated as AVAL / BASE and the function appends a new variable R2BASE to the dataset. Example calls are:
advs <- derive_var_analysis_ratio(advs,
numer_var = AVAL,
denom_var = BASE
)
advs <- derive_var_analysis_ratio(advs,
numer_var = AVAL,
denom_var = ANRLO,
new_var = R01ANRLO
)
If the variables should not be derived for all records, e.g., for post-baseline records only, restrict_derivation() can be used.
Derive Analysis Flags (e.g. ANL01FL)
In most finding ADaMs, an analysis flag (e.g. ANLxxFL) is derived to identify the appropriate observation(s) to use for a particular analysis when a subject has multiple observations within a particular timing period.
This flag may be derived using the admiral function derive_var_extreme_flag(). For this example, we will assume we would like to choose the latest and highest value by USUBJID, PARAMCD, AVISIT, and ATPT.
advs <- restrict_derivation(
advs,
derivation = derive_var_extreme_flag,
args = params(
by_vars = exprs(STUDYID, USUBJID, BASETYPE, PARAMCD, AVISIT),
order = exprs(ADT, ATPTN, AVAL),
new_var = ANL01FL,
mode = "last"
),
filter = !is.na(AVISITN)
)
Another common example would be flagging the worst value for a subject, parameter, and visit. For this example, we will assume we have 3 PARAMCD values (SYSBP, DIABP, and PULSE). We will also assume high is worst for SYSBP and DIABP and low is worst for PULSE.
advs <- slice_derivation(
advs,
derivation = derive_var_extreme_flag,
args = params(
by_vars = exprs(STUDYID, USUBJID, BASETYPE, PARAMCD, AVISIT),
order = exprs(desc(AVAL), ADT, ATPTN),
new_var = WORSTFL,
mode = "first"
),
derivation_slice(
filter = PARAMCD %in% c("SYSBP", "DIABP") & (!is.na(AVISIT) & !is.na(AVAL))
),
derivation_slice(
filter = PARAMCD %in% "PULSE" & (!is.na(AVISIT) & !is.na(AVAL)),
args = params(mode = "last")
)
) %>%
arrange(STUDYID, USUBJID, BASETYPE, PARAMCD, AVISIT)
Assign Treatment (TRTA, TRTP)
TRTA and TRTP must match at least one value of the character treatment variables in ADSL (e.g., TRTxxA/TRTxxP, TRTSEQA/TRTSEQP, TRxxAGy/TRxxPGy).
An example of a simple implementation for a study without periods could be:
advs <- mutate(advs, TRTP = TRT01P, TRTA = TRT01A)
count(advs, TRTP, TRTA, TRT01P, TRT01A)
#> # A tibble: 2 × 5
#> TRTP TRTA TRT01P TRT01A n
#> <chr> <chr> <chr> <chr> <int>
#> 1 Placebo Placebo Placebo Placebo 640
#> 2 Xanomeline Low Dose Xanomeline Low Dose Xanomeline Low Dose Xanomeline … 243
For studies with periods see the “Visit and Period Variables” vignette.
Assign ASEQ
The admiral function derive_var_obs_number() can be used to derive ASEQ. An example call is:
advs <- derive_var_obs_number(
advs,
new_var = ASEQ,
by_vars = exprs(STUDYID, USUBJID),
order = exprs(PARAMCD, ADT, AVISITN, VISITNUM, ATPTN),
check_type = "error"
)
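The idea of a within-subject sequence number can be sketched in base R with made-up records: sort by the ordering variables, then number the rows within each subject. Unlike the call above, this sketch does not check that the ordering variables identify each record uniquely.

```r
# Illustrative records in arbitrary order
df <- data.frame(
  USUBJID = c("P02", "P01", "P01", "P02", "P01"),
  PARAMCD = c("WEIGHT", "WEIGHT", "HEIGHT", "HEIGHT", "WEIGHT"),
  ADT = as.Date(c("2020-01-02", "2020-01-10", "2020-01-01", "2020-01-02", "2020-01-03")),
  stringsAsFactors = FALSE
)

# Sort by subject, then by the ordering variables
df <- df[order(df$USUBJID, df$PARAMCD, df$ADT), ]

# Number the sorted rows within each subject
df$ASEQ <- ave(seq_len(nrow(df)), df$USUBJID, FUN = seq_along)
print(df)
```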