Derive a Time-to-Event Parameter

Add a time-to-event parameter to the input dataset.

Usage

derive_param_tte(
  dataset = NULL,
  dataset_adsl,
  source_datasets,
  by_vars = NULL,
  start_date = TRTSDT,
  end_dates = NULL,
  event_conditions,
  censor_conditions = NULL,
  event_type = "negative",
  create_datetime = FALSE,
  set_values_to,
  subject_keys = get_admiral_option("subject_keys"),
  check_type = "warning"
)

Arguments

dataset

Input dataset

PARAMCD is expected.

Permitted values: a dataset, i.e., a data.frame or tibble
Default value: NULL

dataset_adsl

ADSL input dataset

The variables specified for start_date and subject_keys are expected.

Permitted values: a dataset, i.e., a data.frame or tibble
Default value: none

source_datasets

Source datasets

A named list of datasets is expected. The dataset_name field of tte_source() refers to the dataset provided in the list.

Permitted values: named list of datasets, e.g., list(adsl = adsl, ae = ae)
Default value: none

by_vars

By variables

If the parameter is specified, separate time to event parameters are derived for each by group.

The by variables must be in at least one of the source datasets. Each source dataset must contain either all by variables or none of the by variables.

The by variables are not included in the output dataset.

Permitted values: list of variables created by exprs(), e.g., exprs(USUBJID, VISIT)
Default value: NULL

start_date

Time to event origin date

The variable STARTDT (STARTDTM if create_datetime = TRUE) is set to the specified date. The value is taken from the ADSL dataset.

If the event or censoring date is before the origin date, ADT (ADTM) is set to the origin date.

Permitted values: a date or datetime variable
Default value: TRTSDT

end_dates

Time to event end date(s)

A list of censor_source() objects is expected. Each date restricts the observation period for time-to-event analysis. For each subject the earliest date across all end_dates is used as the end date for that subject. The records defined by event_conditions and censor_conditions are restricted to dates before or equal to the selected end date.

This argument should be specified if there is more than one reason for stopping the observation of a subject, e.g., end of study, death, or intercurrent events like start of new drug. If there is only one reason for stopping the observation, it is sufficient to just include this as a censoring condition in censor_conditions.

See the examples Differentiating censoring reasons and Differentiating censoring date description for how the values set by the end dates (set_values_to) interact with the values set by the censoring conditions.

Permitted values: a list of source objects, e.g., list(pd, death)
Default value: NULL
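The following base R snippet is a minimal sketch of the documented end_dates behavior, not admiral's implementation. It takes the earliest available end date per subject and then drops records after it. The data frames and the ENDDT column are illustrative assumptions modeled on the end_dates example further below.

```r
# Illustrative data (assumption): two candidate end dates per subject.
adsl <- data.frame(
  USUBJID  = c("01", "02", "03"),
  EOSDT    = as.Date(c("2021-03-06", "2021-04-03", NA)),
  NEWDRGDT = as.Date(c(NA, "2021-03-21", NA))
)

# Earliest available end date per subject; missing dates are ignored and
# subjects without any end date get NA (i.e., no restriction).
adsl$ENDDT <- pmin(adsl$EOSDT, adsl$NEWDRGDT, na.rm = TRUE)

adqs <- data.frame(
  USUBJID = c("02", "02", "02"),
  ADT     = as.Date(c("2021-01-03", "2021-02-03", "2021-04-01")),
  CHG     = c(4, -1, -12)
)

# Keep only records on or before the subject's end date.
adqs <- merge(adqs, adsl[, c("USUBJID", "ENDDT")], by = "USUBJID")
kept <- adqs[is.na(adqs$ENDDT) | adqs$ADT <= adqs$ENDDT, ]
print(kept)
```

For subject 02 the new drug date (2021-03-21) precedes the end of study date, so the assessment on 2021-04-01 is dropped.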

event_conditions

Sources and conditions defining events

A list of event_source() objects is expected.

Permitted values: a list of source objects, e.g., list(pd, death)
Default value: none

censor_conditions

Sources and conditions defining censorings

A list of censor_source() objects is expected. Each record defined by censor_conditions should be a possible censoring time, i.e., at that time it should be known that the event of interest has not yet occurred. Assessments for which it is not known whether the event of interest has occurred, e.g., assessments that were not evaluable, should not be included as censoring times.

Records with events may be included as long as they are also included in event_conditions, because events take precedence over censorings.

Permitted values: a list of source objects, e.g., list(pd, death)
Default value: NULL

event_type

Type of event

For events that are considered unfavorable, e.g., adverse events, progression, worsening, etc., the value should be "negative" and for events that are considered favorable, e.g., response to treatment, improvement, etc., the value should be "positive".

If event_type is specified as "positive", the objects specified for end_dates are added to the censoring conditions (censor_conditions). I.e., if a subject is censored, it is censored at the earliest date provided by end_dates.

Permitted values: "negative", "positive"
Default value: "negative"

create_datetime

Create datetime variables?

If set to TRUE, variables ADTM and STARTDTM are created. Otherwise, variables ADT and STARTDT are created.

Permitted values: TRUE, FALSE
Default value: FALSE

set_values_to

Variables to set

A named list returned by exprs() is expected, defining the variables to be set for the new parameter, e.g., exprs(PARAMCD = "OS", PARAM = "Overall Survival"). The values must be symbols, character strings, numeric values, expressions, or NA.

Permitted values: list of named expressions created by exprs(), e.g., exprs(AVALC = VSSTRESC, AVAL = yn_to_numeric(AVALC))
Default value: none

subject_keys

Variables to uniquely identify a subject

A list of symbols created using exprs() is expected.

Permitted values: list of variables created by exprs(), e.g., exprs(STUDYID, USUBJID)
Default value: get_admiral_option("subject_keys")

check_type

Check uniqueness

If "message", "warning", or "error" is specified, a message, warning, or error, respectively, is issued if the records of the source datasets are not unique with respect to the by variables and the date and order specified in the event_source() and censor_source() objects.

Permitted values: "none", "message", "warning", "error"
Default value: "warning"
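As a rough illustration of the uniqueness check, the base R sketch below flags records that share the same key values and reports them according to a check_type-like setting. The helper check_unique() is hypothetical and only approximates the documented behavior; in admiral, the duplicates are retrieved with get_duplicates_dataset().

```r
# Hypothetical helper (not part of admiral): return all records that are
# duplicated with respect to `keys` and report them depending on check_type.
check_unique <- function(data, keys, check_type = "warning") {
  dup <- duplicated(data[keys]) | duplicated(data[keys], fromLast = TRUE)
  if (any(dup) && check_type != "none") {
    msg <- paste("Duplicate records with respect to", paste(keys, collapse = ", "))
    switch(check_type,
      message = message(msg),
      warning = warning(msg, call. = FALSE),
      error   = stop(msg, call. = FALSE)
    )
  }
  invisible(data[dup, , drop = FALSE])
}

adae_dup <- data.frame(
  USUBJID = c("01", "01", "01"),
  AEDECOD = c("Flu", "Cough", "Cough"),
  ASTDT   = as.Date(c("2021-01-03", "2021-03-04", "2021-03-04")),
  AESEQ   = c(1, 2, 3)
)

# Both "Cough" records share the same date, so both are reported.
dups <- check_unique(adae_dup, c("USUBJID", "AEDECOD", "ASTDT"), check_type = "message")
print(dups)
```

Adding AESEQ to the keys makes the records unique, which mirrors the effect of specifying an order variable in the source objects.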

Value

The input dataset with the new parameter added

Details

The following steps are performed to create the observations of the new parameter:

Deriving the events:

For each event source dataset the observations as specified by the filter element are selected. If the end_dates argument is specified, records after the first of the end dates are excluded. Then for each subject the first observation (with respect to date and order) is selected.
The ADT variable is set to the variable specified by the date element. If the date variable is a datetime variable, only the datepart is copied.
The CNSR variable is added and set to the censor element.
The variables specified by the set_values_to element are added.
The selected observations of all event source datasets are combined into a single dataset.
For each subject the first observation (with respect to the ADT/ADTM variable) from the single dataset is selected. If there is more than one event with the same date, the first event with respect to the order of events in event_conditions is selected.
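The final event selection step can be sketched in base R as follows: sort the combined event records by subject, date, and the position of the source in event_conditions, then keep the first row per subject. The column SRCORD (position in event_conditions) and the records for subject 02 are hypothetical; this is an approximation of the described logic, not admiral's code.

```r
# Combined event records, one per source and subject (illustrative data).
# SRCORD is a hypothetical column holding the source position in event_conditions.
events <- data.frame(
  USUBJID  = c("01", "01", "02", "02"),
  ADT      = as.Date(c("2021-01-03", "2020-12-22", "2021-02-10", "2021-02-10")),
  EVNTDESC = c("AE", "POSSIBLE ANEMIA", "AE", "POSSIBLE ANEMIA"),
  SRCORD   = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Earliest date wins; on ties, the source listed first in event_conditions wins.
events <- events[order(events$USUBJID, events$ADT, events$SRCORD), ]
first_event <- events[!duplicated(events$USUBJID), ]
first_event$CNSR <- 0L
print(first_event)
```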

Deriving the censoring observations:

For each censoring source dataset:
1. The observations as specified by the filter element are selected.
2. If the end_dates argument is specified, records after the first of the end dates are excluded and the variables defined by the set_values_to element of the first end date are added.
3. If event_type = "positive" and the end_dates argument is specified, new records with the first end date and the variables defined by the set_values_to element are added. (These will be selected in the next step, i.e., for positive events the first end date is used as censoring date.)
4. Then for each subject the last observation (with respect to date and order) is selected.
The ADT variable is set to the variable specified by the date element. If the date variable is a datetime variable, only the datepart is copied.
The CNSR variable is added. If the end_dates argument is specified and the consider_end_dates element is TRUE, it is set to the censor value of the first of the end dates. Otherwise, it is set to the censor element.
The variables specified by the set_values_to element are added.
The selected observations of all censoring source datasets are combined into a single dataset.
For each subject the last observation (with respect to the ADT/ADTM variable) from the single dataset is selected. If there is more than one censoring with the same date, the last censoring with respect to the order of censorings in censor_conditions is selected.
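The rule for setting CNSR on a censoring record can be sketched as a simple vectorized choice between the censor value of the first end date and that of the censoring source. The column names SRC_CNSR and END_CNSR are hypothetical, and the sketch assumes every subject has an end date.

```r
# Selected censoring records (illustrative). SRC_CNSR is the censor value of
# the censoring source, END_CNSR the censor value of the subject's first end
# date, and consider_end_dates the flag of the censoring source.
cens <- data.frame(
  USUBJID            = c("01", "02", "03"),
  SRC_CNSR           = c(1L, 1L, 3L),
  consider_end_dates = c(TRUE, TRUE, FALSE),
  END_CNSR           = c(1L, 2L, 1L)
)

# Use the end date's censor value only if consider_end_dates is TRUE.
cens$CNSR <- ifelse(cens$consider_end_dates, cens$END_CNSR, cens$SRC_CNSR)
print(cens)
```

Subject 03 keeps its source censor value (3) because consider_end_dates is FALSE, which is how the no_baseline style sources in the examples below keep their own codes.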

For each subject (as defined by the subject_keys parameter) an observation is selected. If an event is available, the event observation is selected. Otherwise the censoring observation is selected.
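A minimal base R sketch of this precedence rule, using illustrative data: the censoring record is kept only for subjects without an event, even if the censoring date is later than the event date.

```r
# Selected event and censoring records (illustrative data).
first_event <- data.frame(USUBJID = "01", ADT = as.Date("2021-01-03"), CNSR = 0L)
last_cens <- data.frame(
  USUBJID = c("01", "02"),
  ADT     = as.Date(c("2021-03-06", "2021-02-03")),
  CNSR    = c(1L, 1L)
)

# Events take precedence: add censorings only for subjects without an event.
tte <- rbind(
  first_event,
  last_cens[!last_cens$USUBJID %in% first_event$USUBJID, ]
)
tte <- tte[order(tte$USUBJID), ]
print(tte)
```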

Finally:

The variable specified for start_date is joined from the ADSL dataset. Only subjects in both datasets are kept, i.e., subjects with both an event or censoring and an observation in dataset_adsl.
The variables as defined by the set_values_to parameter are added.
The ADT/ADTM variable is set to the maximum of ADT/ADTM and STARTDT/STARTDTM (depending on the create_datetime parameter).
The new observations are added to the output dataset.
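The final steps can be approximated in base R with an inner join on the subject key followed by pmax(). Subject 99 and the event date before treatment start are illustrative assumptions chosen to show both effects.

```r
tte <- data.frame(
  USUBJID = c("01", "02", "99"),
  ADT     = as.Date(c("2020-11-30", "2021-02-03", "2021-01-01")),
  CNSR    = c(0L, 1L, 1L)
)
adsl <- data.frame(
  USUBJID = c("01", "02"),
  TRTSDT  = as.Date(c("2020-12-06", "2021-01-16"))
)

# Inner join: only subjects present in both datasets are kept (99 is dropped).
out <- merge(tte, adsl, by = "USUBJID")
names(out)[names(out) == "TRTSDT"] <- "STARTDT"

# Dates before the origin are moved to the origin date.
out$ADT <- pmax(out$ADT, out$STARTDT)
print(out)
```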

Examples

Add a basic time to event parameter

For each subject the time to first adverse event should be created as a parameter.

The event source object is created using event_source() and the date is set to adverse event start date.
The censor source object is created using censor_source() and the date is set to end of study date.
The event and censor source objects are then passed to derive_param_tte() to derive the time to event parameter with the provided parameter descriptions (PARAMCD and PARAM).
Note the values of the censor variable (CNSR) that are derived below, where the first subject has an event and the second does not.

library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

adsl <- tribble(
  ~USUBJID, ~TRTSDT,           ~EOSDT,            ~NEWDRGDT,
  "01",     ymd("2020-12-06"), ymd("2021-03-06"), NA,
  "02",     ymd("2021-01-16"), ymd("2021-02-03"), ymd("2021-01-03")
) %>%
  mutate(STUDYID = "AB42")

adae <- tribble(
  ~USUBJID, ~ASTDT,            ~AESEQ, ~AEDECOD,
  "01",     ymd("2021-01-03"),      1, "Flu",
  "01",     ymd("2021-03-04"),      2, "Cough",
  "01",     ymd("2021-03-05"),      3, "Cough"
) %>%
  mutate(STUDYID = "AB42")

ttae <- event_source(
  dataset_name = "adae",
  date = ASTDT,
  set_values_to = exprs(
    EVNTDESC = "AE",
    SRCDOM = "ADAE",
    SRCVAR = "ASTDT",
    SRCSEQ = AESEQ
  )
)

eos <- censor_source(
  dataset_name = "adsl",
  date = EOSDT,
  set_values_to = exprs(
    EVNTDESC = "END OF STUDY",
    SRCDOM = "ADSL",
    SRCVAR = "EOSDT"
  )
)

derive_param_tte(
  dataset_adsl = adsl,
  event_conditions = list(ttae),
  censor_conditions = list(eos),
  source_datasets = list(adsl = adsl, adae = adae),
  set_values_to = exprs(
    PARAMCD = "TTAE",
    PARAM = "Time to First Adverse Event"
  )
) %>%
  select(USUBJID, STARTDT, PARAMCD, PARAM, ADT, CNSR, SRCSEQ)
#> # A tibble: 2 × 7
#>   USUBJID STARTDT    PARAMCD PARAM                       ADT         CNSR SRCSEQ
#>   <chr>   <date>     <chr>   <chr>                       <date>     <int>  <dbl>
#> 1 01      2020-12-06 TTAE    Time to First Adverse Event 2021-01-03     0      1
#> 2 02      2021-01-16 TTAE    Time to First Adverse Event 2021-02-03     1     NA

Adding a by variable (`by_vars`)

By variables can be added using the by_vars argument. In this example, a separate time to first occurrence parameter is derived for each subject and each adverse event preferred term (AEDECOD).

Please note that CDISC requires separate parameters (PARAMCD, PARAM) for the by groups. Therefore the variables specified for the by_vars argument are not included in the output dataset. The PARAMCD variable should be specified in the set_values_to argument using an expression on the right-hand side which results in a unique value for each by group. If the values of the by variables should be included in the output dataset, they can be stored in PARCATy variables.
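To see which codes the PARAMCD expression used below produces, it can be evaluated on its own in base R. Factor levels are sorted alphabetically, so "Cough" becomes 1 and "Flu" becomes 2. As the second call shows, the numbering depends on the set of terms present in the data, so codes may shift when new terms appear.

```r
aedecod <- c("Flu", "Cough", "Cough")
paramcd <- paste0("TTAE", as.numeric(as.factor(aedecod)))
print(paramcd)  # "TTAE2" "TTAE1" "TTAE1"

# With an additional term, "Flu" is now the third level.
paste0("TTAE", as.numeric(as.factor(c("Cold", "Cough", "Flu"))))
```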

derive_param_tte(
  dataset_adsl = adsl,
  by_vars = exprs(AEDECOD),
  event_conditions = list(ttae),
  censor_conditions = list(eos),
  source_datasets = list(adsl = adsl, adae = adae),
  set_values_to = exprs(
    PARAMCD = paste0("TTAE", as.numeric(as.factor(AEDECOD))),
    PARAM = paste("Time to First", AEDECOD, "Adverse Event")
  )
) %>%
  select(USUBJID, STARTDT, PARAMCD, PARAM, ADT, CNSR, SRCSEQ)
#> # A tibble: 4 × 7
#>   USUBJID STARTDT    PARAMCD PARAM                       ADT         CNSR SRCSEQ
#>   <chr>   <date>     <chr>   <chr>                       <date>     <int>  <dbl>
#> 1 01      2020-12-06 TTAE1   Time to First Cough Advers… 2021-03-04     0      2
#> 2 01      2020-12-06 TTAE2   Time to First Flu Adverse … 2021-01-03     0      1
#> 3 02      2021-01-16 TTAE1   Time to First Cough Advers… 2021-02-03     1     NA
#> 4 02      2021-01-16 TTAE2   Time to First Flu Adverse … 2021-02-03     1     NA

Handling duplicates (`check_type`)

The source records are checked for duplicates with respect to the by variables and the date and order specified in the source objects. By default, a warning is issued if any duplicates are found. In the example below, a new adverse event dataset (adae_dup) with two "Cough" records on the same date is created and passed to the function via the source_datasets argument (adae = adae_dup).

adae_dup <- tribble(
  ~USUBJID, ~ASTDT,            ~AESEQ, ~AEDECOD, ~AESER,
  "01",     ymd("2021-01-03"),      1, "Flu",    "Y",
  "01",     ymd("2021-03-04"),      2, "Cough",  "N",
  "01",     ymd("2021-03-04"),      3, "Cough",  "Y"
) %>%
  mutate(STUDYID = "AB42")

derive_param_tte(
  dataset_adsl = adsl,
  by_vars = exprs(AEDECOD),
  start_date = TRTSDT,
  source_datasets = list(adsl = adsl, adae = adae_dup),
  event_conditions = list(ttae),
  censor_conditions = list(eos),
  set_values_to = exprs(
    PARAMCD = paste0("TTAE", as.numeric(as.factor(AEDECOD))),
    PARAM = paste("Time to First", AEDECOD, "Adverse Event")
  )
)
#> # A tibble: 4 × 11
#>   USUBJID STUDYID EVNTDESC     SRCDOM SRCVAR SRCSEQ  CNSR ADT        STARTDT
#>   <chr>   <chr>   <chr>        <chr>  <chr>   <dbl> <int> <date>     <date>
#> 1 01      AB42    AE           ADAE   ASTDT       2     0 2021-03-04 2020-12-06
#> 2 01      AB42    AE           ADAE   ASTDT       1     0 2021-01-03 2020-12-06
#> 3 02      AB42    END OF STUDY ADSL   EOSDT      NA     1 2021-02-03 2021-01-16
#> 4 02      AB42    END OF STUDY ADSL   EOSDT      NA     1 2021-02-03 2021-01-16
#> # i 2 more variables: PARAMCD <chr>, PARAM <chr>
#> Warning: Dataset "adae" contains duplicate records with respect to `STUDYID`, `USUBJID`,
#> `AEDECOD`, and `ASTDT`
#> i Run `admiral::get_duplicates_dataset()` to access the duplicate records

For investigating the issue, the dataset of the duplicate source records can be obtained by calling get_duplicates_dataset():

get_duplicates_dataset()
#> Duplicate records with respect to `STUDYID`, `USUBJID`, `AEDECOD`, and `ASTDT`.
#> # A tibble: 2 × 6
#>   STUDYID USUBJID AEDECOD ASTDT      AESEQ AESER
#> * <chr>   <chr>   <chr>   <date>     <dbl> <chr>
#> 1 AB42    01      Cough   2021-03-04     2 N
#> 2 AB42    01      Cough   2021-03-04     3 Y

Common options to solve the issue:

Restricting the source records by specifying/updating the filter argument in the event_source()/censor_source() calls.
Specifying additional variables for order in the event_source()/censor_source() calls.
Setting check_type = "none" in the derive_param_tte() call to ignore any duplicates.

In this example, the choice of record has no impact on the time to event because both records have the same date, but it does affect SRCSEQ in the output dataset. Therefore the second option is used here. Note also that source objects can be defined directly within the derive_param_tte() call.
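The effect of order = exprs(AESEQ) can be mimicked in base R: within each by group, records are sorted by date and then by AESEQ, and the first record is kept. This is an illustrative approximation of the selection, not admiral's implementation.

```r
adae_dup <- data.frame(
  USUBJID = "01",
  AEDECOD = c("Flu", "Cough", "Cough"),
  ASTDT   = as.Date(c("2021-01-03", "2021-03-04", "2021-03-04")),
  AESEQ   = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Sort by subject and by group, then date, then the order variable AESEQ.
srt <- adae_dup[order(adae_dup$USUBJID, adae_dup$AEDECOD,
                      adae_dup$ASTDT, adae_dup$AESEQ), ]
# First record per subject and by group; the same-date tie is broken by AESEQ.
first <- srt[!duplicated(srt[c("USUBJID", "AEDECOD")]), ]
print(first)
```

The selected "Cough" record has AESEQ 2, which matches SRCSEQ in the output below.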

derive_param_tte(
  dataset_adsl = adsl,
  by_vars = exprs(AEDECOD),
  start_date = TRTSDT,
  source_datasets = list(adsl = adsl, adae = adae_dup),
  event_conditions = list(event_source(
    dataset_name = "adae",
    date = ASTDT,
    set_values_to = exprs(
      EVNTDESC = "AE",
      SRCDOM = "ADAE",
      SRCVAR = "ASTDT",
      SRCSEQ = AESEQ
    ),
    order = exprs(AESEQ)
  )),
  censor_conditions = list(eos),
  set_values_to = exprs(
    PARAMCD = paste0("TTAE", as.numeric(as.factor(AEDECOD))),
    PARAM = paste("Time to First", AEDECOD, "Adverse Event")
  )
) %>%
  select(USUBJID, STARTDT, PARAMCD, PARAM, ADT, CNSR, SRCSEQ)
#> # A tibble: 4 × 7
#>   USUBJID STARTDT    PARAMCD PARAM                       ADT         CNSR SRCSEQ
#>   <chr>   <date>     <chr>   <chr>                       <date>     <int>  <dbl>
#> 1 01      2020-12-06 TTAE1   Time to First Cough Advers… 2021-03-04     0      2
#> 2 01      2020-12-06 TTAE2   Time to First Flu Adverse … 2021-01-03     0      1
#> 3 02      2021-01-16 TTAE1   Time to First Cough Advers… 2021-02-03     1     NA
#> 4 02      2021-01-16 TTAE2   Time to First Flu Adverse … 2021-02-03     1     NA

Filtering source records (`filter`)

The first option above, restricting the source records, can be implemented with the filter argument. Here, for example, only serious adverse events are considered.

derive_param_tte(
  dataset_adsl = adsl,
  by_vars = exprs(AEDECOD),
  start_date = TRTSDT,
  source_datasets = list(adsl = adsl, adae = adae_dup),
  event_conditions = list(event_source(
    dataset_name = "adae",
    filter = AESER == "Y",
    date = ASTDT,
    set_values_to = exprs(
      EVNTDESC = "Serious AE",
      SRCDOM = "ADAE",
      SRCVAR = "ASTDT",
      SRCSEQ = AESEQ
    )
  )),
  censor_conditions = list(eos),
  set_values_to = exprs(
    PARAMCD = paste0("TTSAE", as.numeric(as.factor(AEDECOD))),
    PARAM = paste("Time to First Serious", AEDECOD, "Adverse Event")
  )
) %>%
  select(USUBJID, STARTDT, PARAMCD, PARAM, ADT, CNSR, SRCSEQ)
#> # A tibble: 4 × 7
#>   USUBJID STARTDT    PARAMCD PARAM                       ADT         CNSR SRCSEQ
#>   <chr>   <date>     <chr>   <chr>                       <date>     <int>  <dbl>
#> 1 01      2020-12-06 TTSAE1  Time to First Serious Coug… 2021-03-04     0      3
#> 2 01      2020-12-06 TTSAE2  Time to First Serious Flu … 2021-01-03     0      1
#> 3 02      2021-01-16 TTSAE1  Time to First Serious Coug… 2021-02-03     1     NA
#> 4 02      2021-01-16 TTSAE2  Time to First Serious Flu … 2021-02-03     1     NA

Using multiple event/censor conditions (`event_conditions` / `censor_conditions`)

The examples above use a single event condition and a single censoring condition. Here, multiple conditions are passed to event_conditions and censor_conditions.

For the event, the first AE and additionally a lab condition (low hemoglobin) are considered. For the censoring, the treatment start date is added in case the end of study date is missing.

adlb <- tribble(
  ~USUBJID, ~ADT,              ~PARAMCD, ~ANRIND,
  "01",     ymd("2020-12-22"), "HGB",    "LOW"
) %>%
  mutate(STUDYID = "AB42")

low_hgb <- event_source(
  dataset_name = "adlb",
  filter = PARAMCD == "HGB" & ANRIND == "LOW",
  date = ADT,
  set_values_to = exprs(
    EVNTDESC = "POSSIBLE ANEMIA",
    SRCDOM = "ADLB",
    SRCVAR = "ADT"
  )
)

trt_start <- censor_source(
  dataset_name = "adsl",
  date = TRTSDT,
  set_values_to = exprs(
    EVNTDESC = "TREATMENT START",
    SRCDOM = "ADSL",
    SRCVAR = "TRTSDT"
  )
)

derive_param_tte(
  dataset_adsl = adsl,
  event_conditions = list(ttae, low_hgb),
  censor_conditions = list(eos, trt_start),
  source_datasets = list(adsl = adsl, adae = adae, adlb = adlb),
  set_values_to = exprs(
    PARAMCD = "TTAELB",
    PARAM = "Time to First Adverse Event or Possible Anemia (Labs)"
  )
) %>%
  select(USUBJID, STARTDT, PARAMCD, PARAM, ADT, CNSR, SRCSEQ)
#> # A tibble: 2 × 7
#>   USUBJID STARTDT    PARAMCD PARAM                       ADT         CNSR SRCSEQ
#>   <chr>   <date>     <chr>   <chr>                       <date>     <int>  <dbl>
#> 1 01      2020-12-06 TTAELB  Time to First Adverse Even… 2020-12-22     0     NA
#> 2 02      2021-01-16 TTAELB  Time to First Adverse Even… 2021-02-03     1     NA

Note that the earliest event date and the latest censoring date are selected.

End of the observation period (`end_dates`)

The end_dates argument can be used to specify the end of the observation period if there is more than one date which restricts the observation period, e.g., end of study date and intercurrent events like new drug date. The earliest date is used as the end of the observation period and events/censorings occurring after this date are not considered.

In the example two censor_source() objects are defined, eos and newdrg, for end of study date and new drug date, respectively, and then passed to the end_dates argument.

adsl <- tribble(
  ~USUBJID, ~TRTSDT,           ~EOSDT,            ~NEWDRGDT,
  "01",     ymd("2020-12-06"), ymd("2021-03-06"), NA,
  "02",     ymd("2021-01-16"), ymd("2021-04-03"), ymd("2021-03-21"),
  "03",     ymd("2021-02-01"), NA,                NA,
  "04",     ymd("2021-03-10"), NA,                NA
) %>%
  mutate(STUDYID = "AB42")

adqs <- tribble(
  ~USUBJID, ~ADT,              ~CHG,
  "01",     ymd("2021-01-03"),    5,
  "01",     ymd("2021-02-03"),   -2,
  "01",     ymd("2021-03-01"),   NA,
  "01",     ymd("2021-03-07"),   10,
  "02",     ymd("2021-01-03"),    4,
  "02",     ymd("2021-02-03"),   -1,
  "02",     ymd("2021-04-01"),  -12,
  "03",     ymd("2021-02-15"),    3,
  "03",     ymd("2021-03-15"),  -15
) %>%
  mutate(STUDYID = "AB42")

eos <- censor_source(
  dataset_name = "adsl",
  date = EOSDT
)

newdrg <- censor_source(
  dataset_name = "adsl",
  date = NEWDRGDT
)

# Event: change from baseline (CHG) of -10 or less
worsening <- event_source(
  dataset_name = "adqs",
  date = ADT,
  filter = CHG <= -10
)

valid_assessment <- censor_source(
  dataset_name = "adqs",
  date = ADT,
  filter = !is.na(CHG)
)

no_assessment <- censor_source(
  dataset_name = "adsl",
  date = TRTSDT
)

derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl, adqs = adqs),
  start_date = TRTSDT,
  end_dates = list(eos, newdrg),
  event_conditions = list(worsening),
  censor_conditions = list(valid_assessment, no_assessment),
  set_values_to = exprs(PARAMCD = "TTWORSE")
) %>%
  select(-STUDYID, -PARAMCD) %>%
  derive_vars_merged(
    dataset_add = adsl,
    by_vars = exprs(USUBJID),
    new_vars = exprs(EOSDT, NEWDRGDT)
  )
#> # A tibble: 4 × 6
#>   USUBJID ADT         CNSR STARTDT    EOSDT      NEWDRGDT
#>   <chr>   <date>     <int> <date>     <date>     <date>
#> 1 01      2021-02-03     1 2020-12-06 2021-03-06 NA
#> 2 02      2021-02-03     1 2021-01-16 2021-04-03 2021-03-21
#> 3 03      2021-03-15     0 2021-02-01 NA         NA
#> 4 04      2021-03-10     1 2021-03-10 NA         NA

Please note that:

Subject 02 has no event because the assessment with CHG = -12 is excluded: it occurs after the start of a new drug.
Subjects 01 and 02 are censored at the last valid assessment before the end of the observation period.

Positive event (`event_type`)

If positive events like response or improvement are analyzed, event_type = "positive" should be used. Subjects without events are censored at the end of the observation period (defined by end_dates) instead of the last assessment. For positive events this is the more conservative approach.
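The difference between the two event types can be checked by hand for subject 01 of the example below. The base R sketch assumes that the end date is the end of study date and that only non-missing CHG values are valid assessments; it mirrors the documented rule rather than admiral's code.

```r
# Subject 01: assessment dates, changes from baseline, and end of study date.
adt   <- as.Date(c("2021-01-03", "2021-02-03", "2021-03-01", "2021-03-07"))
chg   <- c(5, -2, NA, 10)
eosdt <- as.Date("2021-03-06")

# Valid assessments within the observation period.
valid <- adt[!is.na(chg) & adt <= eosdt]

# Negative events: censored at the last valid assessment.
cens_negative <- max(valid)
# Positive events: the end date is added as a censoring candidate, so the
# latest candidate, i.e., the end date, is used.
cens_positive <- max(c(valid, eosdt))
print(c(cens_negative, cens_positive))
```

The improvement on 2021-03-07 lies after the end of study date and is therefore not counted as an event.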

adsl <- tribble(
  ~USUBJID, ~TRTSDT,           ~EOSDT,            ~NEWDRGDT,
  "01",     ymd("2020-12-06"), ymd("2021-03-06"), NA,
  "02",     ymd("2021-01-16"), ymd("2021-04-03"), ymd("2021-03-21"),
  "03",     ymd("2021-02-01"), NA,                NA
) %>%
  mutate(STUDYID = "AB42")

adqs <- tribble(
  ~USUBJID, ~ADT,              ~CHG,
  "01",     ymd("2021-01-03"),    5,
  "01",     ymd("2021-02-03"),   -2,
  "01",     ymd("2021-03-01"),   NA,
  "01",     ymd("2021-03-07"),   10,
  "02",     ymd("2021-01-03"),    4,
  "02",     ymd("2021-02-03"),   -1,
  "02",     ymd("2021-04-01"),  -12,
  "03",     ymd("2021-02-15"),    3,
  "03",     ymd("2021-03-15"),   15
) %>%
  mutate(STUDYID = "AB42")

eos <- censor_source(
  dataset_name = "adsl",
  date = EOSDT
)

newdrg <- censor_source(
  dataset_name = "adsl",
  date = NEWDRGDT
)

improvement <- event_source(
  dataset_name = "adqs",
  date = ADT,
  filter = CHG >= 10
)

valid_assessment <- censor_source(
  dataset_name = "adqs",
  date = ADT,
  filter = !is.na(CHG)
)

derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl, adqs = adqs),
  start_date = TRTSDT,
  end_dates = list(eos, newdrg),
  event_conditions = list(improvement),
  censor_conditions = list(valid_assessment),
  event_type = "positive",
  set_values_to = exprs(PARAMCD = "TTIMPROV")
) %>%
  select(-STUDYID) %>%
  derive_vars_merged(
    dataset_add = adsl,
    by_vars = exprs(USUBJID),
    new_vars = exprs(EOSDT, NEWDRGDT)
  )
#> # A tibble: 3 × 7
#>   USUBJID ADT         CNSR STARTDT    PARAMCD  EOSDT      NEWDRGDT
#>   <chr>   <date>     <int> <date>     <chr>    <date>     <date>
#> 1 01      2021-03-06     1 2020-12-06 TTIMPROV 2021-03-06 NA
#> 2 02      2021-03-21     1 2021-01-16 TTIMPROV 2021-04-03 2021-03-21
#> 3 03      2021-03-15     0 2021-02-01 TTIMPROV NA         NA

Please note that subjects 01 and 02 are censored at the end of the observation period instead of at the last assessment.

Differentiating censoring reasons

There are three ADaM variables which can be used to differentiate censoring reasons:

CNSR: different positive values can be used to differentiate censoring reasons, e.g., 1 for end of study, 2 for new drug, etc.
EVNTDESC: description of the event or censoring, e.g., "END OF STUDY".
CNSDTDSC: description of the date used for censoring when different from the censoring event, e.g., "LAST ASSESSMENT" if the censoring event is the end of the study but the censoring date is not the end of study date but the last assessment date before the end of the study.

In the example five censoring events are considered:

end of study (eos, valid_assessment)
start of a new drug (newdrg, valid_assessment)
no baseline assessment (no_baseline)
no post-baseline assessments (no_post_baseline)
no assessments (no_assessment)

In the example data, the first five subjects have the five censoring events, respectively, and the sixth subject has an event.

adsl <- tribble(
  ~USUBJID, ~TRTSDT,           ~EOSDT,            ~NEWDRGDT,
  "01",     ymd("2020-12-06"), ymd("2021-03-06"), NA,
  "02",     ymd("2021-01-16"), ymd("2021-04-03"), ymd("2021-03-21"),
  "03",     ymd("2021-03-10"), NA,                NA,
  "04",     ymd("2021-04-02"), NA,                NA,
  "05",     ymd("2021-05-09"), NA,                NA,
  "06",     ymd("2021-02-01"), NA,                NA
)
  mutate(STUDYID = "AB42")

adqs <- tribble(
  ~USUBJID, ~ADT,              ~CHG, ~ABLFL,
  "01",     ymd("2020-12-06"),    0, "Y",
  "01",     ymd("2021-02-03"),   -2, NA,
  "01",     ymd("2021-03-01"),   NA, NA,
  "01",     ymd("2021-03-07"),   10, NA,
  "02",     ymd("2021-01-16"),    0, "Y",
  "02",     ymd("2021-02-03"),   -1, NA,
  "02",     ymd("2021-04-01"),  -12, NA,
  "03",     ymd("2021-03-20"),   NA, NA,
  "03",     ymd("2021-04-07"),   NA, NA,
  "04",     ymd("2021-04-02"),    0, "Y",
  "06",     ymd("2021-02-01"),    0, "Y",
  "06",     ymd("2021-03-15"),  -15, NA
) %>%
  mutate(STUDYID = "AB42") %>%
  derive_vars_merged(
    dataset_add = adsl,
    by_vars = exprs(USUBJID),
    new_vars = exprs(TRTSDT)
  )

The eos and newdrg censoring events define the end dates.

eos <- censor_source(
  dataset_name = "adsl",
  date = EOSDT,
  censor = 1,
  set_values_to = exprs(
    EVNTDESC = "END OF STUDY"
  )
)

newdrg <- censor_source(
  dataset_name = "adsl",
  date = NEWDRGDT,
  censor = 2,
  set_values_to = exprs(
    EVNTDESC = "NEW DRUG"
  )
)

The worsening and valid_assessment events define which assessments are events and which are valid assessments, respectively. The EVNTDESC variable is not set by valid_assessment as its value is retrieved from the eos or newdrg censoring events.

worsening <- event_source(
  dataset_name = "adqs",
  date = ADT,
  filter = CHG <= -10,
  set_values_to = exprs(
    EVNTDESC = "WORSENING",
    SRCDOM = "ADQS",
    SRCVAR = "ADT"
  )
)

valid_assessment <- censor_source(
  dataset_name = "adqs",
  date = ADT,
  filter = !is.na(CHG),
  set_values_to = exprs(
    CNSDTDSC = "LAST ASSESSMENT",
    SRCDOM = "ADQS",
    SRCVAR = "ADT"
  )
)

The no_baseline, no_post_baseline, and no_assessment censoring events are used to censor subjects without valid assessments at treatment start (TRTSDT). Three events are defined to distinguish the reasons for not having valid assessments. The consider_end_dates = FALSE argument is used for these censoring events to prevent the EVNTDESC and CNSR values of the end dates (eos and newdrg) from being used.

no_baseline <- censor_source(
  dataset_name = "adqs",
  date = TRTSDT,
  censor = 3,
  filter = is.na(ABLFL),
  order = exprs(ADT),
  consider_end_dates = FALSE,
  set_values_to = exprs(
    EVNTDESC = "NO BASELINE ASSESSMENT",
    CNSDTDSC = "TREATMENT START",
    SRCDOM = "ADQS",
    SRCVAR = "TRTSDT"
  )
)

no_post_baseline <- censor_source(
  dataset_name = "adqs",
  date = TRTSDT,
  censor = 4,
  filter = ABLFL == "Y",
  order = exprs(ADT),
  consider_end_dates = FALSE,
  set_values_to = exprs(
    EVNTDESC = "NO POST-BASELINE ASSESSMENT",
    CNSDTDSC = "TREATMENT START",
    SRCDOM = "ADQS",
    SRCVAR = "TRTSDT"
  )
)

no_assessment <- censor_source(
  dataset_name = "adsl",
  date = TRTSDT,
  censor = 5,
  consider_end_dates = FALSE,
  set_values_to = exprs(
    EVNTDESC = "NO ASSESSMENTS",
    CNSDTDSC = "TREATMENT START",
    SRCDOM = "ADSL",
    SRCVAR = "TRTSDT"
  )
)

In the derive_param_tte() call, the order of the censoring events in the censor_conditions argument is important. For censoring events, the record with the last date is selected. If there are multiple records with the same last date, the record from the censoring event listed last in censor_conditions is selected. The events no_assessment, no_post_baseline, and no_baseline all use TRTSDT as date, i.e., the date is the same for them. Specifying no_assessment before no_post_baseline ensures that, if a subject has no post-baseline assessments, the record from no_post_baseline is used, i.e., EVNTDESC is set to "NO POST-BASELINE ASSESSMENT".
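The tie-break can be traced for subject 04 of this example, whose only assessment is a baseline record on TRTSDT. The base R sketch below lists the censoring candidates with their (hypothetical) position in censor_conditions and picks the last one; no_baseline contributes no record because its filter excludes baseline records. The helper pick_last() is illustrative only.

```r
# Censoring candidates for subject 04, all on TRTSDT (2021-04-02).
cand <- data.frame(
  source = c("valid_assessment", "no_assessment", "no_post_baseline"),
  ADT    = as.Date(rep("2021-04-02", 3)),
  pos    = c(1, 2, 3),  # position in censor_conditions
  stringsAsFactors = FALSE
)

# Hypothetical helper: last record with respect to date, then position.
pick_last <- function(cand) {
  cand <- cand[order(cand$ADT, cand$pos), ]
  cand$source[nrow(cand)]
}

pick_last(cand)     # "no_post_baseline"

# If no_post_baseline were listed before no_assessment, no_assessment would win.
swapped <- cand
swapped$pos <- c(1, 3, 2)
pick_last(swapped)  # "no_assessment"
```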

derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl, adqs = adqs),
  start_date = TRTSDT,
  end_dates = list(eos, newdrg),
  event_conditions = list(worsening),
  censor_conditions = list(valid_assessment, no_assessment, no_post_baseline, no_baseline),
  set_values_to = exprs(PARAMCD = "TTWORSE")
) %>%
  select(-STUDYID, -PARAMCD)
#> # A tibble: 6 × 8
#>   USUBJID ADT        EVNTDESC            SRCDOM SRCVAR  CNSR CNSDTDSC STARTDT
#>   <chr>   <date>     <chr>               <chr>  <chr>  <int> <chr>    <date>
#> 1 01      2021-02-03 END OF STUDY        ADQS   ADT        1 LAST AS… 2020-12-06
#> 2 02      2021-02-03 NEW DRUG            ADQS   ADT        2 LAST AS… 2021-01-16
#> 3 03      2021-03-10 NO BASELINE ASSESS… ADQS   TRTSDT     3 TREATME… 2021-03-10
#> 4 04      2021-04-02 NO POST-BASELINE A… ADQS   TRTSDT     4 TREATME… 2021-04-02
#> 5 05      2021-05-09 NO ASSESSMENTS      ADSL   TRTSDT     5 TREATME… 2021-05-09
#> 6 06      2021-03-15 WORSENING           ADQS   ADT        0 <NA>     2021-02-01

Differentiating censoring date description

In this example, the event description (EVNTDESC) and the censoring date description (CNSDTDSC) should distinguish between the different censoring reasons: end of study, new drug, or just last assessment. If a variable like CNSDTDSC is set in both the end dates and the censoring condition, the latter overwrites the former. However, the coalesce() function can be used to avoid this.
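The effect of coalesce() in valid_assessment can be illustrated in base R: a value already set by an end date is kept, and the default is used only where it is missing. The helper coalesce_default() is a hypothetical stand-in for dplyr::coalesce() in this two-argument case.

```r
# EVNTDESC values set by the end dates (NA where no end date applied), illustrative.
evntdesc_from_end <- c("END OF STUDY", "NEW DRUG", NA)

# Hypothetical base R equivalent of dplyr::coalesce(x, default):
# keep x where it is not missing, otherwise use the default.
coalesce_default <- function(x, default) ifelse(is.na(x), default, x)

coalesce_default(evntdesc_from_end, "NO WORSENING")
# "END OF STUDY" "NEW DRUG" "NO WORSENING"
```

Setting EVNTDESC = "NO WORSENING" without coalesce() would instead overwrite all three values.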

adsl <- tribble(
  ~USUBJID, ~TRTSDT,           ~EOSDT,            ~NEWDRGDT,
  "01",     ymd("2020-12-06"), ymd("2021-03-06"), NA,
  "02",     ymd("2021-01-16"), ymd("2021-04-03"), ymd("2021-03-21"),
  "03",     ymd("2021-03-10"), NA,                NA,
  "04",     ymd("2021-04-02"), NA,                NA,
  "05",     ymd("2021-05-09"), ymd("2021-07-30"), NA
) %>%
  mutate(STUDYID = "AB42")

adqs <- tribble(
  ~USUBJID, ~ADT,              ~CHG,
  "01",     ymd("2021-02-03"),   -2,
  "01",     ymd("2021-03-01"),   NA,
  "01",     ymd("2021-03-07"),   10,
  "02",     ymd("2021-02-03"),   -1,
  "02",     ymd("2021-04-01"),  -12,
  "03",     ymd("2021-03-20"),    2,
  "03",     ymd("2021-04-07"),    5,
  "04",     ymd("2021-04-15"),  -15,
  "05",     ymd("2021-06-01"),  -13
) %>%
  mutate(STUDYID = "AB42") %>%
  derive_vars_merged(
    dataset_add = adsl,
    by_vars = exprs(USUBJID),
    new_vars = exprs(TRTSDT)
  )

Here, two end dates are defined, each of which sets both EVNTDESC and CNSDTDSC.

eos <- censor_source(
  dataset_name = "adsl",
  date = EOSDT,
  set_values_to = exprs(
    EVNTDESC = "END OF STUDY",
    CNSDTDSC = "LAST QA BEFORE EOS"
  )
)

newdrg <- censor_source(
  dataset_name = "adsl",
  date = NEWDRGDT,
  set_values_to = exprs(
    EVNTDESC = "NEW DRUG",
    CNSDTDSC = "LAST QA BEFORE NEW DRUG"
  )
)

For the censoring condition (valid_assessment) the coalesce() function is used to set the descriptions only if they are not already set by the end dates.

valid_assessment <- censor_source(
  dataset_name = "adqs",
  date = ADT,
  filter = !is.na(CHG),
  set_values_to = exprs(
    EVNTDESC = coalesce(EVNTDESC, "NO WORSENING"),
    CNSDTDSC = coalesce(CNSDTDSC, "LAST QA"),
    SRCDOM = "ADQS",
    SRCVAR = "ADT"
  )
)

worsening <- event_source(
  dataset_name = "adqs",
  date = ADT,
  filter = CHG <= -10,
  set_values_to = exprs(
    EVNTDESC = "WORSENING",
    SRCDOM = "ADQS",
    SRCVAR = "ADT"
  )
)

derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl, adqs = adqs),
  start_date = TRTSDT,
  end_dates = list(eos, newdrg),
  event_conditions = list(worsening),
  censor_conditions = list(valid_assessment),
  set_values_to = exprs(PARAMCD = "TTWORSE")
) %>%
  select(-STUDYID, -PARAMCD, -STARTDT, -SRCDOM, -SRCVAR)
#> # A tibble: 5 × 5
#>   USUBJID ADT        EVNTDESC      CNSR CNSDTDSC
#>   <chr>   <date>     <chr>        <int> <chr>
#> 1 01      2021-02-03 END OF STUDY     1 LAST QA BEFORE EOS
#> 2 02      2021-02-03 NEW DRUG         1 LAST QA BEFORE NEW DRUG
#> 3 03      2021-04-07 NO WORSENING     1 LAST QA
#> 4 04      2021-04-15 WORSENING        0 <NA>
#> 5 05      2021-06-01 WORSENING        0 <NA>

For subjects 01 and 02, the descriptions from the end dates are used. Subject 03 has no end dates, so the descriptions from the censoring condition (valid_assessment) are used.
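The censoring dates for subjects 01 to 03 can be checked by hand. The base R sketch below applies the rules documented for `end_dates`: the earliest of EOSDT and NEWDRGDT is each subject's end date, and only assessments with non-missing CHG on or before that date are considered. The data frames `adsl_chk` and `adqs_chk` are hypothetical copies of the relevant example rows, named so they do not overwrite `adsl` and `adqs`. This is a sketch of the documented behavior, not admiral's implementation.

```r
adsl_chk <- data.frame(
  USUBJID = c("01", "02", "03"),
  EOSDT = as.Date(c("2021-03-06", "2021-04-03", NA)),
  NEWDRGDT = as.Date(c(NA, "2021-03-21", NA))
)
adqs_chk <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02", "03", "03"),
  ADT = as.Date(c("2021-02-03", "2021-03-01", "2021-03-07", "2021-02-03",
                  "2021-04-01", "2021-03-20", "2021-04-07")),
  CHG = c(-2, NA, 10, -1, -12, 2, 5)
)

# Earliest end date per subject; stays NA when a subject has no end date
adsl_chk$END_DT <- pmin(adsl_chk$EOSDT, adsl_chk$NEWDRGDT, na.rm = TRUE)

# Last valid (non-missing CHG) assessment on or before the end date
last_valid <- vapply(adsl_chk$USUBJID, function(id) {
  end_dt <- adsl_chk$END_DT[adsl_chk$USUBJID == id]
  qs <- adqs_chk[adqs_chk$USUBJID == id & !is.na(adqs_chk$CHG), ]
  if (!is.na(end_dt)) qs <- qs[qs$ADT <= end_dt, ]
  as.character(max(qs$ADT))
}, character(1))
print(last_valid)
```

This reproduces the ADT values 2021-02-03, 2021-02-03 and 2021-04-07 shown above. Subject 02's worsening on 2021-04-01 falls after the new-drug date of 2021-03-21. That record is therefore excluded, and the subject is censored instead of having an event.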

Overall survival time to event parameter

In oncology trials, this is commonly derived as the time from randomization to death. Subjects without an event are censored at the last date they are known to be alive.

Because the origin differs from the default (TRTSDT), it is specified with the start_date argument.
In this example, datetimes are needed, which is achieved by setting the create_datetime argument to TRUE.
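The time of day can affect the resulting time to event. The base R sketch below computes the elapsed time for subject 02 of the following example twice, once with dates and once with datetimes. It calls `difftime()` directly purely for illustration, and the UTC time zone is an assumption.

```r
# Randomization and death datetimes of subject 02 (time zone assumed UTC)
rand_dtm <- as.POSIXct("2021-01-23 00:00:00", tz = "UTC")
dth_dtm <- as.POSIXct("2021-02-03 19:45:59", tz = "UTC")

# Dates only: the time of day is dropped, leaving whole days
days_date <- as.numeric(difftime(as.Date(dth_dtm), as.Date(rand_dtm), units = "days"))

# Datetimes: the 19:45:59 on the day of death is retained
days_dtm <- as.numeric(difftime(dth_dtm, rand_dtm, units = "days"))

cat("date-based:", days_date, "days\n")
cat("datetime-based:", round(days_dtm, 4), "days\n")
```

The date-based value is 11 days. The datetime-based value is about 11.8236 days.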

adsl <- tribble(
  ~USUBJID, ~RANDDTM,                       ~LSALVDTM,                      ~DTHDTM,                        ~DTHFL,
  "01",     ymd_hms("2020-10-03 00:00:00"), ymd_hms("2022-12-15 23:59:59"), NA,                             NA,
  "02",     ymd_hms("2021-01-23 00:00:00"), ymd_hms("2021-02-03 19:45:59"), ymd_hms("2021-02-03 19:45:59"), "Y"
) %>%
  mutate(STUDYID = "AB42")

# derive overall survival parameter
death <- event_source(
  dataset_name = "adsl",
  filter = DTHFL == "Y",
  date = DTHDTM,
  set_values_to = exprs(
    EVNTDESC = "DEATH",
    SRCDOM = "ADSL",
    SRCVAR = "DTHDTM"
  )
)

last_alive <- censor_source(
  dataset_name = "adsl",
  date = LSALVDTM,
  set_values_to = exprs(
    EVNTDESC = "LAST DATE KNOWN ALIVE",
    SRCDOM = "ADSL",
    SRCVAR = "LSALVDTM"
  )
)

derive_param_tte(
  dataset_adsl = adsl,
  start_date = RANDDTM,
  event_conditions = list(death),
  censor_conditions = list(last_alive),
  create_datetime = TRUE,
  source_datasets = list(adsl = adsl),
  set_values_to = exprs(
    PARAMCD = "OS",
    PARAM = "Overall Survival"
  )
) %>%
  select(USUBJID, STARTDTM, PARAMCD, PARAM, ADTM, CNSR)
#> # A tibble: 2 × 6
#>   USUBJID STARTDTM            PARAMCD PARAM            ADTM                 CNSR
#>   <chr>   <dttm>              <chr>   <chr>            <dttm>              <int>
#> 1 01      2020-10-03 00:00:00 OS      Overall Survival 2022-12-15 23:59:59     1
#> 2 02      2021-01-23 00:00:00 OS      Overall Survival 2021-02-03 19:45:59     0
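The event and censoring logic behind this output can be reproduced in a few lines of base R. The sketch below treats death as the event when DTHFL is "Y" and otherwise censors at the last date known alive. It assumes UTC datetimes and uses a hypothetical data frame `os_chk` so that `adsl` is not overwritten. It ignores end dates and the adjustment of dates before the origin that derive_param_tte() also performs.

```r
os_chk <- data.frame(
  USUBJID = c("01", "02"),
  LSALVDTM = as.POSIXct(c("2022-12-15 23:59:59", "2021-02-03 19:45:59"), tz = "UTC"),
  DTHDTM = as.POSIXct(c(NA, "2021-02-03 19:45:59"), tz = "UTC"),
  DTHFL = c(NA, "Y")
)

# Death flagged as "Y" is the event (CNSR = 0); everyone else is censored (CNSR = 1)
is_event <- !is.na(os_chk$DTHFL) & os_chk$DTHFL == "Y"
os_chk$CNSR <- ifelse(is_event, 0L, 1L)

# Analysis datetime: death datetime for events, last known alive otherwise
os_chk$ADTM <- os_chk$LSALVDTM
os_chk$ADTM[is_event] <- os_chk$DTHDTM[is_event]

print(os_chk[, c("USUBJID", "ADTM", "CNSR")])
```

The resulting ADTM and CNSR values match the two rows above.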

Duration of response time to event parameter

In oncology trials, this is commonly derived as the time from response until progression or death or, if neither has occurred, it is censored at the last tumor assessment date. It is only relevant for subjects with a response. The new parameter is created only for subjects in dataset_adsl, so in the example below dataset_adsl is filtered to responders.

adsl_resp <- tribble(
  ~USUBJID, ~DTHFL, ~DTHDT,            ~RSPDT,
  "01",     "Y",    ymd("2021-06-12"), ymd("2021-03-04"),
  "02",     "N",    NA,                NA,
  "03",     "Y",    ymd("2021-08-21"), NA,
  "04",     "N",    NA,                ymd("2021-04-14")
) %>%
  mutate(STUDYID = "AB42")

adrs <- tribble(
  ~USUBJID, ~AVALC, ~ADT,              ~ASEQ,
  "01",     "SD",   ymd("2021-01-03"), 1,
  "01",     "PR",   ymd("2021-03-04"), 2,
  "01",     "PD",   ymd("2021-05-05"), 3,
  "02",     "PD",   ymd("2021-02-03"), 1,
  "04",     "SD",   ymd("2021-02-13"), 1,
  "04",     "PR",   ymd("2021-04-14"), 2,
  "04",     "CR",   ymd("2021-05-15"), 3
) %>%
  mutate(STUDYID = "AB42", PARAMCD = "OVR")

pd <- event_source(
  dataset_name = "adrs",
  filter = AVALC == "PD",
  date = ADT,
  set_values_to = exprs(
    EVNTDESC = "PD",
    SRCDOM = "ADRS",
    SRCVAR = "ADT",
    SRCSEQ = ASEQ
  )
)

death <- event_source(
  dataset_name = "adsl",
  filter = DTHFL == "Y",
  date = DTHDT,
  set_values_to = exprs(
    EVNTDESC = "DEATH",
    SRCDOM = "ADSL",
    SRCVAR = "DTHDT"
  )
)

last_visit <- censor_source(
  dataset_name = "adrs",
  date = ADT,
  set_values_to = exprs(
    EVNTDESC = "LAST TUMOR ASSESSMENT",
    SRCDOM = "ADRS",
    SRCVAR = "ADT",
    SRCSEQ = ASEQ
  )
)

derive_param_tte(
  dataset_adsl = filter(adsl_resp, !is.na(RSPDT)),
  start_date = RSPDT,
  event_conditions = list(pd, death),
  censor_conditions = list(last_visit),
  source_datasets = list(adsl = adsl_resp, adrs = adrs),
  set_values_to = exprs(
    PARAMCD = "DURRSP",
    PARAM = "Duration of Response"
  )
) %>%
  select(USUBJID, STARTDT, PARAMCD, PARAM, ADT, CNSR, SRCSEQ)
#> # A tibble: 2 × 7
#>   USUBJID STARTDT    PARAMCD PARAM                ADT         CNSR SRCSEQ
#>   <chr>   <date>     <chr>   <chr>                <date>     <int>  <dbl>
#> 1 01      2021-03-04 DURRSP  Duration of Response 2021-05-05     0      3
#> 2 04      2021-04-14 DURRSP  Duration of Response 2021-05-15     1      3
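The two rows above can be cross-checked with the base R sketch below. For each responder, it takes the earliest of the PD and death dates as the event and otherwise censors at the last tumor assessment. The data frames `rsp_chk` and `adrs_chk` are hypothetical copies of the relevant example rows. The sketch does not apply the start-date adjustment or the other checks that derive_param_tte() performs.

```r
# Responders only, mirroring filter(adsl_resp, !is.na(RSPDT))
rsp_chk <- data.frame(
  USUBJID = c("01", "04"),
  DTHDT = as.Date(c("2021-06-12", NA))
)
adrs_chk <- data.frame(
  USUBJID = rep(c("01", "04"), each = 3),
  AVALC = c("SD", "PR", "PD", "SD", "PR", "CR"),
  ADT = as.Date(c("2021-01-03", "2021-03-04", "2021-05-05",
                  "2021-02-13", "2021-04-14", "2021-05-15"))
)

dor_chk <- do.call(rbind, lapply(rsp_chk$USUBJID, function(id) {
  rs <- adrs_chk[adrs_chk$USUBJID == id, ]
  # Candidate event dates: any PD assessment plus the death date, if present
  events <- c(rs$ADT[rs$AVALC == "PD"], rsp_chk$DTHDT[rsp_chk$USUBJID == id])
  events <- events[!is.na(events)]
  if (length(events) > 0) {
    # Earliest event wins
    data.frame(USUBJID = id, ADT = min(events), CNSR = 0L)
  } else {
    # No event: censor at the last tumor assessment
    data.frame(USUBJID = id, ADT = max(rs$ADT), CNSR = 1L)
  }
}))
print(dor_chk)
```

Subject 01 has its event at the PD assessment on 2021-05-05, before the death date. Subject 04 has no event and is censored at the complete response assessment on 2021-05-15.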

Further examples

Further examples of using this function can be found in vignette("bds_tte") and vignette("tte_analyses").