Add the Worst or Best Observation for Each By Group as New Records

Add the first available record from events for each by group as new records, all variables of the selected observation are kept. It can be used for selecting the extreme observation from a series of user-defined events. This distinguishes derive_extreme_event() from derive_extreme_records(), where extreme records are derived based on certain order of existing variables.

Usage

derive_extreme_event(
  dataset = NULL,
  by_vars,
  events,
  tmp_event_nr_var = NULL,
  order,
  mode,
  source_datasets = NULL,
  check_type = "warning",
  set_values_to = NULL,
  keep_source_vars = exprs(everything())
)

Arguments

dataset

Input dataset

The variables specified by the by_vars and order arguments are expected to be in the dataset.

Default value: none

by_vars

Grouping variables

Default value: NULL

events

Conditions and new values defining events

A list of event() or event_joined() objects is expected. Only observations listed in the events are considered for deriving extreme event. If multiple records meet the filter condition, take the first record sorted by order. The data is grouped by by_vars, i.e., summary functions like all() or any() can be used in condition.

For event_joined() events the observations are selected by calling filter_joined(). The condition field is passed to the filter_join argument.

Default value: none

tmp_event_nr_var

Temporary event number variable

The specified variable is added to all source datasets and is set to the number of the event before selecting the records of the event.

It can be used in order to determine which record should be used if records from more than one event are selected.

The variable is not included in the output dataset.

Default value: NULL

order

Sort order

If a particular event from events has more than one observation, within the event and by group, the records are ordered by the specified order.

For handling of NAs in sorting variables see Sort Order.

Permitted values: list of expressions created by exprs(), e.g., exprs(ADT, desc(AVAL))
Default value: none

mode

Selection mode (first or last)

If a particular event from events has more than one observation, "first"/"last" is used to select the first/last record of this type of event sorting by order.

Permitted values: "first", "last"
Default value: none

source_datasets

Source datasets

A named list of datasets is expected. The dataset_name field of event() and event_joined() refers to the dataset provided in the list.

Default value: NULL

check_type

Check uniqueness?

If "warning" or "error" is specified, the specified message is issued if the observations of the input dataset are not unique with respect to the by variables and the order.

Permitted values: "none", "message", "warning", "error"
Default value: "warning"

set_values_to

Variables to be set

The specified variables are set to the specified values for the new observations.

Set a list of variables to some specified value for the new records

LHS refer to a variable.
RHS refers to the values to set to the variable. This can be a string, a symbol, a numeric value, an expression or NA. If summary functions are used, the values are summarized by the variables specified for by_vars.

For example:

  set_values_to = exprs(
    AVAL = sum(AVAL),
    DTYPE = "AVERAGE",
  )

Default value: none

keep_source_vars

Variables to keep from the source dataset

For each event the specified variables are kept from the selected observations. The variables specified for by_vars and created by set_values_to are always kept. The keep_source_vars field of the event will take precedence over the value of the keep_source_vars argument.

Permitted values: A list of expressions where each element is a symbol or a tidyselect expression, e.g., exprs(VISIT, VISITNUM, starts_with("RS")).
Default value: exprs(everything())

Value

The input dataset with the best or worst observation of each by group added as new observations.

Details

For each event select the observations to consider:
1. If the event is of class event, the observations of the source dataset are restricted by condition and then the first or last (mode) observation per by group (by_vars) is selected.
  
  If the event is of class event_joined, filter_joined() is called to select the observations.
2. The variables specified by the set_values_to field of the event are added to the selected observations.
3. The variable specified for tmp_event_nr_var is added and set to the number of the event.
4. Only the variables specified for the keep_source_vars field of the event, and the by variables (by_vars) and the variables created by set_values_to are kept. If keep_source_vars = NULL is used for an event in derive_extreme_event() the value of the keep_source_vars argument of derive_extreme_event() is used.
All selected observations are bound together.
For each group (with respect to the variables specified for the by_vars parameter) the first or last observation (with respect to the order specified for the order parameter and the mode specified for the mode parameter) is selected.
The variables specified by the set_values_to parameter are added to the selected observations.
The observations are added to input dataset.

Note: This function creates temporary datasets which may be much bigger than the input datasets. If this causes memory issues, please try setting the admiral option save_memory to TRUE (see set_admiral_options()). This reduces the memory consumption but increases the run-time.

Examples

library(tibble)
library(dplyr)
library(lubridate)

adqs <- tribble(
  ~USUBJID, ~PARAMCD,       ~AVALC,        ~ADY,
  "1",      "NO SLEEP",     "N",              1,
  "1",      "WAKE UP",      "N",              2,
  "1",      "FALL ASLEEP",  "N",              3,
  "2",      "NO SLEEP",     "N",              1,
  "2",      "WAKE UP",      "Y",              2,
  "2",      "WAKE UP",      "Y",              3,
  "2",      "FALL ASLEEP",  "N",              4,
  "3",      "NO SLEEP",     NA_character_,    1
)

# Add a new record for each USUBJID storing the the worst sleeping problem.
derive_extreme_event(
  adqs,
  by_vars = exprs(USUBJID),
  events = list(
    event(
      condition = PARAMCD == "NO SLEEP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "No sleep", AVAL = 1)
    ),
    event(
      condition = PARAMCD == "WAKE UP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "Waking up more than three times", AVAL = 2)
    ),
    event(
      condition = PARAMCD == "FALL ASLEEP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "More than 30 mins to fall asleep", AVAL = 3)
    ),
    event(
      condition = all(AVALC == "N"),
      set_values_to = exprs(
        AVALC = "No sleeping problems", AVAL = 4
      )
    ),
    event(
      condition = TRUE,
      set_values_to = exprs(AVALC = "Missing", AVAL = 99)
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, desc(ADY)),
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "WSP",
    PARAM = "Worst Sleeping Problems"
  )
)
#> # A tibble: 11 × 6
#>    USUBJID PARAMCD     AVALC                             ADY  AVAL PARAM        
#>    <chr>   <chr>       <chr>                           <dbl> <dbl> <chr>        
#>  1 1       NO SLEEP    N                                   1    NA NA           
#>  2 1       WAKE UP     N                                   2    NA NA           
#>  3 1       FALL ASLEEP N                                   3    NA NA           
#>  4 2       NO SLEEP    N                                   1    NA NA           
#>  5 2       WAKE UP     Y                                   2    NA NA           
#>  6 2       WAKE UP     Y                                   3    NA NA           
#>  7 2       FALL ASLEEP N                                   4    NA NA           
#>  8 3       NO SLEEP    NA                                  1    NA NA           
#>  9 1       WSP         No sleeping problems                3     4 Worst Sleepi…
#> 10 2       WSP         Waking up more than three times     3     2 Worst Sleepi…
#> 11 3       WSP         Missing                             1    99 Worst Sleepi…

# Use different mode by event
adhy <- tribble(
  ~USUBJID, ~AVISITN, ~CRIT1FL,
  "1",             1, "Y",
  "1",             2, "Y",
  "2",             1, "Y",
  "2",             2, NA_character_,
  "2",             3, "Y",
  "2",             4, NA_character_
) %>%
  mutate(
    PARAMCD = "ALKPH",
    PARAM = "Alkaline Phosphatase (U/L)"
  )

derive_extreme_event(
  adhy,
  by_vars = exprs(USUBJID),
  events = list(
    event(
      condition = is.na(CRIT1FL),
      set_values_to = exprs(AVALC = "N")
    ),
    event(
      condition = CRIT1FL == "Y",
      mode = "last",
      set_values_to = exprs(AVALC = "Y")
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, AVISITN),
  mode = "first",
  keep_source_vars = exprs(AVISITN),
  set_values_to = exprs(
    PARAMCD = "ALK2",
    PARAM = "ALKPH <= 2 times ULN"
  )
)
#> # A tibble: 8 × 6
#>   USUBJID AVISITN CRIT1FL PARAMCD PARAM                      AVALC
#>   <chr>     <dbl> <chr>   <chr>   <chr>                      <chr>
#> 1 1             1 Y       ALKPH   Alkaline Phosphatase (U/L) NA   
#> 2 1             2 Y       ALKPH   Alkaline Phosphatase (U/L) NA   
#> 3 2             1 Y       ALKPH   Alkaline Phosphatase (U/L) NA   
#> 4 2             2 NA      ALKPH   Alkaline Phosphatase (U/L) NA   
#> 5 2             3 Y       ALKPH   Alkaline Phosphatase (U/L) NA   
#> 6 2             4 NA      ALKPH   Alkaline Phosphatase (U/L) NA   
#> 7 1             2 NA      ALK2    ALKPH <= 2 times ULN       Y    
#> 8 2             2 NA      ALK2    ALKPH <= 2 times ULN       N    

# Derive confirmed best overall response (using event_joined())
# CR - complete response, PR - partial response, SD - stable disease
# NE - not evaluable, PD - progressive disease
adsl <- tribble(
  ~USUBJID, ~TRTSDTC,
  "1",      "2020-01-01",
  "2",      "2019-12-12",
  "3",      "2019-11-11",
  "4",      "2019-12-30",
  "5",      "2020-01-01",
  "6",      "2020-02-02",
  "7",      "2020-02-02",
  "8",      "2020-02-01"
) %>%
  mutate(TRTSDT = ymd(TRTSDTC))

adrs <- tribble(
  ~USUBJID, ~ADTC,        ~AVALC,
  "1",      "2020-01-01", "PR",
  "1",      "2020-02-01", "CR",
  "1",      "2020-02-16", "NE",
  "1",      "2020-03-01", "CR",
  "1",      "2020-04-01", "SD",
  "2",      "2020-01-01", "SD",
  "2",      "2020-02-01", "PR",
  "2",      "2020-03-01", "SD",
  "2",      "2020-03-13", "CR",
  "4",      "2020-01-01", "PR",
  "4",      "2020-03-01", "NE",
  "4",      "2020-04-01", "NE",
  "4",      "2020-05-01", "PR",
  "5",      "2020-01-01", "PR",
  "5",      "2020-01-10", "PR",
  "5",      "2020-01-20", "PR",
  "6",      "2020-02-06", "PR",
  "6",      "2020-02-16", "CR",
  "6",      "2020-03-30", "PR",
  "7",      "2020-02-06", "PR",
  "7",      "2020-02-16", "CR",
  "7",      "2020-04-01", "NE",
  "8",      "2020-02-16", "PD"
) %>%
  mutate(
    ADT = ymd(ADTC),
    PARAMCD = "OVR",
    PARAM = "Overall Response by Investigator"
  ) %>%
  derive_vars_merged(
    dataset_add = adsl,
    by_vars = exprs(USUBJID),
    new_vars = exprs(TRTSDT)
  )

derive_extreme_event(
  adrs,
  by_vars = exprs(USUBJID),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, ADT),
  mode = "first",
  source_datasets = list(adsl = adsl),
  events = list(
    event_joined(
      description = paste(
        "CR needs to be confirmed by a second CR at least 28 days later",
        "at most one NE is acceptable between the two assessments"
      ),
      join_vars = exprs(AVALC, ADT),
      join_type = "after",
      first_cond_upper = AVALC.join == "CR" &
        ADT.join >= ADT + 28,
      condition = AVALC == "CR" &
        all(AVALC.join %in% c("CR", "NE")) &
        count_vals(var = AVALC.join, val = "NE") <= 1,
      set_values_to = exprs(
        AVALC = "CR"
      )
    ),
    event_joined(
      description = paste(
        "PR needs to be confirmed by a second CR or PR at least 28 days later,",
        "at most one NE is acceptable between the two assessments"
      ),
      join_vars = exprs(AVALC, ADT),
      join_type = "after",
      first_cond_upper = AVALC.join %in% c("CR", "PR") &
        ADT.join >= ADT + 28,
      condition = AVALC == "PR" &
        all(AVALC.join %in% c("CR", "PR", "NE")) &
        count_vals(var = AVALC.join, val = "NE") <= 1,
      set_values_to = exprs(
        AVALC = "PR"
      )
    ),
    event(
      description = paste(
        "CR, PR, or SD are considered as SD if occurring at least 28",
        "after treatment start"
      ),
      condition = AVALC %in% c("CR", "PR", "SD") & ADT >= TRTSDT + 28,
      set_values_to = exprs(
        AVALC = "SD"
      )
    ),
    event(
      condition = AVALC == "PD",
      set_values_to = exprs(
        AVALC = "PD"
      )
    ),
    event(
      condition = AVALC %in% c("CR", "PR", "SD", "NE"),
      set_values_to = exprs(
        AVALC = "NE"
      )
    ),
    event(
      description = "set response to MISSING for patients without records in ADRS",
      dataset_name = "adsl",
      condition = TRUE,
      set_values_to = exprs(
        AVALC = "MISSING"
      ),
      keep_source_vars = exprs(TRTSDT)
    )
  ),
  set_values_to = exprs(
    PARAMCD = "CBOR",
    PARAM = "Best Confirmed Overall Response by Investigator"
  )
) %>%
  filter(PARAMCD == "CBOR")
#> # A tibble: 8 × 7
#>   USUBJID ADTC       AVALC   ADT        PARAMCD PARAM                 TRTSDT    
#>   <chr>   <chr>      <chr>   <date>     <chr>   <chr>                 <date>    
#> 1 1       2020-02-01 CR      2020-02-01 CBOR    Best Confirmed Overa… 2020-01-01
#> 2 2       2020-02-01 SD      2020-02-01 CBOR    Best Confirmed Overa… 2019-12-12
#> 3 3       NA         MISSING NA         CBOR    Best Confirmed Overa… 2019-11-11
#> 4 4       2020-05-01 SD      2020-05-01 CBOR    Best Confirmed Overa… 2019-12-30
#> 5 5       2020-01-01 NE      2020-01-01 CBOR    Best Confirmed Overa… 2020-01-01
#> 6 6       2020-02-06 PR      2020-02-06 CBOR    Best Confirmed Overa… 2020-02-02
#> 7 7       2020-02-06 NE      2020-02-06 CBOR    Best Confirmed Overa… 2020-02-02
#> 8 8       2020-02-16 PD      2020-02-16 CBOR    Best Confirmed Overa… 2020-02-01