Skip to contents

Add the first or last observation for each by group as new observations. The new observations can be selected from the additional dataset. This function can be used for adding the maximum or minimum value as a separate visit. All variables of the selected observation are kept. This distinguishes derive_extreme_records() from derive_summary_records(), where only the by variables are populated for the new records.

Usage

derive_extreme_records(
  dataset = NULL,
  dataset_add,
  dataset_ref = NULL,
  by_vars = NULL,
  order = NULL,
  mode = NULL,
  filter_add = NULL,
  check_type = "warning",
  exist_flag = NULL,
  true_value = "Y",
  false_value = NA_character_,
  keep_source_vars = exprs(everything()),
  set_values_to
)

Arguments

dataset

Input dataset

dataset_add

Additional dataset

The additional dataset, which determines the by groups returned in the input dataset, based on the groups that exist in this dataset after being subset by filter_add.

The variables specified in the by_vars and filter_add parameters are expected in this dataset. If mode and order are specified, the first or last observation within each by group, defined by by_vars, is selected.

dataset_ref

Reference dataset

The variables specified for by_vars are expected. For each observation of the specified dataset a new observation is added to the input dataset.

by_vars

Grouping variables

If dataset_ref is specified, this argument must be specified.

Permitted Values: list of variables created by exprs() e.g. exprs(USUBJID, VISIT)

order

Sort order

Within each by group the observations are ordered by the specified order.

Permitted Values: list of expressions created by exprs(), e.g., exprs(ADT, desc(AVAL))

mode

Selection mode (first or last)

If "first" is specified, the first observation of each by group is added to the input dataset. If "last" is specified, the last observation of each by group is added to the input dataset.

Permitted Values: "first", "last"

filter_add

Filter for additional dataset (dataset_add)

Only observations in dataset_add fulfilling the specified condition are considered.

check_type

Check uniqueness?

If "warning" or "error" is specified, the specified message is issued if the observations of the (restricted) additional dataset are not unique with respect to the by variables and the order.

Permitted Values: "none", "warning", "error"

exist_flag

Existence flag

The specified variable is added to the output dataset.

For by groups with at least one observation in the additional dataset (dataset_add) exist_flag is set to the value specified by the true_value argument.

For all other by groups exist_flag is set to the value specified by the false_value argument.

Permitted Values: Variable name

true_value

True value

For new observations selected from the additional dataset (dataset_add), exist_flag is set to the specified value.

false_value

False value

For new observations not selected from the additional dataset (dataset_add), exist_flag is set to the specified value.

keep_source_vars

Variables to be kept in the new records

A named list or tidyselect expressions created by exprs() defining the variables to be kept for the new records. The variables specified for by_vars and set_values_to need not be specified here as they are kept automatically.

set_values_to

Variables to be set

The specified variables are set to the specified values for the new observations.

Set a list of variables to some specified value for the new records

  • LHS refer to a variable.

  • RHS refers to the values to set to the variable. This can be a string, a symbol, a numeric value, an expression or NA. If summary functions are used, the values are summarized by the variables specified for by_vars.

For example:

  set_values_to = exprs(
    AVAL = sum(AVAL),
    DTYPE = "AVERAGE",
  )

Value

The input dataset with the first or last observation of each by group added as new observations.

Details

  1. The additional dataset (dataset_add) is restricted as specified by the filter_add argument.

  2. For each group (with respect to the variables specified for the by_vars argument) the first or last observation (with respect to the order specified for the order argument and the mode specified for the mode argument) is selected.

  3. If dataset_ref is specified, observations which are in dataset_ref but not in the selected records are added.

  4. The variables specified by the set_values_to argument are added to the selected observations.

  5. The variables specified by the keep_source_vars argument are selected along with the variables specified in by_vars and set_values_to arguments.

  6. The observations are added to input dataset.

Examples

library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(lubridate)

adlb <- tribble(
  ~USUBJID, ~AVISITN, ~AVAL, ~LBSEQ,
  "1",      1,          113,      1,
  "1",      2,          113,      2,
  "1",      3,          117,      3,
  "2",      1,          101,      1,
  "2",      2,          101,      2,
  "2",      3,           95,      3
)

# Add a new record for each USUBJID storing the minimum value (first AVAL).
# If multiple records meet the minimum criterion, take the first value by
# AVISITN. Set AVISITN = 97 and DTYPE = MINIMUM for these new records.
# Specify the variables that need to be kept in the new records.
derive_extreme_records(
  adlb,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(AVAL, AVISITN),
  mode = "first",
  filter_add = !is.na(AVAL),
  keep_source_vars = exprs(AVAL),
  set_values_to = exprs(
    AVISITN = 97,
    DTYPE = "MINIMUM"
  )
)
#> # A tibble: 8 × 5
#>   USUBJID AVISITN  AVAL LBSEQ DTYPE  
#>   <chr>     <dbl> <dbl> <dbl> <chr>  
#> 1 1             1   113     1 NA     
#> 2 1             2   113     2 NA     
#> 3 1             3   117     3 NA     
#> 4 2             1   101     1 NA     
#> 5 2             2   101     2 NA     
#> 6 2             3    95     3 NA     
#> 7 1            97   113    NA MINIMUM
#> 8 2            97    95    NA MINIMUM

# Add a new record for each USUBJID storing the maximum value (last AVAL).
# If multiple records meet the maximum criterion, take the first value by
# AVISITN. Set AVISITN = 98 and DTYPE = MAXIMUM for these new records.
derive_extreme_records(
  adlb,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(desc(AVAL), AVISITN),
  mode = "first",
  filter_add = !is.na(AVAL),
  set_values_to = exprs(
    AVISITN = 98,
    DTYPE = "MAXIMUM"
  )
)
#> # A tibble: 8 × 5
#>   USUBJID AVISITN  AVAL LBSEQ DTYPE  
#>   <chr>     <dbl> <dbl> <dbl> <chr>  
#> 1 1             1   113     1 NA     
#> 2 1             2   113     2 NA     
#> 3 1             3   117     3 NA     
#> 4 2             1   101     1 NA     
#> 5 2             2   101     2 NA     
#> 6 2             3    95     3 NA     
#> 7 1            98   117     3 MAXIMUM
#> 8 2            98   101     1 MAXIMUM

# Add a new record for each USUBJID storing for the last value.
# Set AVISITN = 99 and DTYPE = LOV for these new records.
derive_extreme_records(
  adlb,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(AVISITN),
  mode = "last",
  set_values_to = exprs(
    AVISITN = 99,
    DTYPE = "LOV"
  )
)
#> # A tibble: 8 × 5
#>   USUBJID AVISITN  AVAL LBSEQ DTYPE
#>   <chr>     <dbl> <dbl> <dbl> <chr>
#> 1 1             1   113     1 NA   
#> 2 1             2   113     2 NA   
#> 3 1             3   117     3 NA   
#> 4 2             1   101     1 NA   
#> 5 2             2   101     2 NA   
#> 6 2             3    95     3 NA   
#> 7 1            99   117     3 LOV  
#> 8 2            99    95     3 LOV  

# Derive a new parameter for the first disease progression (PD)
adsl <- tribble(
  ~USUBJID, ~DTHDT,
  "1",      ymd("2022-05-13"),
  "2",      ymd(""),
  "3",      ymd("")
) %>%
  mutate(STUDYID = "XX1234")

adrs <- tribble(
  ~USUBJID, ~ADTC,        ~AVALC,
  "1",      "2020-01-02", "PR",
  "1",      "2020-02-01", "CR",
  "1",      "2020-03-01", "CR",
  "1",      "2020-04-01", "SD",
  "2",      "2021-06-15", "SD",
  "2",      "2021-07-16", "PD",
  "2",      "2021-09-14", "PD"
) %>%
  mutate(
    STUDYID = "XX1234",
    ADT = ymd(ADTC),
    PARAMCD = "OVR",
    PARAM = "Overall Response",
    ANL01FL = "Y"
  ) %>%
  select(-ADTC)

derive_extreme_records(
  adrs,
  dataset_ref = adsl,
  dataset_add = adrs,
  by_vars = exprs(STUDYID, USUBJID),
  filter_add = PARAMCD == "OVR" & AVALC == "PD",
  order = exprs(ADT),
  exist_flag = AVALC,
  true_value = "Y",
  false_value = "N",
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "PD",
    PARAM = "Disease Progression",
    AVAL = yn_to_numeric(AVALC),
    ANL01FL = "Y",
    ADT = ADT
  )
)
#> # A tibble: 10 × 8
#>    USUBJID AVALC STUDYID ADT        PARAMCD PARAM               ANL01FL  AVAL
#>    <chr>   <chr> <chr>   <date>     <chr>   <chr>               <chr>   <dbl>
#>  1 1       PR    XX1234  2020-01-02 OVR     Overall Response    Y          NA
#>  2 1       CR    XX1234  2020-02-01 OVR     Overall Response    Y          NA
#>  3 1       CR    XX1234  2020-03-01 OVR     Overall Response    Y          NA
#>  4 1       SD    XX1234  2020-04-01 OVR     Overall Response    Y          NA
#>  5 2       SD    XX1234  2021-06-15 OVR     Overall Response    Y          NA
#>  6 2       PD    XX1234  2021-07-16 OVR     Overall Response    Y          NA
#>  7 2       PD    XX1234  2021-09-14 OVR     Overall Response    Y          NA
#>  8 2       Y     XX1234  2021-07-16 PD      Disease Progression Y           1
#>  9 1       N     XX1234  NA         PD      Disease Progression Y           0
#> 10 3       N     XX1234  NA         PD      Disease Progression Y           0

# derive parameter indicating death
derive_extreme_records(
  dataset_ref = adsl,
  dataset_add = adsl,
  by_vars = exprs(STUDYID, USUBJID),
  filter_add = !is.na(DTHDT),
  exist_flag = AVALC,
  true_value = "Y",
  false_value = "N",
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "DEATH",
    PARAM = "Death",
    ANL01FL = "Y",
    ADT = DTHDT
  )
)
#> # A tibble: 3 × 8
#>   STUDYID USUBJID PARAMCD PARAM ANL01FL ADT        DTHDT      AVALC
#>   <chr>   <chr>   <chr>   <chr> <chr>   <date>     <date>     <chr>
#> 1 XX1234  1       DEATH   Death Y       2022-05-13 2022-05-13 Y    
#> 2 XX1234  2       DEATH   Death Y       NA         NA         N    
#> 3 XX1234  3       DEATH   Death Y       NA         NA         N