Add the Worst or Best Observation for Each By Group as New Records
Source: R/derive_extreme_event.R
Add the first available record from events for each by group as new records. All variables of the selected observation are kept. It can be used for selecting the extreme observation from a series of user-defined events. This distinguishes derive_extreme_event() from derive_extreme_records(), where extreme records are derived based on a certain order of existing variables.
Usage
derive_extreme_event(
  dataset = NULL,
  by_vars,
  events,
  tmp_event_nr_var = NULL,
  order,
  mode,
  source_datasets = NULL,
  check_type = "warning",
  set_values_to = NULL,
  keep_source_vars = exprs(everything())
)
Arguments
- dataset
Input dataset
The variables specified by the by_vars and order arguments are expected to be in the dataset.
- by_vars
Grouping variables
Default: NULL
Permitted Values: list of variables created by exprs(), e.g., exprs(USUBJID, VISIT)
- events
Conditions and new values defining events
A list of event() or event_joined() objects is expected. Only observations listed in the events are considered for deriving the extreme event. If multiple records meet the filter condition, the first record sorted by order is taken. The data is grouped by by_vars, i.e., summary functions like all() or any() can be used in condition.
For event_joined() events, the observations are selected by calling filter_joined(). The condition field is passed to the filter_join argument.
- tmp_event_nr_var
Temporary event number variable
The specified variable is added to all source datasets and is set to the number of the event before selecting the records of the event.
It can be used in the order argument to determine which record should be used if records from more than one event are selected.
The variable is not included in the output dataset.
- order
Sort order
If a particular event from events has more than one observation, within the event and by group, the records are ordered by the specified order.
For handling of NAs in sorting variables see Sort Order.
Permitted Values: list of expressions created by exprs(), e.g., exprs(ADT, desc(AVAL))
- mode
Selection mode (first or last)
If a particular event from events has more than one observation, "first"/"last" is used to select the first/last record of this type of event, sorting by order.
Permitted Values: "first", "last"
- source_datasets
Source datasets
A named list of datasets is expected. The dataset_name field of event() and event_joined() refers to the dataset provided in the list.
- check_type
Check uniqueness?
If "warning" or "error" is specified, the specified message is issued if the observations of the input dataset are not unique with respect to the by variables and the order.
Default: "warning"
Permitted Values: "none", "warning", "error"
- set_values_to
Variables to be set
The specified variables are set to the specified values for the new observations.
A list of variables is set to the specified values for the new records:
LHS refers to a variable.
RHS refers to the value to set to the variable. This can be a string, a symbol, a numeric value, an expression, or NA. If summary functions are used, the values are summarized by the variables specified for by_vars.
For example: exprs(PARAMCD = "WSP", PARAM = "Worst Sleeping Problems"), as used in the first call in the Examples section.
- keep_source_vars
Variables to keep from the source dataset
For each event the specified variables are kept from the selected observations. The variables specified for by_vars and created by set_values_to are always kept. The keep_source_vars field of the event will take precedence over the value of the keep_source_vars argument.
Permitted Values: A list of expressions where each element is a symbol or a tidyselect expression, e.g., exprs(VISIT, VISITNUM, starts_with("RS")).
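As a small illustration of how events, tmp_event_nr_var, order, and mode work together, the following sketch uses a made-up dataset (advs) and a made-up parameter code ("HIGHFL"); neither comes from the package documentation. It assumes an admiral version whose derive_extreme_event() signature matches the Usage section above. Events earlier in the list get a lower event number, so sorting by event_nr first makes the earlier event win whenever a subject qualifies for more than one.

```r
library(admiral)
library(tibble)

# Hypothetical input data, invented for this illustration
advs <- tribble(
  ~USUBJID, ~AVISITN, ~AVAL,
  "1",      1,        150,
  "1",      2,        135,
  "2",      1,        118,
  "2",      2,        122
)

result <- derive_extreme_event(
  advs,
  by_vars = exprs(USUBJID),
  events = list(
    # Event 1: any record with a value of at least 140
    event(condition = AVAL >= 140, set_values_to = exprs(AVALC = "High")),
    # Event 2: fallback that matches every record
    event(condition = TRUE, set_values_to = exprs(AVALC = "Not high"))
  ),
  # event_nr is 1 for the first event and 2 for the second
  tmp_event_nr_var = event_nr,
  # Lowest event number first, then earliest visit
  order = exprs(event_nr, AVISITN),
  mode = "first",
  set_values_to = exprs(PARAMCD = "HIGHFL")
)

# The new records are appended after the original four rows
tail(result, 2)
```

Subject "1" qualifies for both events, and the record from the first event is selected because it sorts first by event_nr. Subject "2" only matches the fallback event. The temporary variable event_nr is dropped from the result.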
Value
The input dataset with the best or worst observation of each by group added as new observations.
Details
1. For each event, select the observations to consider:
   - If the event is of class event, the observations of the source dataset are restricted by condition and then the first or last (mode) observation per by group (by_vars) is selected. If the event is of class event_joined, filter_joined() is called to select the observations.
   - The variables specified by the set_values_to field of the event are added to the selected observations.
   - The variable specified for tmp_event_nr_var is added and set to the number of the event.
   - Only the variables specified for the keep_source_vars field of the event, the by variables (by_vars), and the variables created by set_values_to are kept. If keep_source_vars = NULL is used for an event in derive_extreme_event(), the value of the keep_source_vars argument of derive_extreme_event() is used.
2. All selected observations are bound together.
3. For each group (with respect to the variables specified for the by_vars parameter), the first or last observation (with respect to the order specified for the order parameter and the mode specified for the mode parameter) is selected.
4. The variables specified by the set_values_to parameter are added to the selected observations.
5. The observations are added to the input dataset.
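The steps described under Details can be approximated with plain dplyr verbs. The sketch below is not admiral's implementation; it is a hand-written approximation for two events of class event, using a shortened version of the sleeping-problem data. The object names event_1, event_2, candidates, and new_records are hypothetical, and selecting the latest ADY within each event is an assumption made for this illustration.

```r
library(dplyr)
library(tibble)

adqs <- tribble(
  ~USUBJID, ~PARAMCD,   ~AVALC, ~ADY,
  "1",      "NO SLEEP", "N",    1,
  "1",      "WAKE UP",  "N",    2,
  "2",      "NO SLEEP", "N",    1,
  "2",      "WAKE UP",  "Y",    2,
  "2",      "WAKE UP",  "Y",    3
)

# Per event: restrict by the condition within each by group, keep the record
# with the latest ADY, set the event's values, and store the event number.
event_1 <- adqs %>%
  group_by(USUBJID) %>%
  filter(PARAMCD == "WAKE UP" & AVALC == "Y") %>%
  arrange(ADY, .by_group = TRUE) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  mutate(AVALC = "Waking up more than three times", AVAL = 2, event_nr = 1L)

event_2 <- adqs %>%
  group_by(USUBJID) %>%
  filter(all(AVALC == "N")) %>% # summary function evaluated per by group
  arrange(ADY, .by_group = TRUE) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  mutate(AVALC = "No sleeping problems", AVAL = 4, event_nr = 2L)

# Bind the selected observations of all events together
candidates <- bind_rows(event_1, event_2)

# Per by group, keep the first record by event number, set the values for
# the new records, and drop the temporary event number.
new_records <- candidates %>%
  arrange(USUBJID, event_nr) %>%
  group_by(USUBJID) %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  mutate(PARAMCD = "WSP", PARAM = "Worst Sleeping Problems") %>%
  select(-event_nr)

# Add the new records to the input dataset
result <- bind_rows(adqs, new_records)
```

Here subject "1" only satisfies the second event and subject "2" only the first, so each subject contributes exactly one new record.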
See also
event(), event_joined(), derive_vars_extreme_event()
BDS-Findings Functions for adding Parameters/Records: default_qtc_paramcd(), derive_expected_records(), derive_extreme_records(), derive_locf_records(), derive_param_bmi(), derive_param_bsa(), derive_param_computed(), derive_param_doseint(), derive_param_exist_flag(), derive_param_exposure(), derive_param_framingham(), derive_param_map(), derive_param_qtc(), derive_param_rr(), derive_param_wbc_abs(), derive_summary_records()
Examples
library(admiral)
library(tibble)
library(dplyr)
library(lubridate)
adqs <- tribble(
  ~USUBJID, ~PARAMCD,      ~AVALC,        ~ADY,
  "1",      "NO SLEEP",    "N",           1,
  "1",      "WAKE UP",     "N",           2,
  "1",      "FALL ASLEEP", "N",           3,
  "2",      "NO SLEEP",    "N",           1,
  "2",      "WAKE UP",     "Y",           2,
  "2",      "WAKE UP",     "Y",           3,
  "2",      "FALL ASLEEP", "N",           4,
  "3",      "NO SLEEP",    NA_character_, 1
)
# Add a new record for each USUBJID storing the worst sleeping problem.
derive_extreme_event(
  adqs,
  by_vars = exprs(USUBJID),
  events = list(
    event(
      condition = PARAMCD == "NO SLEEP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "No sleep", AVAL = 1)
    ),
    event(
      condition = PARAMCD == "WAKE UP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "Waking up more than three times", AVAL = 2)
    ),
    event(
      condition = PARAMCD == "FALL ASLEEP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "More than 30 mins to fall asleep", AVAL = 3)
    ),
    event(
      condition = all(AVALC == "N"),
      set_values_to = exprs(
        AVALC = "No sleeping problems", AVAL = 4
      )
    ),
    event(
      condition = TRUE,
      set_values_to = exprs(AVALC = "Missing", AVAL = 99)
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, desc(ADY)),
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "WSP",
    PARAM = "Worst Sleeping Problems"
  )
)
#> # A tibble: 11 × 6
#>    USUBJID PARAMCD     AVALC                             ADY  AVAL PARAM
#>    <chr>   <chr>       <chr>                           <dbl> <dbl> <chr>
#>  1 1       NO SLEEP    N                                   1    NA NA
#>  2 1       WAKE UP     N                                   2    NA NA
#>  3 1       FALL ASLEEP N                                   3    NA NA
#>  4 2       NO SLEEP    N                                   1    NA NA
#>  5 2       WAKE UP     Y                                   2    NA NA
#>  6 2       WAKE UP     Y                                   3    NA NA
#>  7 2       FALL ASLEEP N                                   4    NA NA
#>  8 3       NO SLEEP    NA                                  1    NA NA
#>  9 1       WSP         No sleeping problems                3     4 Worst Sleepi…
#> 10 2       WSP         Waking up more than three times     3     2 Worst Sleepi…
#> 11 3       WSP         Missing                             1    99 Worst Sleepi…
# Use different mode by event
adhy <- tribble(
  ~USUBJID, ~AVISITN, ~CRIT1FL,
  "1",      1,        "Y",
  "1",      2,        "Y",
  "2",      1,        "Y",
  "2",      2,        NA_character_,
  "2",      3,        "Y",
  "2",      4,        NA_character_
) %>%
  mutate(
    PARAMCD = "ALKPH",
    PARAM = "Alkaline Phosphatase (U/L)"
  )
derive_extreme_event(
  adhy,
  by_vars = exprs(USUBJID),
  events = list(
    event(
      condition = is.na(CRIT1FL),
      set_values_to = exprs(AVALC = "N")
    ),
    event(
      condition = CRIT1FL == "Y",
      mode = "last",
      set_values_to = exprs(AVALC = "Y")
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, AVISITN),
  mode = "first",
  keep_source_vars = exprs(AVISITN),
  set_values_to = exprs(
    PARAMCD = "ALK2",
    PARAM = "ALKPH <= 2 times ULN"
  )
)
#> # A tibble: 8 × 6
#>   USUBJID AVISITN CRIT1FL PARAMCD PARAM                      AVALC
#>   <chr>     <dbl> <chr>   <chr>   <chr>                      <chr>
#> 1 1             1 Y       ALKPH   Alkaline Phosphatase (U/L) NA
#> 2 1             2 Y       ALKPH   Alkaline Phosphatase (U/L) NA
#> 3 2             1 Y       ALKPH   Alkaline Phosphatase (U/L) NA
#> 4 2             2 NA      ALKPH   Alkaline Phosphatase (U/L) NA
#> 5 2             3 Y       ALKPH   Alkaline Phosphatase (U/L) NA
#> 6 2             4 NA      ALKPH   Alkaline Phosphatase (U/L) NA
#> 7 1             2 NA      ALK2    ALKPH <= 2 times ULN       Y
#> 8 2             2 NA      ALK2    ALKPH <= 2 times ULN       N
# Derive confirmed best overall response (using event_joined())
# CR - complete response, PR - partial response, SD - stable disease
# NE - not evaluable, PD - progressive disease
adsl <- tribble(
  ~USUBJID, ~TRTSDTC,
  "1",      "2020-01-01",
  "2",      "2019-12-12",
  "3",      "2019-11-11",
  "4",      "2019-12-30",
  "5",      "2020-01-01",
  "6",      "2020-02-02",
  "7",      "2020-02-02",
  "8",      "2020-02-01"
) %>%
  mutate(TRTSDT = ymd(TRTSDTC))
adrs <- tribble(
  ~USUBJID, ~ADTC,        ~AVALC,
  "1",      "2020-01-01", "PR",
  "1",      "2020-02-01", "CR",
  "1",      "2020-02-16", "NE",
  "1",      "2020-03-01", "CR",
  "1",      "2020-04-01", "SD",
  "2",      "2020-01-01", "SD",
  "2",      "2020-02-01", "PR",
  "2",      "2020-03-01", "SD",
  "2",      "2020-03-13", "CR",
  "4",      "2020-01-01", "PR",
  "4",      "2020-03-01", "NE",
  "4",      "2020-04-01", "NE",
  "4",      "2020-05-01", "PR",
  "5",      "2020-01-01", "PR",
  "5",      "2020-01-10", "PR",
  "5",      "2020-01-20", "PR",
  "6",      "2020-02-06", "PR",
  "6",      "2020-02-16", "CR",
  "6",      "2020-03-30", "PR",
  "7",      "2020-02-06", "PR",
  "7",      "2020-02-16", "CR",
  "7",      "2020-04-01", "NE",
  "8",      "2020-02-16", "PD"
) %>%
  mutate(
    ADT = ymd(ADTC),
    PARAMCD = "OVR",
    PARAM = "Overall Response by Investigator"
  ) %>%
  derive_vars_merged(
    dataset_add = adsl,
    by_vars = exprs(USUBJID),
    new_vars = exprs(TRTSDT)
  )
derive_extreme_event(
  adrs,
  by_vars = exprs(USUBJID),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, ADT),
  mode = "first",
  source_datasets = list(adsl = adsl),
  events = list(
    event_joined(
      description = paste(
        "CR needs to be confirmed by a second CR at least 28 days later,",
        "at most one NE is acceptable between the two assessments"
      ),
      join_vars = exprs(AVALC, ADT),
      join_type = "after",
      first_cond_upper = AVALC.join == "CR" &
        ADT.join >= ADT + 28,
      condition = AVALC == "CR" &
        all(AVALC.join %in% c("CR", "NE")) &
        count_vals(var = AVALC.join, val = "NE") <= 1,
      set_values_to = exprs(
        AVALC = "CR"
      )
    ),
    event_joined(
      description = paste(
        "PR needs to be confirmed by a second CR or PR at least 28 days later,",
        "at most one NE is acceptable between the two assessments"
      ),
      join_vars = exprs(AVALC, ADT),
      join_type = "after",
      first_cond_upper = AVALC.join %in% c("CR", "PR") &
        ADT.join >= ADT + 28,
      condition = AVALC == "PR" &
        all(AVALC.join %in% c("CR", "PR", "NE")) &
        count_vals(var = AVALC.join, val = "NE") <= 1,
      set_values_to = exprs(
        AVALC = "PR"
      )
    ),
    event(
      description = paste(
        "CR, PR, or SD are considered as SD if occurring at least 28 days",
        "after treatment start"
      ),
      condition = AVALC %in% c("CR", "PR", "SD") & ADT >= TRTSDT + 28,
      set_values_to = exprs(
        AVALC = "SD"
      )
    ),
    event(
      condition = AVALC == "PD",
      set_values_to = exprs(
        AVALC = "PD"
      )
    ),
    event(
      condition = AVALC %in% c("CR", "PR", "SD", "NE"),
      set_values_to = exprs(
        AVALC = "NE"
      )
    ),
    event(
      description = "set response to MISSING for patients without records in ADRS",
      dataset_name = "adsl",
      condition = TRUE,
      set_values_to = exprs(
        AVALC = "MISSING"
      ),
      keep_source_vars = exprs(TRTSDT)
    )
  ),
  set_values_to = exprs(
    PARAMCD = "CBOR",
    PARAM = "Best Confirmed Overall Response by Investigator"
  )
) %>%
  filter(PARAMCD == "CBOR")
#> # A tibble: 8 × 7
#>   USUBJID ADTC       AVALC   ADT        PARAMCD PARAM                 TRTSDT
#>   <chr>   <chr>      <chr>   <date>     <chr>   <chr>                 <date>
#> 1 1       2020-02-01 CR      2020-02-01 CBOR    Best Confirmed Overa… 2020-01-01
#> 2 2       2020-02-01 SD      2020-02-01 CBOR    Best Confirmed Overa… 2019-12-12
#> 3 3       NA         MISSING NA         CBOR    Best Confirmed Overa… 2019-11-11
#> 4 4       2020-05-01 SD      2020-05-01 CBOR    Best Confirmed Overa… 2019-12-30
#> 5 5       2020-01-01 NE      2020-01-01 CBOR    Best Confirmed Overa… 2020-01-01
#> 6 6       2020-02-06 PR      2020-02-06 CBOR    Best Confirmed Overa… 2020-02-02
#> 7 7       2020-02-06 NE      2020-02-06 CBOR    Best Confirmed Overa… 2020-02-02
#> 8 8       2020-02-16 PD      2020-02-16 CBOR    Best Confirmed Overa… 2020-02-01