
Add the Worst or Best Observation for Each By Group as New Records
Source: R/derive_extreme_event.R
Add the first available record from events for each by group as new records; all variables of the selected observation are kept. It can be used for selecting the extreme observation from a series of user-defined events. This distinguishes derive_extreme_event() from derive_extreme_records(), where extreme records are derived based on a certain order of existing variables.
Usage
derive_extreme_event(
  dataset = NULL,
  by_vars,
  events,
  tmp_event_nr_var = NULL,
  order,
  mode,
  source_datasets = NULL,
  check_type = "warning",
  set_values_to = NULL,
  keep_source_vars = exprs(everything())
)
Arguments
- dataset
  Input dataset
  The variables specified by the by_vars and order arguments are expected to be in the dataset.
  Permitted values: a dataset, i.e., a data.frame or tibble
  Default value: none
- by_vars
  Grouping variables
  Permitted values: list of variables created by exprs(), e.g., exprs(USUBJID, VISIT)
  Default value: NULL
- events
  Conditions and new values defining events
  A list of event() or event_joined() objects is expected. Only observations listed in the events are considered for deriving the extreme event. If multiple records meet the filter condition, the first record sorted by order is taken. The data is grouped by by_vars, i.e., summary functions like all() or any() can be used in condition.
  For event_joined() events, the observations are selected by calling filter_joined(). The condition field is passed to the filter_join argument.
  Permitted values: a list of event() or event_joined() objects
  Default value: none
- tmp_event_nr_var
  Temporary event number variable
  The specified variable is added to all source datasets and is set to the number of the event before selecting the records of the event. It can be used in order to determine which record should be used if records from more than one event are selected. The variable is not included in the output dataset.
  Permitted values: an unquoted symbol, e.g., AVAL
  Default value: NULL
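The following minimal base-R sketch does not call admiral. It shows why the event number is usually the first element of order. Assume two events have each already selected one candidate record for a subject. Sorting by the event number first makes the record from the lower-numbered event win, whatever its ADY. The column name event_nr and the toy data are illustrative assumptions.

```r
# Toy candidates: two events have each selected one record for subject "1".
# event_nr is an illustrative stand-in for the tmp_event_nr_var column.
candidates <- data.frame(
  USUBJID  = c("1", "1"),
  ADY      = c(5, 2),
  event_nr = c(2L, 1L),
  stringsAsFactors = FALSE
)

# Sort by event number first, then by descending ADY
# (a base-R analogue of order = exprs(event_nr, desc(ADY))).
sorted <- candidates[order(candidates$event_nr, -candidates$ADY), ]

# mode = "first": keep the top row. The record from event 1 wins even
# though the record from event 2 has a later ADY.
selected <- sorted[1, ]

# The temporary variable is not part of the output.
selected$event_nr <- NULL
selected
```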
- order
  Sort order
  If a particular event from events has more than one observation, within the event and by group, the records are ordered by the specified order. For handling of NAs in sorting variables, see Sort Order.
  Permitted values: list of variables created by exprs(), e.g., exprs(USUBJID, VISIT)
  Default value: none
- mode
  Selection mode (first or last)
  If a particular event from events has more than one observation, "first"/"last" is used to select the first/last record of this type of event, sorting by order.
  Permitted values: "first", "last"
  Default value: none
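The first/last selection per by group can be sketched in base R as follows. The helper select_by_mode() is hypothetical and not part of admiral. It assumes a single grouping variable (USUBJID) and a single sorting variable (ADY).

```r
# Toy records of one event for two subjects.
recs <- data.frame(
  USUBJID = c("1", "1", "2", "2", "2"),
  ADY     = c(3, 1, 2, 5, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by group and ADY, then keep the first or the
# last row of each group.
select_by_mode <- function(df, mode = c("first", "last")) {
  mode <- match.arg(mode)
  df <- df[order(df$USUBJID, df$ADY), ]
  keep <- if (mode == "first") {
    !duplicated(df$USUBJID)
  } else {
    !duplicated(df$USUBJID, fromLast = TRUE)
  }
  df[keep, ]
}

first_recs <- select_by_mode(recs, "first")  # ADY 1 and 2
last_recs  <- select_by_mode(recs, "last")   # ADY 3 and 5
```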
- source_datasets
  Source datasets
  A named list of datasets is expected. The dataset_name field of event() and event_joined() refers to the dataset provided in the list.
  Permitted values: named list of datasets, e.g., list(adsl = adsl, ae = ae)
  Default value: NULL
- check_type
  Check uniqueness?
  If "message", "warning", or "error" is specified, the specified message is issued if the observations of the input dataset are not unique with respect to the by variables and the order.
  Permitted values: "none", "message", "warning", "error"
  Default value: "warning"
- set_values_to
  Variables to be set
  The specified variables are set to the specified values for the new observations. The left-hand side (LHS) of each expression refers to a variable; the right-hand side (RHS) is the value to set the variable to. This can be a string, a symbol, a numeric value, an expression, or NA.
  Permitted values: list of named expressions created by a formula using exprs(), e.g., exprs(AVALC = VSSTRESC, AVAL = yn_to_numeric(AVALC))
  Default value: NULL
- keep_source_vars
  Variables to keep from the source dataset
  For each event, the specified variables are kept from the selected observations. The variables specified for by_vars and created by set_values_to are always kept. The keep_source_vars field of the event takes precedence over the value of the keep_source_vars argument.
  Permitted values: list of variables created by exprs(), e.g., exprs(USUBJID, VISIT)
  Default value: exprs(everything())
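The way the event-level and global keep_source_vars settings combine can be sketched with plain character vectors. The helper resolve_kept_vars() is hypothetical. It does not model tidyselect expressions such as everything().

```r
# Hypothetical helper: decide which variables an event keeps.
# The event-level setting wins; NULL falls back to the global argument.
resolve_kept_vars <- function(event_keep, global_keep, by_vars, set_vars) {
  keep <- if (is.null(event_keep)) global_keep else event_keep
  # By variables and variables created by set_values_to are always kept.
  unique(c(by_vars, keep, set_vars))
}

# Event without its own keep_source_vars: the global value is used.
kept_global <- resolve_kept_vars(
  event_keep = NULL, global_keep = c("ADY"),
  by_vars = c("STUDYID", "USUBJID"), set_vars = c("AVALC", "AVAL")
)

# Event with its own keep_source_vars: it takes precedence.
kept_event <- resolve_kept_vars(
  event_keep = c("TRTSDT"), global_keep = c("ADY"),
  by_vars = c("STUDYID", "USUBJID"), set_vars = c("AVALC")
)
```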
Value
The input dataset with the best or worst observation of each by group added as new observations.
Details
1. For each event, select the observations to consider:
   - If the event is of class event, the observations of the source dataset are restricted by condition and then the first or last (mode) observation per by group (by_vars) is selected.
   - If the event is of class event_joined, filter_joined() is called to select the observations.
   - The variables specified by the set_values_to field of the event are added to the selected observations.
   - The variable specified for tmp_event_nr_var is added and set to the number of the event.
   - Only the variables specified for the keep_source_vars field of the event, the by variables (by_vars), and the variables created by set_values_to are kept. If keep_source_vars = NULL is used for an event in derive_extreme_event(), the value of the keep_source_vars argument of derive_extreme_event() is used.
2. All selected observations are bound together.
3. For each group (with respect to the variables specified for the by_vars parameter), the first or last observation (with respect to the order specified for the order parameter and the mode specified for the mode parameter) is selected.
4. The variables specified by the set_values_to parameter are added to the selected observations.
5. The observations are added to the input dataset.
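The steps above can be approximated in base R for event() objects with row-wise conditions. This is a simplified sketch, not admiral's implementation. It skips the per-event first/last selection, so step 1 restricts only by condition. It does not support grouped summary conditions such as all(). Each event is represented as a hypothetical list holding a condition function and an AVALC value.

```r
# Toy input with the same values as adqs1 in the first example (no STUDYID).
adqs_toy <- data.frame(
  USUBJID = c("1", "1", "2", "2", "2", "3"),
  PARAMCD = c("NO SLEEP", "WAKE UP 3X", "NO SLEEP", "WAKE UP 3X",
              "WAKE UP 3X", "NO SLEEP"),
  AVALC   = c("N", "N", "N", "Y", "Y", NA),
  ADY     = c(1, 2, 1, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical event representation: a row-wise condition and a new AVALC.
events <- list(
  list(cond = function(d) d$PARAMCD == "WAKE UP 3X" & d$AVALC == "Y",
       avalc = "Waking up three times"),
  list(cond = function(d) rep(TRUE, nrow(d)), avalc = "Missing")
)

# Step 1: select the records of each event, set AVALC and tag the event
# number (which() drops rows where the condition is NA).
selected <- do.call(rbind, lapply(seq_along(events), function(i) {
  ev <- events[[i]]
  rows <- adqs_toy[which(ev$cond(adqs_toy)), ]
  rows$AVALC <- rep(ev$avalc, nrow(rows))
  rows$event_nr <- rep(i, nrow(rows))
  rows
}))

# Steps 2-3: per subject, sort by event number, then descending ADY,
# and keep the first record.
selected <- selected[order(selected$USUBJID, selected$event_nr, -selected$ADY), ]
new_recs <- selected[!duplicated(selected$USUBJID), ]
new_recs$event_nr <- NULL

# Step 4: set the global set_values_to variable.
new_recs$PARAMCD <- "WSP"

# Step 5: append the new records to the input dataset.
result <- rbind(adqs_toy, new_recs)
```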
Note: This function creates temporary datasets which may be much bigger than the input datasets. If this causes memory issues, please try setting the admiral option save_memory to TRUE (see set_admiral_options()). This reduces the memory consumption but increases the run-time.
See also
event(), event_joined(), derive_vars_extreme_event()
BDS-Findings Functions for adding Parameters/Records: default_qtc_paramcd(), derive_expected_records(), derive_extreme_records(), derive_locf_records(), derive_param_bmi(), derive_param_bsa(), derive_param_computed(), derive_param_doseint(), derive_param_exist_flag(), derive_param_exposure(), derive_param_framingham(), derive_param_map(), derive_param_qtc(), derive_param_rr(), derive_param_wbc_abs(), derive_summary_records()
Examples
Add a new record for the worst observation using event() objects
For each subject, the observation containing the worst sleeping problem (if any exist) should be identified and added as a new record, retaining all variables from the original observation. If the worst sleeping problem occurs more than once, or if there are no sleeping problems, then take the observation occurring on the latest day.
The groups for which new records are added are specified by the by_vars argument. Here a record should be added for each subject, so by_vars = exprs(STUDYID, USUBJID) is specified.
The sets of possible sleeping problems are passed through the events argument as event() objects. Each event contains a condition which may or may not be satisfied by each record (or possibly a group of records) within the input dataset dataset. Summary functions such as any() and all() are often handy to use within conditions, as is done here for the third event, which checks that the subject had no sleeping issues. The final event uses a catch-all condition = TRUE to ensure that a new record is derived for every subject. Note that in this example no condition compares values across records, so it is sufficient to use event() objects rather than event_joined() objects; see the next example for a more complex condition.
If a subject has one or more records satisfying the conditions from events, we can select just one record using the order argument. In this example, the first argument passed to order is event_nr, a temporary variable created through the tmp_event_nr_var argument, which numbers the events consecutively. Since mode = "first", we only consider the first event for which a condition is satisfied. Within that event, we consider only the observation with the latest day, because the second argument for the order is desc(ADY).
Once a record is identified as satisfying an event's condition, a new observation is created by the following process:
1. the selected record is copied,
2. the variables specified in the event's set_values_to (here, AVAL and AVALC) are created/updated,
3. the variables specified in keep_source_vars (here, all variables, including ADY, due to the use of the tidyselect expression everything()), plus by_vars and the variables from set_values_to, are kept,
4. the variables specified in the global set_values_to (here, PARAM and PARAMCD) are created/updated.
library(admiral)
library(tibble, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
adqs1 <- tribble(
~USUBJID, ~PARAMCD, ~AVALC, ~ADY,
"1", "NO SLEEP", "N", 1,
"1", "WAKE UP 3X", "N", 2,
"2", "NO SLEEP", "N", 1,
"2", "WAKE UP 3X", "Y", 2,
"2", "WAKE UP 3X", "Y", 3,
"3", "NO SLEEP", NA_character_, 1
) %>%
mutate(STUDYID = "AB42")
derive_extreme_event(
adqs1,
by_vars = exprs(STUDYID, USUBJID),
events = list(
event(
condition = PARAMCD == "NO SLEEP" & AVALC == "Y",
set_values_to = exprs(AVALC = "No sleep", AVAL = 1)
),
event(
condition = PARAMCD == "WAKE UP 3X" & AVALC == "Y",
set_values_to = exprs(AVALC = "Waking up three times", AVAL = 2)
),
event(
condition = all(AVALC == "N"),
set_values_to = exprs(AVALC = "No sleeping problems", AVAL = 3)
),
event(
condition = TRUE,
set_values_to = exprs(AVALC = "Missing", AVAL = 99)
)
),
tmp_event_nr_var = event_nr,
order = exprs(event_nr, desc(ADY)),
mode = "first",
set_values_to = exprs(
PARAMCD = "WSP",
PARAM = "Worst Sleeping Problem"
),
keep_source_vars = exprs(everything())
) %>%
select(-STUDYID)
#> # A tibble: 9 × 6
#> USUBJID PARAMCD AVALC ADY AVAL PARAM
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 1 NO SLEEP N 1 NA <NA>
#> 2 1 WAKE UP 3X N 2 NA <NA>
#> 3 2 NO SLEEP N 1 NA <NA>
#> 4 2 WAKE UP 3X Y 2 NA <NA>
#> 5 2 WAKE UP 3X Y 3 NA <NA>
#> 6 3 NO SLEEP <NA> 1 NA <NA>
#> 7 1 WSP No sleeping problems 2 3 Worst Sleeping Problem
#> 8 2 WSP Waking up three times 3 2 Worst Sleeping Problem
#> 9 3 WSP Missing 1 99 Worst Sleeping Problem
Events based on comparison across records (event_joined())
We'll now extend the above example. Specifically, we consider a new possible worst sleeping problem, namely if a subject experiences no sleep on consecutive days.
The "consecutive days" portion of the condition requires records to be compared with each other. This is done by using an event_joined() object. As no dataset_name is specified for it, the input dataset adqs2 is used and joined onto itself. The condition now checks for two no sleep records, and crucially compares the ADY values to see if they differ by one day. The .join suffix distinguishes between the ADY value of the parent and joined datasets. As the condition involves AVALC, PARAMCD and ADY, we specify these variables with join_vars. Finally, because we wish to compare all records with each other, we select join_type = "all".
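The self-join that event_joined() performs with join_type = "all" can be approximated in base R with merge(). merge() appends a suffix to the columns of the joined copy, much like the .join syntax. This is an illustrative sketch only. It joins on USUBJID alone, because STUDYID is omitted from the toy data, and it does not reproduce the internals of filter_joined().

```r
# Toy copy of the adqs2 values (no STUDYID).
adqs2_toy <- data.frame(
  USUBJID = c("4", "4", "4", "5", "5", "5", "5"),
  PARAMCD = c("WAKE UP", "NO SLEEP", "NO SLEEP", "NO SLEEP", "NO SLEEP",
              "WAKE UP 3X", "NO SLEEP"),
  AVALC   = c("N", "Y", "Y", "N", "Y", "Y", "Y"),
  ADY     = c(1, 2, 3, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Join the dataset onto itself within each subject; joined columns get
# the ".join" suffix, original columns keep their names.
joined <- merge(adqs2_toy, adqs2_toy, by = "USUBJID", suffixes = c("", ".join"))

# Row-wise version of the condition used in the event_joined() object.
hit <- with(joined,
  PARAMCD == "NO SLEEP" & AVALC == "Y" &
    PARAMCD.join == "NO SLEEP" & AVALC.join == "Y" &
    ADY == ADY.join + 1
)
consecutive <- unique(joined[hit, c("USUBJID", "PARAMCD", "AVALC", "ADY")])
consecutive  # only subject 4 (ADY 3) had no sleep on two consecutive days
```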
adqs2 <- tribble(
~USUBJID, ~PARAMCD, ~AVALC, ~ADY,
"4", "WAKE UP", "N", 1,
"4", "NO SLEEP", "Y", 2,
"4", "NO SLEEP", "Y", 3,
"5", "NO SLEEP", "N", 1,
"5", "NO SLEEP", "Y", 2,
"5", "WAKE UP 3X", "Y", 3,
"5", "NO SLEEP", "Y", 4
) %>%
mutate(STUDYID = "AB42")
derive_extreme_event(
adqs2,
by_vars = exprs(STUDYID, USUBJID),
events = list(
event_joined(
join_vars = exprs(AVALC, PARAMCD, ADY),
join_type = "all",
condition = PARAMCD == "NO SLEEP" & AVALC == "Y" &
PARAMCD.join == "NO SLEEP" & AVALC.join == "Y" &
ADY == ADY.join + 1,
set_values_to = exprs(AVALC = "No sleep two nights in a row", AVAL = 0)
),
event(
condition = PARAMCD == "NO SLEEP" & AVALC == "Y",
set_values_to = exprs(AVALC = "No sleep", AVAL = 1)
),
event(
condition = PARAMCD == "WAKE UP 3X" & AVALC == "Y",
set_values_to = exprs(AVALC = "Waking up three times", AVAL = 2)
),
event(
condition = all(AVALC == "N"),
set_values_to = exprs(
AVALC = "No sleeping problems", AVAL = 3
)
),
event(
condition = TRUE,
set_values_to = exprs(AVALC = "Missing", AVAL = 99)
)
),
tmp_event_nr_var = event_nr,
order = exprs(event_nr, desc(ADY)),
mode = "first",
set_values_to = exprs(
PARAMCD = "WSP",
PARAM = "Worst Sleeping Problem"
),
keep_source_vars = exprs(everything())
) %>%
select(-STUDYID)
#> # A tibble: 9 × 6
#> USUBJID PARAMCD AVALC ADY AVAL PARAM
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 4 WAKE UP N 1 NA <NA>
#> 2 4 NO SLEEP Y 2 NA <NA>
#> 3 4 NO SLEEP Y 3 NA <NA>
#> 4 5 NO SLEEP N 1 NA <NA>
#> 5 5 NO SLEEP Y 2 NA <NA>
#> 6 5 WAKE UP 3X Y 3 NA <NA>
#> 7 5 NO SLEEP Y 4 NA <NA>
#> 8 4 WSP No sleep two nights in a row 3 0 Worst Sleeping Pr…
#> 9 5 WSP No sleep 4 1 Worst Sleeping Pr…
Specifying different arguments across event() objects
Here we consider a Hy's Law use case. We are interested in knowing whether a subject's Alanine Aminotransferase (ALT) has ever been above twice the upper limit of normal range. If so, i.e., if CRIT1FL is "Y", we are interested in the record for the first time this occurs, and if not, we wish to retain the last record. As such, for this case we need to vary our usage of the mode argument depending on the event().
In the first event(), since we simply seek the first time that CRIT1FL is "Y", it's enough to specify the condition, because we inherit order and mode from the main derive_extreme_event() call, which will automatically select the first occurrence by AVISITN.
In the second event(), we select the last record among the records where CRIT1FL is "N" by additionally specifying mode = "last" within the event(). Because event_nr is the first variable in order, this record is only used for subjects without any "Y" record.
Note the usage of keep_source_vars = exprs(AVISITN) rather than everything() as in the previous example. This is done to ensure that CRIT1 and CRIT1FL are not populated for the new records.
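A small base-R sketch shows the combined effect of the two events. For each subject, it takes the first "Y" visit if there is one, and otherwise the last "N" visit. The helper pick_visit() is hypothetical. admiral evaluates the events separately and relies on event_nr in order, which yields the same selection for this data.

```r
# Toy copy of the relevant adhy columns.
adhy_toy <- data.frame(
  USUBJID = c("1", "1", "2", "2", "2", "2"),
  AVISITN = c(1, 2, 1, 2, 3, 4),
  CRIT1FL = c("N", "N", "N", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper; assumes each subject has at least one "Y" or "N" record.
pick_visit <- function(d) {
  d <- d[order(d$AVISITN), ]
  if (any(d$CRIT1FL == "Y")) {
    # Like event 1: first "Y" record (mode "first" from the main call).
    row <- d[d$CRIT1FL == "Y", ][1, ]
    row$AVALC <- "Y"
  } else {
    # Like event 2: last "N" record (mode = "last" set within the event).
    n_rows <- d[d$CRIT1FL == "N", ]
    row <- n_rows[nrow(n_rows), ]
    row$AVALC <- "N"
  }
  row[, c("USUBJID", "AVISITN", "AVALC")]
}

new_recs <- do.call(rbind, lapply(split(adhy_toy, adhy_toy$USUBJID), pick_visit))
new_recs  # subject 1: visit 2, "N"; subject 2: visit 2, "Y"
```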
adhy <- tribble(
~USUBJID, ~AVISITN, ~CRIT1, ~CRIT1FL,
"1", 1, "ALT > 2 times ULN", "N",
"1", 2, "ALT > 2 times ULN", "N",
"2", 1, "ALT > 2 times ULN", "N",
"2", 2, "ALT > 2 times ULN", "Y",
"2", 3, "ALT > 2 times ULN", "N",
"2", 4, "ALT > 2 times ULN", "Y"
) %>%
mutate(
PARAMCD = "ALT",
PARAM = "ALT (U/L)",
STUDYID = "AB42"
)
derive_extreme_event(
adhy,
by_vars = exprs(STUDYID, USUBJID),
events = list(
event(
condition = CRIT1FL == "Y",
set_values_to = exprs(AVALC = "Y")
),
event(
condition = CRIT1FL == "N",
mode = "last",
set_values_to = exprs(AVALC = "N")
)
),
tmp_event_nr_var = event_nr,
order = exprs(event_nr, AVISITN),
mode = "first",
keep_source_vars = exprs(AVISITN),
set_values_to = exprs(
PARAMCD = "ALT2",
PARAM = "ALT > 2 times ULN"
)
) %>%
select(-STUDYID)
#> # A tibble: 8 × 7
#> USUBJID AVISITN CRIT1 CRIT1FL PARAMCD PARAM AVALC
#> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 1 1 ALT > 2 times ULN N ALT ALT (U/L) <NA>
#> 2 1 2 ALT > 2 times ULN N ALT ALT (U/L) <NA>
#> 3 2 1 ALT > 2 times ULN N ALT ALT (U/L) <NA>
#> 4 2 2 ALT > 2 times ULN Y ALT ALT (U/L) <NA>
#> 5 2 3 ALT > 2 times ULN N ALT ALT (U/L) <NA>
#> 6 2 4 ALT > 2 times ULN Y ALT ALT (U/L) <NA>
#> 7 1 2 <NA> <NA> ALT2 ALT > 2 times ULN N
#> 8 2 2 <NA> <NA> ALT2 ALT > 2 times ULN Y
A more complex example: Confirmed Best Overall Response (first/last_cond_upper, join_type, source_datasets)
The final example showcases a use of derive_extreme_event() to calculate the Confirmed Best Overall Response (CBOR) in an ADRS dataset, as is common in many oncology trials. This example builds on all the previous ones and thus assumes a baseline level of familiarity with derive_extreme_event().
The following ADSL and ADRS datasets will be used throughout:
adsl <- tribble(
~USUBJID, ~TRTSDTC,
"1", "2020-01-01",
"2", "2019-12-12",
"3", "2019-11-11",
"4", "2019-12-30",
"5", "2020-01-01",
"6", "2020-02-02",
"7", "2020-02-02",
"8", "2020-02-01"
) %>%
mutate(
TRTSDT = ymd(TRTSDTC),
STUDYID = "AB42"
)
adrs <- tribble(
~USUBJID, ~ADTC, ~AVALC,
"1", "2020-01-01", "PR",
"1", "2020-02-01", "CR",
"1", "2020-02-16", "NE",
"1", "2020-03-01", "CR",
"1", "2020-04-01", "SD",
"2", "2020-01-01", "SD",
"2", "2020-02-01", "PR",
"2", "2020-03-01", "SD",
"2", "2020-03-13", "CR",
"4", "2020-01-01", "PR",
"4", "2020-03-01", "NE",
"4", "2020-04-01", "NE",
"4", "2020-05-01", "PR",
"5", "2020-01-01", "PR",
"5", "2020-01-10", "PR",
"5", "2020-01-20", "PR",
"6", "2020-02-06", "PR",
"6", "2020-02-16", "CR",
"6", "2020-03-30", "PR",
"7", "2020-02-06", "PR",
"7", "2020-02-16", "CR",
"7", "2020-04-01", "NE",
"8", "2020-02-16", "PD"
) %>%
mutate(
ADT = ymd(ADTC),
STUDYID = "AB42",
PARAMCD = "OVR",
PARAM = "Overall Response by Investigator"
) %>%
derive_vars_merged(
dataset_add = adsl,
by_vars = exprs(STUDYID, USUBJID),
new_vars = exprs(TRTSDT)
)
Since the CBOR derivation contains multiple complex parts, it's convenient to make use of the description argument within each event object to describe what condition is being checked.
For the Complete Response (CR), each "CR" record in the original ADRS dataset is identified by the first part of the condition argument (AVALC == "CR"). We then need the first_cond_upper argument to limit the group of observations to consider alongside it. Namely, we need to look up to and including the second CR (AVALC.join == "CR") at least 28 days after the first one (ADT.join >= ADT + 28). The observations up to and including the first one satisfying first_cond_upper then form our "join group". This means that the remaining portions of condition that reference joined variables are limited to this group. In particular, within condition we use all() to check that all observations are either "CR" or "NE", and count_vals() to ensure at most one is "NE".
Note that the selection of join_type = "after" is critical here, because the restriction implied by join_type is applied before the one implied by first_cond_upper. Take the first subject, who is correctly identified as a confirmed responder, as an example. Selecting join_type = "all" instead of "after" would mean that the first "PR" record from "2020-01-01" is also considered when evaluating the all(AVALC.join %in% c("CR", "NE")) portion of condition. In turn, the condition would no longer be satisfied, and following the later event logic, the subject would be considered a partial responder instead.
The Partial Response (PR) event is very similar. The difference is that the first portion of condition now references "PR", and first_cond_upper accepts a confirmatory "PR" or "CR" at least 28 days later. Note that we must now add "PR" as an option within the all() condition to account for confirmatory "PR"s.
The Stable Disease (SD), Progressive Disease (PD) and Not Evaluable (NE) events are simpler and just require event() calls.
Finally, we use a catch-all event() with condition = TRUE and dataset_name = "adsl" to identify those subjects who do not appear in ADRS and list their CBOR as "MISSING". Note that dataset_name is set to "adsl", which is a new source dataset. As such, it's important in the main derive_extreme_event() call to list adsl as another source dataset with source_datasets = list(adsl = adsl).
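The following base-R sketch of the CR confirmation check for subject 1 makes the roles of join_type and first_cond_upper concrete. The helper cr_confirmed() is hypothetical and only approximates filter_joined(). It assumes ADT is unique within the subject, treats "after" as "strictly later date", and ignores the by variables.

```r
# ADRS records of subject 1 from the example above.
rs <- data.frame(
  ADT   = as.Date(c("2020-01-01", "2020-02-01", "2020-02-16",
                    "2020-03-01", "2020-04-01")),
  AVALC = c("PR", "CR", "NE", "CR", "SD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: is the CR in row i confirmed?
cr_confirmed <- function(rs, i, join_type = c("after", "all")) {
  join_type <- match.arg(join_type)
  if (rs$AVALC[i] != "CR") return(FALSE)
  # join_type: "after" keeps only later records, "all" keeps every record.
  cand <- if (join_type == "after") rs[rs$ADT > rs$ADT[i], ] else rs
  # first_cond_upper: a CR at least 28 days after the current record.
  upper <- which(cand$AVALC == "CR" & cand$ADT >= rs$ADT[i] + 28)
  if (length(upper) == 0) return(FALSE)
  # Join group: candidates up to and including the first match.
  group <- cand[seq_len(upper[1]), ]
  all(group$AVALC %in% c("CR", "NE")) && sum(group$AVALC == "NE") <= 1
}

confirmed <- vapply(seq_len(nrow(rs)),
                    function(i) cr_confirmed(rs, i), logical(1))
confirmed_all <- vapply(seq_len(nrow(rs)),
                        function(i) cr_confirmed(rs, i, "all"), logical(1))
```

With join_type = "after", only the CR on 2020-02-01 is confirmed. With "all", the earlier PR enters the join group and no CR is confirmed, which matches the reasoning above.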
derive_extreme_event(
adrs,
by_vars = exprs(STUDYID, USUBJID),
tmp_event_nr_var = event_nr,
order = exprs(event_nr, ADT),
mode = "first",
source_datasets = list(adsl = adsl),
events = list(
event_joined(
description = paste(
"CR needs to be confirmed by a second CR at least 28 days later,",
"at most one NE is acceptable between the two assessments"
),
join_vars = exprs(AVALC, ADT),
join_type = "after",
first_cond_upper = AVALC.join == "CR" & ADT.join >= ADT + 28,
condition = AVALC == "CR" &
all(AVALC.join %in% c("CR", "NE")) &
count_vals(var = AVALC.join, val = "NE") <= 1,
set_values_to = exprs(AVALC = "CR")
),
event_joined(
description = paste(
"PR needs to be confirmed by a second CR or PR at least 28 days later,",
"at most one NE is acceptable between the two assessments"
),
join_vars = exprs(AVALC, ADT),
join_type = "after",
first_cond_upper = AVALC.join %in% c("CR", "PR") & ADT.join >= ADT + 28,
condition = AVALC == "PR" &
all(AVALC.join %in% c("CR", "PR", "NE")) &
count_vals(var = AVALC.join, val = "NE") <= 1,
set_values_to = exprs(AVALC = "PR")
),
event(
description = paste(
"CR, PR, or SD are considered as SD if occurring at least 28 days",
"after treatment start"
),
condition = AVALC %in% c("CR", "PR", "SD") & ADT >= TRTSDT + 28,
set_values_to = exprs(AVALC = "SD")
),
event(
condition = AVALC == "PD",
set_values_to = exprs(AVALC = "PD")
),
event(
condition = AVALC %in% c("CR", "PR", "SD", "NE"),
set_values_to = exprs(AVALC = "NE")
),
event(
description = "Set response to MISSING for patients without records in ADRS",
dataset_name = "adsl",
condition = TRUE,
set_values_to = exprs(AVALC = "MISSING"),
keep_source_vars = exprs(TRTSDT)
)
),
set_values_to = exprs(
PARAMCD = "CBOR",
PARAM = "Best Confirmed Overall Response by Investigator"
)
) %>%
filter(PARAMCD == "CBOR") %>%
select(-STUDYID, -ADTC)
#> # A tibble: 8 × 6
#> USUBJID AVALC ADT PARAMCD PARAM TRTSDT
#> <chr> <chr> <date> <chr> <chr> <date>
#> 1 1 CR 2020-02-01 CBOR Best Confirmed Overall Response… 2020-01-01
#> 2 2 SD 2020-02-01 CBOR Best Confirmed Overall Response… 2019-12-12
#> 3 3 MISSING NA CBOR Best Confirmed Overall Response… 2019-11-11
#> 4 4 SD 2020-05-01 CBOR Best Confirmed Overall Response… 2019-12-30
#> 5 5 NE 2020-01-01 CBOR Best Confirmed Overall Response… 2020-01-01
#> 6 6 PR 2020-02-06 CBOR Best Confirmed Overall Response… 2020-02-02
#> 7 7 NE 2020-02-06 CBOR Best Confirmed Overall Response… 2020-02-02
#> 8 8 PD 2020-02-16 CBOR Best Confirmed Overall Response… 2020-02-01
Further examples
Equivalent examples for using the check_type argument can be found in derive_extreme_records().
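As a rough illustration of the uniqueness check controlled by check_type, the base-R sketch below reports records that are duplicated with respect to the by variables and the order variables. The helper check_unique_keys() and its message text are hypothetical; admiral's own check and messages may differ.

```r
# Hypothetical helper: flag all rows whose key combination occurs more than once.
check_unique_keys <- function(df, keys, check_type = "warning") {
  is_dup <- duplicated(df[keys]) | duplicated(df[keys], fromLast = TRUE)
  if (any(is_dup) && check_type != "none") {
    msg <- paste("Dataset contains duplicate records with respect to",
                 paste(keys, collapse = ", "))
    switch(check_type,
      message = message(msg),
      warning = warning(msg, call. = FALSE),
      error   = stop(msg, call. = FALSE)
    )
  }
  invisible(df[is_dup, , drop = FALSE])
}

recs <- data.frame(USUBJID = c("1", "1", "2"), ADY = c(1, 1, 2),
                   stringsAsFactors = FALSE)
# Both subject "1" records share USUBJID and ADY, so they are reported.
dups <- suppressWarnings(check_unique_keys(recs, c("USUBJID", "ADY")))
```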