
Add the First or Last Observation for Each By Group as New Records
Source: R/derive_extreme_records.R
Add the first or last observation for each by group as new observations. The
new observations can be selected from the additional dataset. This function can
be used for adding the maximum or minimum value as a separate visit.
All variables of the selected observation are kept. This distinguishes
derive_extreme_records() from derive_summary_records(),
where only the by variables are populated for the new records.
Usage
derive_extreme_records(
  dataset = NULL,
  dataset_add,
  dataset_ref = NULL,
  by_vars = NULL,
  order = NULL,
  mode = NULL,
  filter_add = NULL,
  check_type = "warning",
  exist_flag = NULL,
  true_value = "Y",
  false_value = NA_character_,
  keep_source_vars = exprs(everything()),
  set_values_to
)
Arguments
- dataset
Input dataset
- dataset_add
Additional dataset
The additional dataset, which determines the by groups returned in the input dataset, based on the groups that exist in this dataset after being subset by filter_add.
The variables specified in the by_vars and filter_add parameters are expected in this dataset. If mode and order are specified, the first or last observation within each by group, defined by by_vars, is selected.
- dataset_ref
Reference dataset
The variables specified for by_vars are expected. For each observation of the specified dataset a new observation is added to the input dataset.
- by_vars
Grouping variables
If dataset_ref is specified, this argument must be specified.
Permitted Values: list of variables created by exprs(), e.g., exprs(USUBJID, VISIT)
- order
Sort order
Within each by group the observations are ordered by the specified order.
Permitted Values: list of expressions created by exprs(), e.g., exprs(ADT, desc(AVAL))
- mode
Selection mode (first or last)
If "first" is specified, the first observation of each by group is added to the input dataset. If "last" is specified, the last observation of each by group is added to the input dataset.
Permitted Values: "first", "last"
- filter_add
Filter for additional dataset (dataset_add)
Only observations in dataset_add fulfilling the specified condition are considered.
- check_type
Check uniqueness?
If "warning" or "error" is specified, a warning or an error, respectively, is issued if the observations of the (restricted) additional dataset are not unique with respect to the by variables and the order.
Permitted Values: "none", "warning", "error"
- exist_flag
Existence flag
The specified variable is added to the output dataset.
For by groups with at least one observation in the additional dataset (dataset_add), exist_flag is set to the value specified by the true_value argument.
For all other by groups, exist_flag is set to the value specified by the false_value argument.
Permitted Values: Variable name
- true_value
True value
For new observations selected from the additional dataset (dataset_add), exist_flag is set to the specified value.
- false_value
False value
For new observations not selected from the additional dataset (dataset_add), exist_flag is set to the specified value.
- keep_source_vars
Variables to be kept in the new records
A named list or tidyselect expressions created by exprs() defining the variables to be kept for the new records. The variables specified for by_vars and set_values_to need not be specified here as they are kept automatically.
- set_values_to
Variables to be set
The specified variables are set to the specified values for the new observations.
Each element of the list sets one variable for the new records:
LHS refers to the variable.
RHS refers to the value assigned to the variable. This can be a string, a symbol, a numeric value, an expression or NA. If summary functions are used, the values are summarized by the variables specified for by_vars.
For example:
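A minimal sketch of such a call, assuming admiral is attached (it re-exports exprs()). The small dataset and the values 97 and "MINIMUM" are illustrative only. The list combines a numeric constant, a string constant, and an expression evaluated on the selected record:

```r
library(admiral)
library(tibble)

# Illustrative data, not taken from a real study
adlb <- tribble(
  ~USUBJID, ~AVISITN, ~AVAL,
  "1",      1,        113,
  "1",      2,        117,
  "2",      1,        101
)

res <- derive_extreme_records(
  adlb,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(AVAL, AVISITN),
  mode = "first",
  set_values_to = exprs(
    AVISITN = 97,               # numeric constant
    DTYPE = "MINIMUM",          # string constant
    AVALC = as.character(AVAL)  # expression using a variable of the selected record
  )
)
```

The variables named on the left-hand side are populated only for the added records; the original records get NA for any variable they did not already have.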
Value
The input dataset with the first or last observation of each by group added as new observations.
Details
The additional dataset (dataset_add) is restricted as specified by the filter_add argument.
For each group (with respect to the variables specified for the by_vars argument) the first or last observation (with respect to the order specified for the order argument and the mode specified for the mode argument) is selected.
If dataset_ref is specified, observations which are in dataset_ref but not in the selected records are added.
The variables specified by the set_values_to argument are added to the selected observations.
The variables specified by the keep_source_vars argument are selected along with the variables specified in the by_vars and set_values_to arguments.
The observations are added to the input dataset.
See also
BDS-Findings Functions for adding Parameters/Records:
default_qtc_paramcd(),
derive_expected_records(),
derive_extreme_event(),
derive_locf_records(),
derive_param_bmi(),
derive_param_bsa(),
derive_param_computed(),
derive_param_doseint(),
derive_param_exist_flag(),
derive_param_exposure(),
derive_param_framingham(),
derive_param_map(),
derive_param_qtc(),
derive_param_rr(),
derive_param_wbc_abs(),
derive_summary_records()
Examples
library(admiral)
library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(lubridate)
adlb <- tribble(
  ~USUBJID, ~AVISITN, ~AVAL, ~LBSEQ,
  "1",      1,        113,   1,
  "1",      2,        113,   2,
  "1",      3,        117,   3,
  "2",      1,        101,   1,
  "2",      2,        101,   2,
  "2",      3,        95,    3
)
# Add a new record for each USUBJID storing the minimum value (first AVAL).
# If multiple records meet the minimum criterion, take the first value by
# AVISITN. Set AVISITN = 97 and DTYPE = MINIMUM for these new records.
# Specify the variables that need to be kept in the new records.
derive_extreme_records(
  adlb,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(AVAL, AVISITN),
  mode = "first",
  filter_add = !is.na(AVAL),
  keep_source_vars = exprs(AVAL),
  set_values_to = exprs(
    AVISITN = 97,
    DTYPE = "MINIMUM"
  )
)
#> # A tibble: 8 × 5
#>   USUBJID AVISITN  AVAL LBSEQ DTYPE
#>   <chr>     <dbl> <dbl> <dbl> <chr>
#> 1 1             1   113     1 NA
#> 2 1             2   113     2 NA
#> 3 1             3   117     3 NA
#> 4 2             1   101     1 NA
#> 5 2             2   101     2 NA
#> 6 2             3    95     3 NA
#> 7 1            97   113    NA MINIMUM
#> 8 2            97    95    NA MINIMUM
# Add a new record for each USUBJID storing the maximum value (last AVAL).
# If multiple records meet the maximum criterion, take the first value by
# AVISITN. Set AVISITN = 98 and DTYPE = MAXIMUM for these new records.
derive_extreme_records(
  adlb,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(desc(AVAL), AVISITN),
  mode = "first",
  filter_add = !is.na(AVAL),
  set_values_to = exprs(
    AVISITN = 98,
    DTYPE = "MAXIMUM"
  )
)
#> # A tibble: 8 × 5
#>   USUBJID AVISITN  AVAL LBSEQ DTYPE
#>   <chr>     <dbl> <dbl> <dbl> <chr>
#> 1 1             1   113     1 NA
#> 2 1             2   113     2 NA
#> 3 1             3   117     3 NA
#> 4 2             1   101     1 NA
#> 5 2             2   101     2 NA
#> 6 2             3    95     3 NA
#> 7 1            98   117     3 MAXIMUM
#> 8 2            98   101     1 MAXIMUM
# Add a new record for each USUBJID storing the last value (by AVISITN).
# Set AVISITN = 99 and DTYPE = LOV for these new records.
derive_extreme_records(
  adlb,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(AVISITN),
  mode = "last",
  set_values_to = exprs(
    AVISITN = 99,
    DTYPE = "LOV"
  )
)
#> # A tibble: 8 × 5
#>   USUBJID AVISITN  AVAL LBSEQ DTYPE
#>   <chr>     <dbl> <dbl> <dbl> <chr>
#> 1 1             1   113     1 NA
#> 2 1             2   113     2 NA
#> 3 1             3   117     3 NA
#> 4 2             1   101     1 NA
#> 5 2             2   101     2 NA
#> 6 2             3    95     3 NA
#> 7 1            99   117     3 LOV
#> 8 2            99    95     3 LOV
# Derive a new parameter for the first disease progression (PD)
adsl <- tribble(
  ~USUBJID, ~DTHDT,
  "1",      ymd("2022-05-13"),
  "2",      ymd(""),
  "3",      ymd("")
) %>%
  mutate(STUDYID = "XX1234")
adrs <- tribble(
  ~USUBJID, ~ADTC,        ~AVALC,
  "1",      "2020-01-02", "PR",
  "1",      "2020-02-01", "CR",
  "1",      "2020-03-01", "CR",
  "1",      "2020-04-01", "SD",
  "2",      "2021-06-15", "SD",
  "2",      "2021-07-16", "PD",
  "2",      "2021-09-14", "PD"
) %>%
  mutate(
    STUDYID = "XX1234",
    ADT = ymd(ADTC),
    PARAMCD = "OVR",
    PARAM = "Overall Response",
    ANL01FL = "Y"
  ) %>%
  select(-ADTC)
derive_extreme_records(
  adrs,
  dataset_ref = adsl,
  dataset_add = adrs,
  by_vars = exprs(STUDYID, USUBJID),
  filter_add = PARAMCD == "OVR" & AVALC == "PD",
  order = exprs(ADT),
  exist_flag = AVALC,
  true_value = "Y",
  false_value = "N",
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "PD",
    PARAM = "Disease Progression",
    AVAL = yn_to_numeric(AVALC),
    ANL01FL = "Y",
    ADT = ADT
  )
)
#> # A tibble: 10 × 8
#>    USUBJID AVALC STUDYID ADT        PARAMCD PARAM               ANL01FL  AVAL
#>    <chr>   <chr> <chr>   <date>     <chr>   <chr>               <chr>   <dbl>
#>  1 1       PR    XX1234  2020-01-02 OVR     Overall Response    Y          NA
#>  2 1       CR    XX1234  2020-02-01 OVR     Overall Response    Y          NA
#>  3 1       CR    XX1234  2020-03-01 OVR     Overall Response    Y          NA
#>  4 1       SD    XX1234  2020-04-01 OVR     Overall Response    Y          NA
#>  5 2       SD    XX1234  2021-06-15 OVR     Overall Response    Y          NA
#>  6 2       PD    XX1234  2021-07-16 OVR     Overall Response    Y          NA
#>  7 2       PD    XX1234  2021-09-14 OVR     Overall Response    Y          NA
#>  8 2       Y     XX1234  2021-07-16 PD      Disease Progression Y           1
#>  9 1       N     XX1234  NA         PD      Disease Progression Y           0
#> 10 3       N     XX1234  NA         PD      Disease Progression Y           0
# Derive a parameter indicating death
derive_extreme_records(
  dataset_ref = adsl,
  dataset_add = adsl,
  by_vars = exprs(STUDYID, USUBJID),
  filter_add = !is.na(DTHDT),
  exist_flag = AVALC,
  true_value = "Y",
  false_value = "N",
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "DEATH",
    PARAM = "Death",
    ANL01FL = "Y",
    ADT = DTHDT
  )
)
#> # A tibble: 3 × 8
#>   STUDYID USUBJID PARAMCD PARAM ANL01FL ADT        DTHDT      AVALC
#>   <chr>   <chr>   <chr>   <chr> <chr>   <date>     <date>     <chr>
#> 1 XX1234  1       DEATH   Death Y       2022-05-13 2022-05-13 Y
#> 2 XX1234  2       DEATH   Death Y       NA         NA         N
#> 3 XX1234  3       DEATH   Death Y       NA         NA         N
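The check_type argument is not exercised in the examples above. As a hedged sketch, assuming admiral is attached and using an illustrative dataset (adlb_dup is a hypothetical name) in which the order leaves a tie within one by group, the following contrasts "none" with "error":

```r
library(admiral)
library(tibble)

# Illustrative data: subject "1" has two records with the same AVAL
adlb_dup <- tribble(
  ~USUBJID, ~AVISITN, ~AVAL,
  "1",      1,        113,
  "1",      2,        113,
  "2",      1,        101
)

# Ordering by AVAL alone does not identify a unique first record for
# subject "1". With check_type = "none" the uniqueness check is skipped.
res_none <- derive_extreme_records(
  adlb_dup,
  dataset_add = adlb_dup,
  by_vars = exprs(USUBJID),
  order = exprs(AVAL),
  mode = "first",
  check_type = "none",
  set_values_to = exprs(DTYPE = "MINIMUM")
)

# With check_type = "error" the same call is expected to stop instead
dup_error <- tryCatch(
  {
    derive_extreme_records(
      adlb_dup,
      dataset_add = adlb_dup,
      by_vars = exprs(USUBJID),
      order = exprs(AVAL),
      mode = "first",
      check_type = "error",
      set_values_to = exprs(DTYPE = "MINIMUM")
    )
    FALSE
  },
  error = function(e) TRUE
)
```

Adding a tie-breaking variable to order, such as AVISITN in the first example, is usually preferable to silencing the check, because it makes the selected record reproducible.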