Add the First or Last Observation for Each By Group as New Records
Source:R/derive_extreme_records.R
derive_extreme_records.Rd
Add the first or last observation for each by group as new observations. The
new observations can be selected from the additional dataset. This function can
be used for adding the maximum or minimum value as a separate visit.
All variables of the selected observation are kept. This distinguishes
derive_extreme_records()
from derive_summary_records()
,
where only the by variables are populated for the new records.
Usage
derive_extreme_records(
dataset = NULL,
dataset_add,
dataset_ref = NULL,
by_vars = NULL,
order = NULL,
mode = NULL,
filter_add = NULL,
check_type = "warning",
exist_flag = NULL,
true_value = "Y",
false_value = NA_character_,
keep_source_vars = exprs(everything()),
set_values_to
)
Arguments
- dataset
Input dataset
- dataset_add
Additional dataset
The additional dataset, which determines the by groups returned in the input dataset, based on the groups that exist in this dataset after being subset by
filter_add
.The variables specified in the
by_vars
andfilter_add
parameters are expected in this dataset. Ifmode
andorder
are specified, the first or last observation within each by group, defined byby_vars
, is selected.- dataset_ref
Reference dataset
The variables specified for
by_vars
are expected. For each observation of the specified dataset a new observation is added to the input dataset.- by_vars
Grouping variables
If
dataset_ref
is specified, this argument must be specified.Permitted Values: list of variables created by
exprs()
e.g.exprs(USUBJID, VISIT)
- order
Sort order
Within each by group the observations are ordered by the specified order.
Permitted Values: list of expressions created by
exprs()
, e.g.,exprs(ADT, desc(AVAL))
- mode
Selection mode (first or last)
If
"first"
is specified, the first observation of each by group is added to the input dataset. If"last"
is specified, the last observation of each by group is added to the input dataset.Permitted Values:
"first"
,"last"
- filter_add
Filter for additional dataset (
dataset_add
)Only observations in
dataset_add
fulfilling the specified condition are considered.- check_type
Check uniqueness?
If
"warning"
or"error"
is specified, the specified message is issued if the observations of the (restricted) additional dataset are not unique with respect to the by variables and the order.Permitted Values:
"none"
,"warning"
,"error"
- exist_flag
Existence flag
The specified variable is added to the output dataset.
For by groups with at least one observation in the additional dataset (
dataset_add
)exist_flag
is set to the value specified by thetrue_value
argument.For all other by groups
exist_flag
is set to the value specified by thefalse_value
argument.Permitted Values: Variable name
- true_value
True value
For new observations selected from the additional dataset (
dataset_add
),exist_flag
is set to the specified value.- false_value
False value
For new observations not selected from the additional dataset (
dataset_add
),exist_flag
is set to the specified value.- keep_source_vars
Variables to be kept in the new records
A named list or tidyselect expressions created by
exprs()
defining the variables to be kept for the new records. The variables specified forby_vars
andset_values_to
need not be specified here as they are kept automatically.- set_values_to
Variables to be set
The specified variables are set to the specified values for the new observations.
Set a list of variables to some specified value for the new records
LHS refer to a variable.
RHS refers to the values to set to the variable. This can be a string, a symbol, a numeric value, an expression or NA. If summary functions are used, the values are summarized by the variables specified for
by_vars
.
For example:
Value
The input dataset with the first or last observation of each by group added as new observations.
Details
The additional dataset (
dataset_add
) is restricted as specified by thefilter_add
argument.For each group (with respect to the variables specified for the
by_vars
argument) the first or last observation (with respect to the order specified for theorder
argument and the mode specified for themode
argument) is selected.If
dataset_ref
is specified, observations which are indataset_ref
but not in the selected records are added.The variables specified by the
set_values_to
argument are added to the selected observations.The variables specified by the
keep_source_vars
argument are selected along with the variables specified inby_vars
andset_values_to
arguments.The observations are added to input dataset.
See also
BDS-Findings Functions for adding Parameters/Records:
default_qtc_paramcd()
,
derive_expected_records()
,
derive_extreme_event()
,
derive_locf_records()
,
derive_param_bmi()
,
derive_param_bsa()
,
derive_param_computed()
,
derive_param_doseint()
,
derive_param_exist_flag()
,
derive_param_exposure()
,
derive_param_framingham()
,
derive_param_map()
,
derive_param_qtc()
,
derive_param_rr()
,
derive_param_wbc_abs()
,
derive_summary_records()
Examples
library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(lubridate)
adlb <- tribble(
~USUBJID, ~AVISITN, ~AVAL, ~LBSEQ,
"1", 1, 113, 1,
"1", 2, 113, 2,
"1", 3, 117, 3,
"2", 1, 101, 1,
"2", 2, 101, 2,
"2", 3, 95, 3
)
# Add a new record for each USUBJID storing the minimum value (first AVAL).
# If multiple records meet the minimum criterion, take the first value by
# AVISITN. Set AVISITN = 97 and DTYPE = MINIMUM for these new records.
# Specify the variables that need to be kept in the new records.
derive_extreme_records(
adlb,
dataset_add = adlb,
by_vars = exprs(USUBJID),
order = exprs(AVAL, AVISITN),
mode = "first",
filter_add = !is.na(AVAL),
keep_source_vars = exprs(AVAL),
set_values_to = exprs(
AVISITN = 97,
DTYPE = "MINIMUM"
)
)
#> # A tibble: 8 × 5
#> USUBJID AVISITN AVAL LBSEQ DTYPE
#> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 1 1 113 1 NA
#> 2 1 2 113 2 NA
#> 3 1 3 117 3 NA
#> 4 2 1 101 1 NA
#> 5 2 2 101 2 NA
#> 6 2 3 95 3 NA
#> 7 1 97 113 NA MINIMUM
#> 8 2 97 95 NA MINIMUM
# Add a new record for each USUBJID storing the maximum value (last AVAL).
# If multiple records meet the maximum criterion, take the first value by
# AVISITN. Set AVISITN = 98 and DTYPE = MAXIMUM for these new records.
derive_extreme_records(
adlb,
dataset_add = adlb,
by_vars = exprs(USUBJID),
order = exprs(desc(AVAL), AVISITN),
mode = "first",
filter_add = !is.na(AVAL),
set_values_to = exprs(
AVISITN = 98,
DTYPE = "MAXIMUM"
)
)
#> # A tibble: 8 × 5
#> USUBJID AVISITN AVAL LBSEQ DTYPE
#> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 1 1 113 1 NA
#> 2 1 2 113 2 NA
#> 3 1 3 117 3 NA
#> 4 2 1 101 1 NA
#> 5 2 2 101 2 NA
#> 6 2 3 95 3 NA
#> 7 1 98 117 3 MAXIMUM
#> 8 2 98 101 1 MAXIMUM
# Add a new record for each USUBJID storing for the last value.
# Set AVISITN = 99 and DTYPE = LOV for these new records.
derive_extreme_records(
adlb,
dataset_add = adlb,
by_vars = exprs(USUBJID),
order = exprs(AVISITN),
mode = "last",
set_values_to = exprs(
AVISITN = 99,
DTYPE = "LOV"
)
)
#> # A tibble: 8 × 5
#> USUBJID AVISITN AVAL LBSEQ DTYPE
#> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 1 1 113 1 NA
#> 2 1 2 113 2 NA
#> 3 1 3 117 3 NA
#> 4 2 1 101 1 NA
#> 5 2 2 101 2 NA
#> 6 2 3 95 3 NA
#> 7 1 99 117 3 LOV
#> 8 2 99 95 3 LOV
# Derive a new parameter for the first disease progression (PD)
adsl <- tribble(
~USUBJID, ~DTHDT,
"1", ymd("2022-05-13"),
"2", ymd(""),
"3", ymd("")
) %>%
mutate(STUDYID = "XX1234")
adrs <- tribble(
~USUBJID, ~ADTC, ~AVALC,
"1", "2020-01-02", "PR",
"1", "2020-02-01", "CR",
"1", "2020-03-01", "CR",
"1", "2020-04-01", "SD",
"2", "2021-06-15", "SD",
"2", "2021-07-16", "PD",
"2", "2021-09-14", "PD"
) %>%
mutate(
STUDYID = "XX1234",
ADT = ymd(ADTC),
PARAMCD = "OVR",
PARAM = "Overall Response",
ANL01FL = "Y"
) %>%
select(-ADTC)
derive_extreme_records(
adrs,
dataset_ref = adsl,
dataset_add = adrs,
by_vars = exprs(STUDYID, USUBJID),
filter_add = PARAMCD == "OVR" & AVALC == "PD",
order = exprs(ADT),
exist_flag = AVALC,
true_value = "Y",
false_value = "N",
mode = "first",
set_values_to = exprs(
PARAMCD = "PD",
PARAM = "Disease Progression",
AVAL = yn_to_numeric(AVALC),
ANL01FL = "Y",
ADT = ADT
)
)
#> # A tibble: 10 × 8
#> USUBJID AVALC STUDYID ADT PARAMCD PARAM ANL01FL AVAL
#> <chr> <chr> <chr> <date> <chr> <chr> <chr> <dbl>
#> 1 1 PR XX1234 2020-01-02 OVR Overall Response Y NA
#> 2 1 CR XX1234 2020-02-01 OVR Overall Response Y NA
#> 3 1 CR XX1234 2020-03-01 OVR Overall Response Y NA
#> 4 1 SD XX1234 2020-04-01 OVR Overall Response Y NA
#> 5 2 SD XX1234 2021-06-15 OVR Overall Response Y NA
#> 6 2 PD XX1234 2021-07-16 OVR Overall Response Y NA
#> 7 2 PD XX1234 2021-09-14 OVR Overall Response Y NA
#> 8 2 Y XX1234 2021-07-16 PD Disease Progression Y 1
#> 9 1 N XX1234 NA PD Disease Progression Y 0
#> 10 3 N XX1234 NA PD Disease Progression Y 0
# derive parameter indicating death
derive_extreme_records(
dataset_ref = adsl,
dataset_add = adsl,
by_vars = exprs(STUDYID, USUBJID),
filter_add = !is.na(DTHDT),
exist_flag = AVALC,
true_value = "Y",
false_value = "N",
mode = "first",
set_values_to = exprs(
PARAMCD = "DEATH",
PARAM = "Death",
ANL01FL = "Y",
ADT = DTHDT
)
)
#> # A tibble: 3 × 8
#> STUDYID USUBJID PARAMCD PARAM ANL01FL ADT DTHDT AVALC
#> <chr> <chr> <chr> <chr> <chr> <date> <date> <chr>
#> 1 XX1234 1 DEATH Death Y 2022-05-13 2022-05-13 Y
#> 2 XX1234 2 DEATH Death Y NA NA N
#> 3 XX1234 3 DEATH Death Y NA NA N