Derives a Flag Based on an Existing Flag
Source: R/derive_var_joined_exist_flag.R
derive_var_joined_exist_flag.Rd
Derive a flag which depends on other observations of the dataset. For example, flagging events which need to be confirmed by a second event.
Usage
derive_var_joined_exist_flag(
dataset,
dataset_add,
by_vars,
order,
new_var,
tmp_obs_nr_var = NULL,
join_vars,
join_type,
first_cond_lower = NULL,
first_cond_upper = NULL,
filter_add = NULL,
filter_join,
true_value = "Y",
false_value = NA_character_,
check_type = "warning"
)
Arguments
- dataset
Input dataset
The variables specified by the by_vars and join_vars arguments are expected to be in the dataset.
- dataset_add
Additional dataset
The variables specified for by_vars, join_vars, and order are expected.
- by_vars
Grouping variables
The specified variables are used for joining the input dataset (dataset) with the additional dataset (dataset_add).
Permitted Values: list of variables created by exprs(), e.g., exprs(USUBJID, VISIT)
- order
Order
The observations are ordered by the specified order.
For handling of NAs in sorting variables see Sort Order.
- new_var
New variable
The specified variable is added to the input dataset.
- tmp_obs_nr_var
Temporary observation number
The specified variable is added to the input dataset (dataset) and the additional dataset (dataset_add). It is set to the observation number with respect to order. For each by group (by_vars) the observation number starts with 1. The variable can be used in the conditions (filter_join, first_cond_upper, first_cond_lower). It is not included in the output dataset. It can also be used to flag consecutive observations or the last observation (see the CRIT1FL example below).
- join_vars
Variables to keep from joined dataset
The variables needed from the other observations should be specified for this parameter. The specified variables are added to the joined dataset with suffix ".join". For example, to flag all observations with AVALC == "Y" and AVALC == "Y" for at least one subsequent visit, join_vars = exprs(AVALC, AVISITN) and filter_join = AVALC == "Y" & AVALC.join == "Y" & AVISITN < AVISITN.join could be specified.
The *.join variables are not included in the output dataset.
- join_type
Observations to keep after joining
The argument determines which of the joined observations are kept with respect to the original observation. For example, if join_type = "after" is specified, all observations after the original observation are kept.
For example, for confirmed response or BOR in the oncology setting, or confirmed deterioration in questionnaires, the confirmatory assessment must be after the assessment. Thus join_type = "after" could be used.
In other cases, confirmatory observations may be allowed to occur prior to the observation, for example, to identify AEs occurring on or after seven days before a COVID AE. Thus join_type = "all" could be used.
Permitted Values: "before", "after", "all"
- first_cond_lower
Condition for selecting range of data (before)
If this argument is specified, the other observations are restricted from the first observation before the current observation where the specified condition is fulfilled up to the current observation. If the condition is not fulfilled for any of the other observations, no observations are considered, i.e., the observation is not flagged.
This parameter should be specified if filter_join contains summary functions which should not apply to all observations but only from a certain observation before the current observation up to the current observation. For an example see the first_cond_lower and first_cond_upper example below.
- first_cond_upper
Condition for selecting range of data (after)
If this argument is specified, the other observations are restricted up to the first observation where the specified condition is fulfilled. If the condition is not fulfilled for any of the other observations, no observations are considered, i.e., the observation is not flagged.
This parameter should be specified if filter_join contains summary functions which should not apply to all observations but only up to the confirmation assessment. For an example see the third example below.
- filter_add
Filter for additional dataset (dataset_add)
Only observations from dataset_add fulfilling the specified condition are joined to the input dataset. If the argument is not specified, all observations are joined.
Variables created by the order or join_vars arguments can be used in the condition.
The condition can include summary functions like all() or any(). The additional dataset is grouped by the by variables (by_vars).
Permitted Values: a condition
- filter_join
Condition for selecting observations
The filter is applied to the joined dataset for flagging the confirmed observations. The condition can include summary functions like all() or any(). The joined dataset is grouped by the original observations, i.e., the summary functions are applied to all observations up to the confirmation observation. For example, filter_join = AVALC == "CR" & all(AVALC.join %in% c("CR", "NE")) & count_vals(var = AVALC.join, val = "NE") <= 1 selects observations with response "CR" where, for all observations up to the confirmation observation, the response is "CR" or "NE" and there is at most one "NE".
- true_value
Value of new_var for flagged observations
- false_value
Value of new_var for observations not flagged
- check_type
Check uniqueness?
If "warning" or "error" is specified, the specified message is issued if the observations of the input dataset are not unique with respect to the by variables and the order.
Permitted Values: "none", "warning", "error"
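As an illustration of how the arguments fit together, the call below overrides true_value and false_value and tightens check_type to "error". It is a minimal sketch that assumes the admiral and tibble packages are installed; the small dataset and the values "Yes" and "No" are made up for this illustration.

```r
library(admiral)
library(tibble)

# Illustrative data: one response (AVALC) per subject and visit
data <- tribble(
  ~USUBJID, ~AVISITN, ~AVALC,
  "1",      1,        "Y",
  "1",      2,        "N",
  "1",      3,        "Y",
  "2",      1,        "Y",
  "2",      2,        "N"
)

flagged <- derive_var_joined_exist_flag(
  data,
  dataset_add = data,
  by_vars = exprs(USUBJID),
  order = exprs(AVISITN),
  new_var = CONFFL,
  join_vars = exprs(AVALC, AVISITN),
  join_type = "after",                      # only later visits can confirm
  filter_join = AVALC == "Y" & AVALC.join == "Y",
  true_value = "Yes",                       # instead of the default "Y"
  false_value = "No",                       # instead of the default NA
  check_type = "error"                      # stop if (USUBJID, AVISITN) is not unique
)

print(flagged)
```

Only the first visit of subject "1" has a later visit with AVALC == "Y", so it is the only observation that receives true_value.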
Details
An example usage might be flagging if a patient received two required medications within a certain timeframe of each other.
In the oncology setting, for example, the function could be used to flag whether a response value can be confirmed by another assessment. This is commonly used in endpoints such as best overall response.
The following steps are performed to produce the output dataset.
Step 1
The variables specified by order are added to the additional dataset (dataset_add).
The variables specified by join_vars are added to the additional dataset (dataset_add).
The records from the additional dataset (dataset_add) are restricted to those matching the filter_add condition.
The input dataset (dataset) is joined with the restricted additional dataset by the variables specified for by_vars. From the additional dataset only the variables specified for join_vars are kept. The suffix ".join" is added to those variables which also exist in the input dataset.
For example, for by_vars = exprs(USUBJID), join_vars = exprs(AVISITN, AVALC), and input dataset and additional dataset
# A tibble: 2 x 4
  USUBJID AVISITN AVALC  AVAL
  <chr>     <dbl> <chr> <dbl>
  1             1 Y         1
  1             2 N         0
the joined dataset is
# A tibble: 4 x 6
  USUBJID AVISITN AVALC  AVAL AVISITN.join AVALC.join
  <chr>     <dbl> <chr> <dbl>        <dbl> <chr>
  1             1 Y         1            1 Y
  1             1 Y         1            2 N
  1             2 N         0            1 Y
  1             2 N         0            2 N
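The join in Step 1 can be reproduced with base R. The following is only a sketch of the idea, not admiral's implementation; the helper names by_vars, join_vars, add, and is_join_var are chosen for this illustration.

```r
# Input dataset; here it also serves as the additional dataset
dat <- data.frame(
  USUBJID = c("1", "1"),
  AVISITN = c(1, 2),
  AVALC = c("Y", "N"),
  AVAL = c(1, 0)
)

by_vars <- "USUBJID"
join_vars <- c("AVISITN", "AVALC")

# Keep only the by variables and the join variables from the additional dataset
add <- dat[, c(by_vars, join_vars)]

# Add the ".join" suffix to the join variables
is_join_var <- names(add) %in% join_vars
names(add)[is_join_var] <- paste0(names(add)[is_join_var], ".join")

# Join every observation with every observation of the same by group
joined <- merge(dat, add, by = by_vars)
joined <- joined[order(joined$AVISITN, joined$AVISITN.join), ]

print(joined)
```

The result has four rows, one for each pair of original and joined observation, matching the joined dataset shown above.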
Step 2
The joined dataset is restricted to observations with respect to join_type and order.
The dataset from the example in the previous step with join_type = "after" and order = exprs(AVISITN) is restricted to
# A tibble: 1 x 6
  USUBJID AVISITN AVALC  AVAL AVISITN.join AVALC.join
  <chr>     <dbl> <chr> <dbl>        <dbl> <chr>
  1             1 Y         1            2 N
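The effect of join_type in Step 2 can be sketched in base R as follows. The helper keep_joined() is hypothetical and compares AVISITN directly, which is equivalent to comparing observation numbers only because AVISITN is the sole order variable and is unique within the subject.

```r
# Joined dataset from Step 1
joined <- data.frame(
  USUBJID = "1",
  AVISITN = c(1, 1, 2, 2),
  AVALC = c("Y", "Y", "N", "N"),
  AVAL = c(1, 1, 0, 0),
  AVISITN.join = c(1, 2, 1, 2),
  AVALC.join = c("Y", "N", "Y", "N")
)

# Hypothetical helper: keep joined observations relative to the original one
keep_joined <- function(joined, join_type) {
  switch(join_type,
    before = joined[joined$AVISITN.join < joined$AVISITN, ],
    after = joined[joined$AVISITN.join > joined$AVISITN, ],
    all = joined,
    stop("join_type must be \"before\", \"after\", or \"all\"")
  )
}

after <- keep_joined(joined, "after")
print(after)
```

With "after", only the pair (AVISITN = 1, AVISITN.join = 2) remains; with "all", the four rows are kept, including each observation joined to itself.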
Step 3
If first_cond_lower is specified, for each observation of the input dataset the joined dataset is restricted to observations from the first observation where first_cond_lower is fulfilled (the observation fulfilling the condition is included) up to the observation of the input dataset. If for an observation of the input dataset the condition is not fulfilled, the observation is removed.
If first_cond_upper is specified, for each observation of the input dataset the joined dataset is restricted to observations up to the first observation where first_cond_upper is fulfilled (the observation fulfilling the condition is included). If for an observation of the input dataset the condition is not fulfilled, the observation is removed.
For an example see the first_cond_lower and first_cond_upper example in the "Examples" section.
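The restriction by first_cond_upper can be illustrated for a single original observation. This base R sketch uses a hypothetical helper restrict_upper() and assumes the joined observations are already sorted by the order variables; it is not the code admiral runs.

```r
# Joined observations for one original observation (AVISITN = 2, AVALC = "CR")
joined <- data.frame(
  AVISITN = 2,
  AVALC = "CR",
  AVISITN.join = c(3, 4, 5),
  AVALC.join = c("NE", "CR", "NE")
)

# Hypothetical helper: keep rows up to and including the first row where
# the condition holds; keep nothing if it never holds
restrict_upper <- function(grp, cond) {
  hit <- which(cond)
  if (length(hit) == 0) {
    return(grp[0, , drop = FALSE])
  }
  grp[seq_len(hit[1]), , drop = FALSE]
}

upper <- restrict_upper(joined, joined$AVALC.join == "CR")
print(upper)
```

Here the visits 3 and 4 are kept, because visit 4 is the first joined observation with AVALC.join == "CR". A condition that is never fulfilled, such as AVALC.join == "PD", returns no rows, so the original observation cannot be flagged.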
Step 4
The joined dataset is grouped by the observations from the input dataset and restricted to the observations fulfilling the condition specified by filter_join.
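Because the joined dataset is grouped by the original observation, summary functions in filter_join see only the joined rows of that observation. A base R sketch of this grouping is shown below, with sum() standing in for admiral's count_vals() and obs_id as an illustrative grouping key.

```r
# Joined rows for two original observations, already restricted by Steps 2 and 3
joined <- data.frame(
  USUBJID = c("1", "1", "2", "2"),
  AVISITN = c(2, 2, 1, 1),
  AVALC = "CR",
  AVALC.join = c("NE", "CR", "PR", "CR")
)

# One group per original observation
obs_id <- paste(joined$USUBJID, joined$AVISITN)

# Evaluate the condition within each group; all() and sum() only see
# the joined rows belonging to that original observation
selected <- vapply(
  split(joined, obs_id),
  function(grp) {
    any(
      grp$AVALC == "CR" &
        all(grp$AVALC.join %in% c("CR", "NE")) &
        sum(grp$AVALC.join == "NE") <= 1
    )
  },
  logical(1)
)

print(selected)
```

The observation of subject "1" is selected, since its joined rows contain only "NE" and "CR" with a single "NE". The observation of subject "2" is not, because a "PR" occurs before the confirming "CR".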
Step 5
The variable specified by new_var is added to the input dataset. It is set to true_value for all observations which were selected in the previous step. For the other observations it is set to false_value.
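Adding the flag to the input dataset amounts to a lookup of the selected observations. The following base R sketch assumes a hypothetical data frame selected holding the keys that survived the previous step.

```r
# Input dataset
dat <- data.frame(
  USUBJID = c("1", "1", "2"),
  AVISITN = c(1, 2, 1),
  AVALC = c("Y", "Y", "Y")
)

# Hypothetical result of the previous step: keys of the selected observations
selected <- data.frame(USUBJID = "1", AVISITN = 1)

true_value <- "Y"
false_value <- NA_character_

# Flag the observations whose key occurs among the selected ones
is_selected <- paste(dat$USUBJID, dat$AVISITN) %in%
  paste(selected$USUBJID, selected$AVISITN)
dat$CONFFL <- ifelse(is_selected, true_value, false_value)

print(dat)
```

All observations of the input dataset are kept; only the value of the new variable differs between selected and other observations.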
Note: This function creates temporary datasets which may be much bigger than the input datasets. If this causes memory issues, please try setting the admiral option save_memory to TRUE (see set_admiral_options()). This reduces the memory consumption but increases the run-time.
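A minimal sketch of switching the option on for a memory-heavy derivation is shown below. It assumes an admiral version that provides the save_memory option, and it assumes that FALSE is the default value to restore afterwards.

```r
library(admiral)

# Trade run-time for lower memory consumption
set_admiral_options(save_memory = TRUE)

# ... call derive_var_joined_exist_flag() on the large dataset here ...

# Restore the setting afterwards (assumed default: FALSE)
set_admiral_options(save_memory = FALSE)
```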
See also
filter_joined(), derive_vars_joined()
General Derivation Functions for all ADaMs that return a variable appended to the dataset:
derive_var_extreme_flag(), derive_var_merged_ef_msrc(), derive_var_merged_exist_flag(), derive_var_merged_summary(), derive_var_obs_number(), derive_var_relative_flag(), derive_vars_cat(), derive_vars_computed(), derive_vars_joined(), derive_vars_merged(), derive_vars_merged_lookup(), derive_vars_transposed()
Examples
library(admiral)
library(tibble)
# flag observations with a duration longer than 30 and
# at, after, or up to 7 days before a COVID AE (ACOVFL == "Y")
adae <- tribble(
~USUBJID, ~ADY, ~ACOVFL, ~ADURN,
"1", 10, "N", 1,
"1", 21, "N", 50,
"1", 23, "Y", 14,
"1", 32, "N", 31,
"1", 42, "N", 20,
"2", 11, "Y", 13,
"2", 23, "N", 2,
"3", 13, "Y", 12,
"4", 14, "N", 32,
"4", 21, "N", 41
)
derive_var_joined_exist_flag(
adae,
dataset_add = adae,
new_var = ALCOVFL,
by_vars = exprs(USUBJID),
join_vars = exprs(ACOVFL, ADY),
join_type = "all",
order = exprs(ADY),
filter_join = ADURN > 30 & ACOVFL.join == "Y" & ADY >= ADY.join - 7
)
#> # A tibble: 10 × 5
#> USUBJID ADY ACOVFL ADURN ALCOVFL
#> <chr> <dbl> <chr> <dbl> <chr>
#> 1 1 10 N 1 NA
#> 2 1 21 N 50 Y
#> 3 1 23 Y 14 NA
#> 4 1 32 N 31 Y
#> 5 1 42 N 20 NA
#> 6 2 11 Y 13 NA
#> 7 2 23 N 2 NA
#> 8 3 13 Y 12 NA
#> 9 4 14 N 32 NA
#> 10 4 21 N 41 NA
# flag observations with AVALC == "Y" and AVALC == "Y" at one subsequent visit
data <- tribble(
~USUBJID, ~AVISITN, ~AVALC,
"1", 1, "Y",
"1", 2, "N",
"1", 3, "Y",
"1", 4, "N",
"2", 1, "Y",
"2", 2, "N",
"3", 1, "Y",
"4", 1, "N",
"4", 2, "N",
)
derive_var_joined_exist_flag(
data,
dataset_add = data,
by_vars = exprs(USUBJID),
new_var = CONFFL,
join_vars = exprs(AVALC, AVISITN),
join_type = "after",
order = exprs(AVISITN),
filter_join = AVALC == "Y" & AVALC.join == "Y" & AVISITN < AVISITN.join
)
#> # A tibble: 9 × 4
#> USUBJID AVISITN AVALC CONFFL
#> <chr> <dbl> <chr> <chr>
#> 1 1 1 Y Y
#> 2 1 2 N NA
#> 3 1 3 Y NA
#> 4 1 4 N NA
#> 5 2 1 Y NA
#> 6 2 2 N NA
#> 7 3 1 Y NA
#> 8 4 1 N NA
#> 9 4 2 N NA
# select observations with AVALC == "CR", AVALC == "CR" at a subsequent visit,
# only "CR" or "NE" in between, and at most one "NE" in between
data <- tribble(
~USUBJID, ~AVISITN, ~AVALC,
"1", 1, "PR",
"1", 2, "CR",
"1", 3, "NE",
"1", 4, "CR",
"1", 5, "NE",
"2", 1, "CR",
"2", 2, "PR",
"2", 3, "CR",
"3", 1, "CR",
"4", 1, "CR",
"4", 2, "NE",
"4", 3, "NE",
"4", 4, "CR",
"4", 5, "PR"
)
derive_var_joined_exist_flag(
data,
dataset_add = data,
by_vars = exprs(USUBJID),
join_vars = exprs(AVALC),
join_type = "after",
order = exprs(AVISITN),
new_var = CONFFL,
first_cond_upper = AVALC.join == "CR",
filter_join = AVALC == "CR" & all(AVALC.join %in% c("CR", "NE")) &
count_vals(var = AVALC.join, val = "NE") <= 1
)
#> # A tibble: 14 × 4
#> USUBJID AVISITN AVALC CONFFL
#> <chr> <dbl> <chr> <chr>
#> 1 1 1 PR NA
#> 2 1 2 CR Y
#> 3 1 3 NE NA
#> 4 1 4 CR NA
#> 5 1 5 NE NA
#> 6 2 1 CR NA
#> 7 2 2 PR NA
#> 8 2 3 CR NA
#> 9 3 1 CR NA
#> 10 4 1 CR NA
#> 11 4 2 NE NA
#> 12 4 3 NE NA
#> 13 4 4 CR NA
#> 14 4 5 PR NA
# flag observations with AVALC == "PR", AVALC == "CR" or AVALC == "PR"
# at a subsequent visit at least 20 days later, only "CR", "PR", or "NE"
# in between, at most one "NE" in between, and "CR" is not followed by "PR"
data <- tribble(
~USUBJID, ~ADY, ~AVALC,
"1", 6, "PR",
"1", 12, "CR",
"1", 24, "NE",
"1", 32, "CR",
"1", 48, "PR",
"2", 3, "PR",
"2", 21, "CR",
"2", 33, "PR",
"3", 11, "PR",
"4", 7, "PR",
"4", 12, "NE",
"4", 24, "NE",
"4", 32, "PR",
"4", 55, "PR"
)
derive_var_joined_exist_flag(
data,
dataset_add = data,
by_vars = exprs(USUBJID),
join_vars = exprs(AVALC, ADY),
join_type = "after",
order = exprs(ADY),
new_var = CONFFL,
first_cond_upper = AVALC.join %in% c("CR", "PR") & ADY.join - ADY >= 20,
filter_join = AVALC == "PR" &
all(AVALC.join %in% c("CR", "PR", "NE")) &
count_vals(var = AVALC.join, val = "NE") <= 1 &
(
min_cond(var = ADY.join, cond = AVALC.join == "CR") >
max_cond(var = ADY.join, cond = AVALC.join == "PR") |
count_vals(var = AVALC.join, val = "CR") == 0
)
)
#> # A tibble: 14 × 4
#> USUBJID ADY AVALC CONFFL
#> <chr> <dbl> <chr> <chr>
#> 1 1 6 PR NA
#> 2 1 12 CR NA
#> 3 1 24 NE NA
#> 4 1 32 CR NA
#> 5 1 48 PR NA
#> 6 2 3 PR NA
#> 7 2 21 CR NA
#> 8 2 33 PR NA
#> 9 3 11 PR NA
#> 10 4 7 PR NA
#> 11 4 12 NE NA
#> 12 4 24 NE NA
#> 13 4 32 PR Y
#> 14 4 55 PR NA
# flag observations with CRIT1FL == "Y" at two consecutive visits or at the last visit
data <- tribble(
~USUBJID, ~AVISITN, ~CRIT1FL,
"1", 1, "Y",
"1", 2, "N",
"1", 3, "Y",
"1", 5, "N",
"2", 1, "Y",
"2", 3, "Y",
"2", 5, "N",
"3", 1, "Y",
"4", 1, "Y",
"4", 2, "N",
)
derive_var_joined_exist_flag(
data,
dataset_add = data,
by_vars = exprs(USUBJID),
new_var = CONFFL,
tmp_obs_nr_var = tmp_obs_nr,
join_vars = exprs(CRIT1FL),
join_type = "all",
order = exprs(AVISITN),
filter_join = CRIT1FL == "Y" & CRIT1FL.join == "Y" &
(tmp_obs_nr + 1 == tmp_obs_nr.join | tmp_obs_nr == max(tmp_obs_nr.join))
)
#> # A tibble: 10 × 4
#> USUBJID AVISITN CRIT1FL CONFFL
#> <chr> <dbl> <chr> <chr>
#> 1 1 1 Y NA
#> 2 1 2 N NA
#> 3 1 3 Y NA
#> 4 1 5 N NA
#> 5 2 1 Y Y
#> 6 2 3 Y NA
#> 7 2 5 N NA
#> 8 3 1 Y Y
#> 9 4 1 Y NA
#> 10 4 2 N NA
# first_cond_lower and first_cond_upper argument
myd <- tribble(
~subj, ~day, ~val,
"1", 1, "++",
"1", 2, "-",
"1", 3, "0",
"1", 4, "+",
"1", 5, "++",
"1", 6, "-",
"2", 1, "-",
"2", 2, "++",
"2", 3, "+",
"2", 4, "0",
"2", 5, "-",
"2", 6, "++"
)
# flag "0" where all results from the first "++" before the "0" up to the "0"
# (excluding the "0") are "+" or "++"
derive_var_joined_exist_flag(
myd,
dataset_add = myd,
by_vars = exprs(subj),
order = exprs(day),
new_var = flag,
join_vars = exprs(val),
join_type = "before",
first_cond_lower = val.join == "++",
filter_join = val == "0" & all(val.join %in% c("+", "++"))
)
#> # A tibble: 12 × 4
#> subj day val flag
#> <chr> <dbl> <chr> <chr>
#> 1 1 1 ++ NA
#> 2 1 2 - NA
#> 3 1 3 0 NA
#> 4 1 4 + NA
#> 5 1 5 ++ NA
#> 6 1 6 - NA
#> 7 2 1 - NA
#> 8 2 2 ++ NA
#> 9 2 3 + NA
#> 10 2 4 0 Y
#> 11 2 5 - NA
#> 12 2 6 ++ NA
# flag "0" where all results from the "0" (excluding the "0") up to the first
# "++" after the "0" are "+" or "++"
derive_var_joined_exist_flag(
myd,
dataset_add = myd,
by_vars = exprs(subj),
order = exprs(day),
new_var = flag,
join_vars = exprs(val),
join_type = "after",
first_cond_upper = val.join == "++",
filter_join = val == "0" & all(val.join %in% c("+", "++"))
)
#> # A tibble: 12 × 4
#> subj day val flag
#> <chr> <dbl> <chr> <chr>
#> 1 1 1 ++ NA
#> 2 1 2 - NA
#> 3 1 3 0 Y
#> 4 1 4 + NA
#> 5 1 5 ++ NA
#> 6 1 6 - NA
#> 7 2 1 - NA
#> 8 2 2 ++ NA
#> 9 2 3 + NA
#> 10 2 4 0 NA
#> 11 2 5 - NA
#> 12 2 6 ++ NA
# flag each dose which is lower than the previous dose per subject
ex <- tribble(
~USUBJID, ~EXSTDTM, ~EXDOSE,
"1", "2024-01-01T08:00", 2,
"1", "2024-01-02T08:00", 4,
"2", "2024-01-01T08:30", 1,
"2", "2024-01-02T08:30", 4,
"2", "2024-01-03T08:30", 3,
"2", "2024-01-04T08:30", 2,
"2", "2024-01-05T08:30", 2
)
derive_var_joined_exist_flag(
ex,
dataset_add = ex,
by_vars = exprs(USUBJID),
order = exprs(EXSTDTM),
new_var = DOSREDFL,
tmp_obs_nr_var = tmp_dose_nr,
join_vars = exprs(EXDOSE),
join_type = "before",
filter_join = (
tmp_dose_nr == tmp_dose_nr.join + 1 # Look only at adjacent doses
& EXDOSE > 0 & EXDOSE.join > 0 # Both doses are valid
& EXDOSE < EXDOSE.join # Dose is lower than previous
)
)
#> # A tibble: 7 × 4
#> USUBJID EXSTDTM EXDOSE DOSREDFL
#> <chr> <chr> <dbl> <chr>
#> 1 1 2024-01-01T08:00 2 NA
#> 2 1 2024-01-02T08:00 4 NA
#> 3 2 2024-01-01T08:30 1 NA
#> 4 2 2024-01-02T08:30 4 NA
#> 5 2 2024-01-03T08:30 3 Y
#> 6 2 2024-01-04T08:30 2 Y
#> 7 2 2024-01-05T08:30 2 NA
# derive definitive deterioration flag as any deterioration (CHGCAT1 = "Worsened")
# by parameter that is not followed by a non-deterioration
adqs <- tribble(
~USUBJID, ~PARAMCD, ~ADY, ~CHGCAT1,
"1", "QS1", 10, "Improved",
"1", "QS1", 21, "Improved",
"1", "QS1", 23, "Improved",
"1", "QS2", 32, "Worsened",
"1", "QS2", 42, "Improved",
"2", "QS1", 11, "Worsened",
"2", "QS1", 24, "Worsened"
)
derive_var_joined_exist_flag(
adqs,
dataset_add = adqs,
new_var = DDETERFL,
by_vars = exprs(USUBJID, PARAMCD),
join_vars = exprs(CHGCAT1, ADY),
join_type = "all",
order = exprs(ADY),
filter_join = all(CHGCAT1.join == "Worsened" | ADY > ADY.join)
)
#> # A tibble: 7 × 5
#> USUBJID PARAMCD ADY CHGCAT1 DDETERFL
#> <chr> <chr> <dbl> <chr> <chr>
#> 1 1 QS1 10 Improved NA
#> 2 1 QS1 21 Improved NA
#> 3 1 QS1 23 Improved NA
#> 4 1 QS2 32 Worsened NA
#> 5 1 QS2 42 Improved NA
#> 6 2 QS1 11 Worsened Y
#> 7 2 QS1 24 Worsened Y