Merge an Existence Flag From Multiple Sources

Adds a flag variable to the input dataset which indicates if there exists at least one observation in one of the source datasets fulfilling a certain condition. This is useful, for example, when a dose adjustment flag should be added to ADEX but the dose adjustment information is collected in several datasets, e.g., EX, EC, and FA.

Usage

derive_var_merged_ef_msrc(
  dataset,
  by_vars,
  flag_events,
  source_datasets,
  new_var,
  true_value = "Y",
  false_value = NA_character_,
  missing_value = NA_character_
)

Arguments

dataset

Input dataset

The variables specified by the by_vars argument are expected to be in the dataset.

Permitted values: a dataset, i.e., a data.frame or tibble
Default value: none

by_vars

Grouping variables

Permitted values: list of variables created by exprs(), e.g., exprs(USUBJID, VISIT)
Default value: none

flag_events

Flag events

A list of flag_event() objects is expected. For each event the condition (condition field) is evaluated in the source dataset referenced by the dataset_name field. If it evaluates to TRUE at least once, the new variable is set to true_value.

Permitted values: a list of flag_event() objects
Default value: none

source_datasets

Source datasets

A named list of datasets is expected. The dataset_name field of flag_event() refers to the dataset provided in the list.

Permitted values: named list of datasets, e.g., list(adsl = adsl, ae = ae)
Default value: none

new_var

New variable

The specified variable is added to the input dataset.

Permitted values: an unquoted symbol, e.g., AVAL
Default value: none

true_value

True value

The new variable (new_var) is set to the specified value for all by groups for which the condition evaluates to TRUE in at least one of the sources (flag_events).

The values of true_value, false_value, and missing_value must be of the same type.

Permitted values: a character scalar, i.e., a character vector of length one
Default value: "Y"

false_value

False value

The new variable (new_var) is set to the specified value for all by groups which occur in at least one source (flag_events) but for which the condition never evaluates to TRUE.

The values of true_value, false_value, and missing_value must be of the same type.

Permitted values: a character scalar, i.e., a character vector of length one
Default value: NA_character_

missing_value

Value used for missing information

The new variable (new_var) is set to the specified value for all by groups without observations in any of the sources (flag_events).

The values of true_value, false_value, and missing_value must be of the same type.

Permitted values: a character scalar, i.e., a character vector of length one
Default value: NA_character_

Value

The output dataset contains all observations and variables of the input dataset and additionally the variable specified for new_var.

Details

For each flag_event() object specified for flag_events: The condition (condition) is evaluated in the dataset referenced by dataset_name. If the by_vars field is specified, the dataset is grouped by the specified variables for evaluating the condition. If named elements are used in by_vars, e.g., by_vars = exprs(USUBJID, EXLNKID = ECLNKID), the variables are renamed after the evaluation. If the by_vars field is not specified, the observations are grouped by the variables specified for the by_vars argument.
The new variable (new_var) is added to the input dataset and set to the true value (true_value) if for the by group at least one condition evaluates to TRUE in one of the sources. It is set to the false value (false_value) if for the by group at least one observation exists and for all observations the condition evaluates to FALSE or NA. Otherwise, it is set to the missing value (missing_value).
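
The three outcomes described above can be reproduced with plain dplyr. The following is a minimal sketch of the logic only, not admiral's implementation. The helper flag_from_sources(), its arguments, and the column names cond and FLAG are hypothetical, and the sketch assumes that every source already carries the by variable USUBJID and a logical column holding the evaluated condition.

```r
library(dplyr)

# Hypothetical helper (not part of admiral): `sources` is a list of data
# frames, each with USUBJID and a logical column `cond` that holds the
# already evaluated condition.
flag_from_sources <- function(dataset,
                              sources,
                              true_value = "Y",
                              false_value = NA_character_,
                              missing_value = NA_character_) {
  by_group_status <- bind_rows(sources) %>%
    group_by(USUBJID) %>%
    # TRUE if any condition is TRUE; NA conditions count as not fulfilled
    summarise(any_true = any(cond %in% TRUE), .groups = "drop")

  dataset %>%
    left_join(by_group_status, by = "USUBJID") %>%
    mutate(
      FLAG = case_when(
        any_true ~ true_value,    # at least one condition is TRUE
        !any_true ~ false_value,  # observed in a source, but never TRUE
        TRUE ~ missing_value      # no observation in any source
      )
    ) %>%
    select(-any_true)
}

adsl <- tibble(USUBJID = c("1", "2", "3"))
src_a <- tibble(USUBJID = c("1", "2"), cond = c(TRUE, NA))
src_b <- tibble(USUBJID = "2", cond = FALSE)

# Subject "1" has a TRUE condition, subject "2" is observed but never TRUE,
# and subject "3" has no observation in any source.
result <- flag_from_sources(adsl, list(src_a, src_b), false_value = "N")
```

In this sketch, a by group that is absent from all sources ends up with a missing any_true after the join, which is why it falls through to missing_value rather than false_value.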

Examples

Data setup

The following examples use the datasets below. adsl is the subject-level dataset onto which the flag is merged. cm contains concomitant medication records and pr contains procedure records. Both are used as sources in the examples.

library(dplyr)

adsl <- tribble(
  ~USUBJID,
  "1",
  "2",
  "3",
  "4",
  "5"
)

cm <- tribble(
  ~USUBJID, ~CMCAT,        ~CMSEQ,
  "1",      "ANTI-CANCER",      1,
  "1",      "GENERAL",          2,
  "2",      "GENERAL",          1,
  "3",      "ANTI-CANCER",      1,
  "5",      "GENERAL",          1
)

# All records in PR are assumed to indicate cancer treatment
pr <- tribble(
  ~USUBJID, ~PRSEQ,
  "2",      1,
  "3",      1
)

Flagging from multiple sources (`flag_events`)

The flag_events argument takes a list of flag_event() objects, each pointing to a named source dataset and an optional condition. For a given by group, the new variable is set to true_value if the condition evaluates to TRUE at least once in any of the sources.

In the example below, an anti-cancer treatment flag CANCTRFL is derived from two sources:

cm: flagged when CMCAT == "ANTI-CANCER"
pr: all records qualify (no condition specified), so any subject with a procedure record is flagged

With the default false_value = NA_character_ and missing_value = NA_character_, both subjects "4" and "5" receive NA, but for different reasons: subject "5" is present in cm but has no anti-cancer record (false_value), while subject "4" is absent from all sources (missing_value). See the next section to learn how to distinguish these two cases by setting false_value and missing_value to different values.

derive_var_merged_ef_msrc(
  adsl,
  by_vars = exprs(USUBJID),
  flag_events = list(
    flag_event(
      dataset_name = "cm",
      condition = CMCAT == "ANTI-CANCER"
    ),
    flag_event(
      dataset_name = "pr"
    )
  ),
  source_datasets = list(cm = cm, pr = pr),
  new_var = CANCTRFL
)
#> # A tibble: 5 × 2
#>   USUBJID CANCTRFL
#>   <chr>   <chr>
#> 1 1       Y
#> 2 2       Y
#> 3 3       Y
#> 4 4       <NA>
#> 5 5       <NA>

Controlling flag values (`true_value`, `false_value`, `missing_value`)

By default true_value = "Y", false_value = NA_character_, and missing_value = NA_character_. Setting them explicitly lets you distinguish three subject-level states:

true_value: subject has at least one qualifying record in any source
false_value: subject appears in at least one source, but no record meets the condition
missing_value: subject has no records in any source

In the example below, a subject-level ADSL dataset (adsl_ex) is used together with three dose adjustment sources (adex, ec, and fa), where adex is passed to source_datasets under the name ex. This reveals all three cases in the output:

Subjects "1" and "3": dose adjustment found → "Y" via true_value
Subject "2": present in adex but no adjustment found → "N" via false_value
Subject "4": absent from all sources → NA via missing_value

adsl_ex <- tribble(
  ~USUBJID,
  "1",
  "2",
  "3",
  "4"
)

adex <- tribble(
  ~USUBJID, ~EXADJ,
  "1",      "DOSE REDUCED",
  "2",      NA_character_
)

ec <- tribble(
  ~USUBJID, ~ECADJ,
  "3",      "DOSE REDUCED"
)

fa <- tribble(
  ~USUBJID, ~FATESTCD, ~FAOBJ,            ~FASTRESC,
  "1",      "OCCUR",   "DOSE ADJUSTMENT", "Y"
)

derive_var_merged_ef_msrc(
  adsl_ex,
  by_vars = exprs(USUBJID),
  flag_events = list(
    flag_event(
      dataset_name = "ex",
      condition = !is.na(EXADJ)
    ),
    flag_event(
      dataset_name = "ec",
      condition = !is.na(ECADJ)
    ),
    flag_event(
      dataset_name = "fa",
      condition = FATESTCD == "OCCUR" & FAOBJ == "DOSE ADJUSTMENT" & FASTRESC == "Y"
    )
  ),
  source_datasets = list(ex = adex, ec = ec, fa = fa),
  new_var = DOSADJFL,
  true_value = "Y",
  false_value = "N",
  missing_value = NA_character_
)
#> # A tibble: 4 × 2
#>   USUBJID DOSADJFL
#>   <chr>   <chr>
#> 1 1       Y
#> 2 2       N
#> 3 3       Y
#> 4 4       <NA>

Per-source `by_vars` renaming

When the grouping variable has a different name in a source dataset, the by_vars argument of flag_event() can be used to rename it using the exprs(<target> = <source>) syntax. This allows each source to use its own link variable while still merging correctly onto the input dataset.

In the example below, a dose adjustment flag DOSADJFL is derived for each exposure record in adex. The flag is set to "Y" if a dose adjustment is recorded in any of three sources:

ex: directly via EXADJ
ec: linked via ECLNKID (renamed to EXLNKID for the merge)
fa: linked via FALNKID (renamed to EXLNKID for the merge)

adex <- tribble(
  ~USUBJID, ~EXLNKID, ~EXADJ,
  "1",      "1",      "AE",
  "1",      "2",      NA_character_,
  "1",      "3",      NA_character_,
  "2",      "1",      NA_character_,
  "3",      "1",      NA_character_
)

ec <- tribble(
  ~USUBJID, ~ECLNKID, ~ECADJ,
  "1",      "3",      "AE",
  "3",      "1",      NA_character_
)

fa <- tribble(
  ~USUBJID, ~FALNKID, ~FATESTCD, ~FAOBJ,            ~FASTRESC,
  "3",      "1",      "OCCUR",   "DOSE ADJUSTMENT", "Y"
)

derive_var_merged_ef_msrc(
  adex,
  by_vars = exprs(USUBJID, EXLNKID),
  flag_events = list(
    flag_event(
      dataset_name = "ex",
      condition = !is.na(EXADJ)
    ),
    flag_event(
      dataset_name = "ec",
      condition = !is.na(ECADJ),
      by_vars = exprs(USUBJID, EXLNKID = ECLNKID)
    ),
    flag_event(
      dataset_name = "fa",
      condition = FATESTCD == "OCCUR" & FAOBJ == "DOSE ADJUSTMENT" & FASTRESC == "Y",
      by_vars = exprs(USUBJID, EXLNKID = FALNKID)
    )
  ),
  source_datasets = list(ex = adex, ec = ec, fa = fa),
  new_var = DOSADJFL
)
#> # A tibble: 5 × 4
#>   USUBJID EXLNKID EXADJ DOSADJFL
#>   <chr>   <chr>   <chr> <chr>
#> 1 1       1       AE    Y
#> 2 1       2       <NA>  <NA>
#> 3 1       3       <NA>  Y
#> 4 2       1       <NA>  <NA>
#> 5 3       1       <NA>  Y
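
What the named by_vars element does for the ec source can be approximated with plain dplyr. This is only an illustrative sketch of the renaming step, not admiral's internal code. The object name ec_events and the column name cond are hypothetical.

```r
library(dplyr)

ec <- tribble(
  ~USUBJID, ~ECLNKID, ~ECADJ,
  "1",      "3",      "AE",
  "3",      "1",      NA_character_
)

# Evaluate the condition using the source's own variable names, then rename
# the link variable so that it matches the by variable of the input dataset.
ec_events <- ec %>%
  mutate(cond = !is.na(ECADJ)) %>%
  select(USUBJID, EXLNKID = ECLNKID, cond)
```

After the rename, ec_events can be matched to adex by USUBJID and EXLNKID, even though the source itself stores the link in ECLNKID.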
