Add New Variable(s) to the Input Dataset Based on Variables from Another Dataset
Source:R/derive_merged.R
derive_vars_merged.Rd
Add new variable(s) to the input dataset based on variables from another
dataset. The observations to merge can be selected by a condition
(filter_add
argument) and/or selecting the first or last observation for
each by group (order
and mode
argument).
Usage
derive_vars_merged(
dataset,
dataset_add,
by_vars,
order = NULL,
new_vars = NULL,
filter_add = NULL,
mode = NULL,
exist_flag = NULL,
true_value = "Y",
false_value = NA_character_,
missing_values = NULL,
check_type = "warning",
duplicate_msg = NULL,
relationship = NULL
)
Arguments
- dataset
Input dataset
The variables specified by the
by_vars
argument are expected to be in the dataset.- dataset_add
Additional dataset
The variables specified by the
by_vars
, thenew_vars
, and theorder
argument are expected.- by_vars
Grouping variables
The input dataset and the selected observations from the additional dataset are merged by the specified variables.
Variables can be renamed by naming the element, i.e.
by_vars = exprs(<name in input dataset> = <name in additional dataset>)
, similar to thedplyr
joins.Permitted Values: list of variables created by
exprs()
e.g.exprs(USUBJID, VISIT)
- order
Sort order
If the argument is set to a non-null value, for each by group the first or last observation from the additional dataset is selected with respect to the specified order.
Variables defined by the
new_vars
argument can be used in the sort order.For handling of
NA
s in sorting variables see Sort Order.Permitted Values: list of expressions created by
exprs()
, e.g.,exprs(ADT, desc(AVAL))
orNULL
- new_vars
Variables to add
The specified variables from the additional dataset are added to the output dataset. Variables can be renamed by naming the element, i.e.,
new_vars = exprs(<new name> = <old name>)
.For example
new_vars = exprs(var1, var2)
adds variablesvar1
andvar2
fromdataset_add
to the input dataset.And
new_vars = exprs(var1, new_var2 = old_var2)
takesvar1
andold_var2
fromdataset_add
and adds them to the input dataset renamingold_var2
tonew_var2
.Values of the added variables can be modified by specifying an expression. For example,
new_vars = LASTRSP = exprs(str_to_upper(AVALC))
adds the variableLASTRSP
to the dataset and sets it to the upper case value ofAVALC
.If the argument is not specified or set to
NULL
, all variables from the additional dataset (dataset_add
) are added.Permitted Values: list of variables or named expressions created by
exprs()
- filter_add
Filter for additional dataset (
dataset_add
)Only observations fulfilling the specified condition are taken into account for merging. If the argument is not specified, all observations are considered.
Variables defined by the
new_vars
argument can be used in the filter condition.Permitted Values: a condition
- mode
Selection mode
Determines if the first or last observation is selected. If the
order
argument is specified,mode
must be non-null.If the
order
argument is not specified, themode
argument is ignored.Permitted Values:
"first"
,"last"
,NULL
- exist_flag
Exist flag
If the argument is specified (e.g.,
exist_flag = FLAG
), the specified variable (e.g.,FLAG
) is added to the input dataset. This variable will be the value provided intrue_value
for all selected records fromdataset_add
which are merged into the input dataset, and the value provided infalse_value
otherwise.Permitted Values: Variable name
- true_value
True value
The value for the specified variable
exist_flag
, applicable to the first or last observation (depending on the mode) of each by group.Permitted Values: An atomic scalar
- false_value
False value
The value for the specified variable
exist_flag
, NOT applicable to the first or last observation (depending on the mode) of each by group.Permitted Values: An atomic scalar
- missing_values
Values for non-matching observations
For observations of the input dataset (
dataset
) which do not have a matching observation in the additional dataset (dataset_add
) the values of the specified variables are set to the specified value. Only variables specified fornew_vars
can be specified formissing_values
.Permitted Values: named list of expressions, e.g.,
exprs(BASEC = "MISSING", BASE = -1)
- check_type
Check uniqueness?
If
"warning"
,"message"
, or"error"
is specified, the specified message is issued if the observations of the (restricted) additional dataset are not unique with respect to the by variables and the order.If the
order
argument is not specified, thecheck_type
argument is ignored: if the observations of the (restricted) additional dataset are not unique with respect to the by variables, an error is issued.Permitted Values:
"none"
,"message"
,"warning"
,"error"
- duplicate_msg
Message of unique check
If the uniqueness check fails, the specified message is displayed.
Default:
paste( "Dataset {.arg dataset_add} contains duplicate records with respect to", "{.var {vars2chr(by_vars)}}." )
- relationship
Expected merge-relationship between the
by_vars
variable(s) indataset
(input dataset) and thedataset_add
(additional dataset) containing the additionalnew_vars
.This argument is passed to the
dplyr::left_join()
function. See https://dplyr.tidyverse.org/reference/mutate-joins.html#arguments for more details.Permitted Values:
"one-to-one"
,"many-to-one"
,NULL
.
Value
The output dataset contains all observations and variables of the
input dataset and additionally the variables specified for new_vars
from
the additional dataset (dataset_add
).
Details
The new variables (
new_vars
) are added to the additional dataset (dataset_add
).The records from the additional dataset (
dataset_add
) are restricted to those matching thefilter_add
condition.If
order
is specified, for each by group the first or last observation (depending onmode
) is selected.The variables specified for
new_vars
are merged to the input dataset usingleft_join()
. I.e., the output dataset contains all observations from the input dataset. For observations without a matching observation in the additional dataset the new variables are set as specified bymissing_values
(or toNA
for variables not inmissing_values
). Observations in the additional dataset which have no matching observation in the input dataset are ignored.
See also
General Derivation Functions for all ADaMs that returns variable appended to dataset:
derive_var_extreme_flag()
,
derive_var_joined_exist_flag()
,
derive_var_merged_ef_msrc()
,
derive_var_merged_exist_flag()
,
derive_var_merged_summary()
,
derive_var_obs_number()
,
derive_var_relative_flag()
,
derive_vars_cat()
,
derive_vars_computed()
,
derive_vars_joined()
,
derive_vars_merged_lookup()
,
derive_vars_transposed()
Examples
library(dplyr, warn.conflicts = FALSE)
vs <- tribble(
~STUDYID, ~DOMAIN, ~USUBJID, ~VSTESTCD, ~VISIT, ~VSSTRESN, ~VSSTRESU, ~VSDTC,
"PILOT01", "VS", "01-1302", "HEIGHT", "SCREENING", 177.8, "cm", "2013-08-20",
"PILOT01", "VS", "01-1302", "WEIGHT", "SCREENING", 81.19, "kg", "2013-08-20",
"PILOT01", "VS", "01-1302", "WEIGHT", "BASELINE", 82.1, "kg", "2013-08-29",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 2", 81.19, "kg", "2013-09-15",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 4", 82.56, "kg", "2013-09-24",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 6", 80.74, "kg", "2013-10-08",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 8", 82.1, "kg", "2013-10-22",
"PILOT01", "VS", "01-1302", "WEIGHT", "WEEK 12", 82.1, "kg", "2013-11-05",
"PILOT01", "VS", "17-1344", "HEIGHT", "SCREENING", 163.5, "cm", "2014-01-01",
"PILOT01", "VS", "17-1344", "WEIGHT", "SCREENING", 58.06, "kg", "2014-01-01",
"PILOT01", "VS", "17-1344", "WEIGHT", "BASELINE", 58.06, "kg", "2014-01-11",
"PILOT01", "VS", "17-1344", "WEIGHT", "WEEK 2", 58.97, "kg", "2014-01-24",
"PILOT01", "VS", "17-1344", "WEIGHT", "WEEK 4", 57.97, "kg", "2014-02-07",
"PILOT01", "VS", "17-1344", "WEIGHT", "WEEK 6", 58.97, "kg", "2014-02-19",
"PILOT01", "VS", "17-1344", "WEIGHT", "WEEK 8", 57.79, "kg", "2014-03-14"
)
dm <- tribble(
~STUDYID, ~DOMAIN, ~USUBJID, ~AGE, ~AGEU,
"PILOT01", "DM", "01-1302", 61, "YEARS",
"PILOT01", "DM", "17-1344", 64, "YEARS"
)
# Merging all dm variables to vs
derive_vars_merged(
vs,
dataset_add = select(dm, -DOMAIN),
by_vars = exprs(STUDYID, USUBJID)
) %>%
select(STUDYID, USUBJID, VSTESTCD, VISIT, VSSTRESN, AGE, AGEU)
#> # A tibble: 15 × 7
#> STUDYID USUBJID VSTESTCD VISIT VSSTRESN AGE AGEU
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 PILOT01 01-1302 HEIGHT SCREENING 178. 61 YEARS
#> 2 PILOT01 01-1302 WEIGHT SCREENING 81.2 61 YEARS
#> 3 PILOT01 01-1302 WEIGHT BASELINE 82.1 61 YEARS
#> 4 PILOT01 01-1302 WEIGHT WEEK 2 81.2 61 YEARS
#> 5 PILOT01 01-1302 WEIGHT WEEK 4 82.6 61 YEARS
#> 6 PILOT01 01-1302 WEIGHT WEEK 6 80.7 61 YEARS
#> 7 PILOT01 01-1302 WEIGHT WEEK 8 82.1 61 YEARS
#> 8 PILOT01 01-1302 WEIGHT WEEK 12 82.1 61 YEARS
#> 9 PILOT01 17-1344 HEIGHT SCREENING 164. 64 YEARS
#> 10 PILOT01 17-1344 WEIGHT SCREENING 58.1 64 YEARS
#> 11 PILOT01 17-1344 WEIGHT BASELINE 58.1 64 YEARS
#> 12 PILOT01 17-1344 WEIGHT WEEK 2 59.0 64 YEARS
#> 13 PILOT01 17-1344 WEIGHT WEEK 4 58.0 64 YEARS
#> 14 PILOT01 17-1344 WEIGHT WEEK 6 59.0 64 YEARS
#> 15 PILOT01 17-1344 WEIGHT WEEK 8 57.8 64 YEARS
# Merge last weight to adsl
adsl <- tribble(
~STUDYID, ~USUBJID, ~AGE, ~AGEU,
"PILOT01", "01-1302", 61, "YEARS",
"PILOT01", "17-1344", 64, "YEARS"
)
derive_vars_merged(
adsl,
dataset_add = vs,
by_vars = exprs(STUDYID, USUBJID),
order = exprs(convert_dtc_to_dtm(VSDTC)),
mode = "last",
new_vars = exprs(LASTWGT = VSSTRESN, LASTWGTU = VSSTRESU),
filter_add = VSTESTCD == "WEIGHT",
exist_flag = vsdatafl
) %>%
select(STUDYID, USUBJID, AGE, AGEU, LASTWGT, LASTWGTU, vsdatafl)
#> # A tibble: 2 × 7
#> STUDYID USUBJID AGE AGEU LASTWGT LASTWGTU vsdatafl
#> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
#> 1 PILOT01 01-1302 61 YEARS 82.1 kg Y
#> 2 PILOT01 17-1344 64 YEARS 57.8 kg Y
# Derive treatment start datetime (TRTSDTM)
ex <- tribble(
~STUDYID, ~DOMAIN, ~USUBJID, ~EXSTDY, ~EXENDY, ~EXSTDTC, ~EXENDTC,
"PILOT01", "EX", "01-1302", 1, 18, "2013-08-29", "2013-09-15",
"PILOT01", "EX", "01-1302", 19, 69, "2013-09-16", "2013-11-05",
"PILOT01", "EX", "17-1344", 1, 14, "2014-01-11", "2014-01-24",
"PILOT01", "EX", "17-1344", 15, 63, "2014-01-25", "2014-03-14"
)
## Impute exposure start date to first date/time
ex_ext <- derive_vars_dtm(
ex,
dtc = EXSTDTC,
new_vars_prefix = "EXST",
highest_imputation = "M",
)
## Add first exposure datetime and imputation flags to adsl
derive_vars_merged(
select(dm, STUDYID, USUBJID),
dataset_add = ex_ext,
by_vars = exprs(STUDYID, USUBJID),
new_vars = exprs(TRTSDTM = EXSTDTM, TRTSDTF = EXSTDTF, TRTSTMF = EXSTTMF),
order = exprs(EXSTDTM),
mode = "first"
)
#> # A tibble: 2 × 5
#> STUDYID USUBJID TRTSDTM TRTSDTF TRTSTMF
#> <chr> <chr> <dttm> <chr> <chr>
#> 1 PILOT01 01-1302 2013-08-29 00:00:00 NA H
#> 2 PILOT01 17-1344 2014-01-11 00:00:00 NA H
# Derive treatment end datetime (TRTEDTM)
## Impute exposure end datetime to last time, no date imputation
ex_ext <- derive_vars_dtm(
ex,
dtc = EXENDTC,
new_vars_prefix = "EXEN",
time_imputation = "last",
)
## Add last exposure datetime and imputation flag to adsl
derive_vars_merged(
select(adsl, STUDYID, USUBJID),
dataset_add = ex_ext,
filter_add = !is.na(EXENDTM),
by_vars = exprs(STUDYID, USUBJID),
new_vars = exprs(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
order = exprs(EXENDTM),
mode = "last"
)
#> # A tibble: 2 × 4
#> STUDYID USUBJID TRTEDTM TRTETMF
#> <chr> <chr> <dttm> <chr>
#> 1 PILOT01 01-1302 2013-11-05 23:59:59 H
#> 2 PILOT01 17-1344 2014-03-14 23:59:59 H
# Modify merged values and set value for non matching observations
adsl <- tribble(
~USUBJID, ~SEX, ~COUNTRY,
"ST42-1", "F", "AUT",
"ST42-2", "M", "MWI",
"ST42-3", "M", "NOR",
"ST42-4", "F", "UGA"
)
advs <- tribble(
~USUBJID, ~PARAMCD, ~AVISIT, ~AVISITN, ~AVAL,
"ST42-1", "WEIGHT", "BASELINE", 0, 66,
"ST42-1", "WEIGHT", "WEEK 2", 1, 68,
"ST42-2", "WEIGHT", "BASELINE", 0, 88,
"ST42-3", "WEIGHT", "WEEK 2", 1, 55,
"ST42-3", "WEIGHT", "WEEK 4", 2, 50
)
derive_vars_merged(
adsl,
dataset_add = advs,
by_vars = exprs(USUBJID),
new_vars = exprs(
LSTVSCAT = if_else(AVISIT == "BASELINE", "BASELINE", "POST-BASELINE")
),
order = exprs(AVISITN),
mode = "last",
missing_values = exprs(LSTVSCAT = "MISSING")
)
#> # A tibble: 4 × 4
#> USUBJID SEX COUNTRY LSTVSCAT
#> <chr> <chr> <chr> <chr>
#> 1 ST42-1 F AUT POST-BASELINE
#> 2 ST42-2 M MWI BASELINE
#> 3 ST42-3 M NOR POST-BASELINE
#> 4 ST42-4 F UGA MISSING