Add New Records Within By Groups Using Aggregation Functions
Source:R/derive_summary_records.R
derive_summary_records.Rd
It is not uncommon to have an analysis need whereby one needs to derive an
analysis value (AVAL
) from multiple records. The ADaM basic dataset
structure variable DTYPE
is available to indicate when a new derived
records has been added to a dataset.
Usage
derive_summary_records(
dataset = NULL,
dataset_add,
dataset_ref = NULL,
by_vars,
filter_add = NULL,
set_values_to,
missing_values = NULL
)
Arguments
- dataset
Input dataset
The variables specified by the
by_vars
argument are expected to be in the dataset.- dataset_add
Additional dataset
The variables specified for
by_vars
are expected. Observations from the specified dataset are going to be used to calculate and added as new records to the input dataset (dataset
).- dataset_ref
Reference dataset
The variables specified for
by_vars
are expected. For each observation of the specified dataset a new observation is added to the input dataset.- by_vars
Grouping variables
Variables to consider for generation of groupwise summary records. Providing the names of variables in
exprs()
will create a groupwise summary and generate summary records for the specified groups.Permitted Values: list of variables created by
exprs()
e.g.exprs(USUBJID, VISIT)
- filter_add
Filter condition as logical expression to apply during summary calculation. By default, filtering expressions are computed within
by_vars
as this will help when an aggregating, lagging, or ranking function is involved.For example,
filter_add = (AVAL > mean(AVAL, na.rm = TRUE))
will filter allAVAL
values greater than mean ofAVAL
with inby_vars
.filter_add = (dplyr::n() > 2)
will filter n count ofby_vars
greater than 2.
- set_values_to
Variables to be set
The specified variables are set to the specified values for the new observations.
Set a list of variables to some specified value for the new records
LHS refer to a variable.
RHS refers to the values to set to the variable. This can be a string, a symbol, a numeric value, an expression or NA. If summary functions are used, the values are summarized by the variables specified for
by_vars
.
For example:
- missing_values
Values for missing summary values
For observations of the reference dataset (
dataset_ref
) which do not have a complete mapping defined by the summarization defined inset_values_to
. Only variables specified forset_values_to
can be specified formissing_values
.Permitted Values: named list of expressions, e.g.,
exprs(AVAL = -9999)
Details
When all records have same values within by_vars
then this function will
retain those common values in the newly derived records. Otherwise new value
will be set to NA
.
See also
get_summary_records()
, derive_var_merged_summary()
BDS-Findings Functions for adding Parameters/Records:
default_qtc_paramcd()
,
derive_expected_records()
,
derive_extreme_event()
,
derive_extreme_records()
,
derive_locf_records()
,
derive_param_bmi()
,
derive_param_bsa()
,
derive_param_computed()
,
derive_param_doseint()
,
derive_param_exist_flag()
,
derive_param_exposure()
,
derive_param_framingham()
,
derive_param_map()
,
derive_param_qtc()
,
derive_param_rr()
,
derive_param_wbc_abs()
Examples
library(tibble)
library(dplyr)
adeg <- tribble(
~USUBJID, ~EGSEQ, ~PARAM, ~AVISIT, ~EGDTC, ~AVAL, ~TRTA,
"XYZ-1001", 1, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:50", 385, NA_character_,
"XYZ-1001", 2, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:52", 399, NA_character_,
"XYZ-1001", 3, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:56", 396, NA_character_,
"XYZ-1001", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:45", 384, "Placebo",
"XYZ-1001", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:48", 393, "Placebo",
"XYZ-1001", 6, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:51", 388, "Placebo",
"XYZ-1001", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:45", 385, "Placebo",
"XYZ-1001", 8, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:48", 394, "Placebo",
"XYZ-1001", 9, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:51", 402, "Placebo",
"XYZ-1002", 1, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 399, NA_character_,
"XYZ-1002", 2, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 410, NA_character_,
"XYZ-1002", 3, "QTcF Int. (msec)", "Baseline", "2016-02-22T08:01", 392, NA_character_,
"XYZ-1002", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:50", 401, "Active 20mg",
"XYZ-1002", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:53", 407, "Active 20mg",
"XYZ-1002", 6, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:56", 400, "Active 20mg",
"XYZ-1002", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:50", 412, "Active 20mg",
"XYZ-1002", 8, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:53", 414, "Active 20mg",
"XYZ-1002", 9, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:56", 402, "Active 20mg"
) %>%
mutate(
ADTM = convert_dtc_to_dtm(EGDTC)
)
# Summarize the average of the triplicate ECG interval values (AVAL)
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
set_values_to = exprs(
AVAL = mean(AVAL, na.rm = TRUE),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 24 × 9
#> USUBJID EGSEQ PARAM AVISIT EGDTC AVAL TRTA ADTM DTYPE
#> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dttm> <chr>
#> 1 XYZ-1001 1 QTcF Int. … Basel… 2016… 385 NA 2016-02-24 07:50:00 NA
#> 2 XYZ-1001 2 QTcF Int. … Basel… 2016… 399 NA 2016-02-24 07:52:00 NA
#> 3 XYZ-1001 3 QTcF Int. … Basel… 2016… 396 NA 2016-02-24 07:56:00 NA
#> 4 XYZ-1001 NA QTcF Int. … Basel… NA 393. NA NA AVER…
#> 5 XYZ-1001 4 QTcF Int. … Visit… 2016… 384 Plac… 2016-03-08 09:45:00 NA
#> 6 XYZ-1001 5 QTcF Int. … Visit… 2016… 393 Plac… 2016-03-08 09:48:00 NA
#> 7 XYZ-1001 6 QTcF Int. … Visit… 2016… 388 Plac… 2016-03-08 09:51:00 NA
#> 8 XYZ-1001 NA QTcF Int. … Visit… NA 388. NA NA AVER…
#> 9 XYZ-1001 7 QTcF Int. … Visit… 2016… 385 Plac… 2016-03-22 10:45:00 NA
#> 10 XYZ-1001 8 QTcF Int. … Visit… 2016… 394 Plac… 2016-03-22 10:48:00 NA
#> # ℹ 14 more rows
# Derive more than one summary variable
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
set_values_to = exprs(
AVAL = mean(AVAL),
ADTM = max(ADTM),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT) %>%
select(-EGSEQ, -TRTA)
#> # A tibble: 24 × 7
#> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE
#> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr>
#> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 NA
#> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 NA
#> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 NA
#> 4 XYZ-1001 QTcF Int. (msec) Baseline NA 393. 2016-02-24 07:56:00 AVER…
#> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 384 2016-03-08 09:45:00 NA
#> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 NA
#> 7 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 NA
#> 8 XYZ-1001 QTcF Int. (msec) Visit 2 NA 388. 2016-03-08 09:51:00 AVER…
#> 9 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 385 2016-03-22 10:45:00 NA
#> 10 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 NA
#> # ℹ 14 more rows
# Sample ADEG dataset with triplicate record for only AVISIT = 'Baseline'
adeg <- tribble(
~USUBJID, ~EGSEQ, ~PARAM, ~AVISIT, ~EGDTC, ~AVAL, ~TRTA,
"XYZ-1001", 1, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:50", 385, NA_character_,
"XYZ-1001", 2, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:52", 399, NA_character_,
"XYZ-1001", 3, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:56", 396, NA_character_,
"XYZ-1001", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:48", 393, "Placebo",
"XYZ-1001", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:51", 388, "Placebo",
"XYZ-1001", 6, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:48", 394, "Placebo",
"XYZ-1001", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:51", 402, "Placebo",
"XYZ-1002", 1, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 399, NA_character_,
"XYZ-1002", 2, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 410, NA_character_,
"XYZ-1002", 3, "QTcF Int. (msec)", "Baseline", "2016-02-22T08:01", 392, NA_character_,
"XYZ-1002", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:53", 407, "Active 20mg",
"XYZ-1002", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:56", 400, "Active 20mg",
"XYZ-1002", 6, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:53", 414, "Active 20mg",
"XYZ-1002", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:56", 402, "Active 20mg"
)
# Compute the average of AVAL only if there are more than 2 records within the
# by group
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
filter_add = n() > 2,
set_values_to = exprs(
AVAL = mean(AVAL, na.rm = TRUE),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 16 × 8
#> USUBJID EGSEQ PARAM AVISIT EGDTC AVAL TRTA DTYPE
#> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 XYZ-1001 1 QTcF Int. (msec) Baseline 2016-02-24T07:50 385 NA NA
#> 2 XYZ-1001 2 QTcF Int. (msec) Baseline 2016-02-24T07:52 399 NA NA
#> 3 XYZ-1001 3 QTcF Int. (msec) Baseline 2016-02-24T07:56 396 NA NA
#> 4 XYZ-1001 NA QTcF Int. (msec) Baseline NA 393. NA AVER…
#> 5 XYZ-1001 4 QTcF Int. (msec) Visit 2 2016-03-08T09:48 393 Placebo NA
#> 6 XYZ-1001 5 QTcF Int. (msec) Visit 2 2016-03-08T09:51 388 Placebo NA
#> 7 XYZ-1001 6 QTcF Int. (msec) Visit 3 2016-03-22T10:48 394 Placebo NA
#> 8 XYZ-1001 7 QTcF Int. (msec) Visit 3 2016-03-22T10:51 402 Placebo NA
#> 9 XYZ-1002 1 QTcF Int. (msec) Baseline 2016-02-22T07:58 399 NA NA
#> 10 XYZ-1002 2 QTcF Int. (msec) Baseline 2016-02-22T07:58 410 NA NA
#> 11 XYZ-1002 3 QTcF Int. (msec) Baseline 2016-02-22T08:01 392 NA NA
#> 12 XYZ-1002 NA QTcF Int. (msec) Baseline NA 400. NA AVER…
#> 13 XYZ-1002 4 QTcF Int. (msec) Visit 2 2016-03-06T09:53 407 Active… NA
#> 14 XYZ-1002 5 QTcF Int. (msec) Visit 2 2016-03-06T09:56 400 Active… NA
#> 15 XYZ-1002 6 QTcF Int. (msec) Visit 3 2016-03-24T10:53 414 Active… NA
#> 16 XYZ-1002 7 QTcF Int. (msec) Visit 3 2016-03-24T10:56 402 Active… NA