
Add New Records Within By Groups Using Aggregation Functions
Source:R/derive_summary_records.R
derive_summary_records.RdIt is not uncommon to have an analysis need whereby one needs to derive an
analysis value (AVAL) from multiple records. The ADaM basic dataset
structure variable DTYPE is available to indicate when a new derived
records has been added to a dataset.
Usage
derive_summary_records(
dataset = NULL,
dataset_add,
dataset_ref = NULL,
by_vars,
filter_add = NULL,
set_values_to,
missing_values = NULL
)Arguments
- dataset
-
Input dataset
The variables specified by the
by_varsargument are expected to be in the dataset. - dataset_add
-
Additional dataset
The variables specified for
by_varsare expected. Observations from the specified dataset are going to be used to calculate and added as new records to the input dataset (dataset). - dataset_ref
-
Reference dataset
The variables specified for
by_varsare expected. For each observation of the specified dataset a new observation is added to the input dataset. - by_vars
-
Grouping variables
Variables to consider for generation of groupwise summary records. Providing the names of variables in
exprs()will create a groupwise summary and generate summary records for the specified groups.Permitted Values: list of variables created by
exprs()e.g.exprs(USUBJID, VISIT) - filter_add
-
Filter condition as logical expression to apply during summary calculation. By default, filtering expressions are computed within
by_varsas this will help when an aggregating, lagging, or ranking function is involved.For example,
filter_add = (AVAL > mean(AVAL, na.rm = TRUE))will filter allAVALvalues greater than mean ofAVALwith inby_vars.filter_add = (dplyr::n() > 2)will filter n count ofby_varsgreater than 2.
- set_values_to
-
Variables to be set
The specified variables are set to the specified values for the new observations.
Set a list of variables to some specified value for the new records
LHS refer to a variable.
RHS refers to the values to set to the variable. This can be a string, a symbol, a numeric value, an expression or NA. If summary functions are used, the values are summarized by the variables specified for
by_vars.
For example:
- missing_values
-
Values for missing summary values
For observations of the reference dataset (
dataset_ref) which do not have a complete mapping defined by the summarization defined inset_values_to. Only variables specified forset_values_tocan be specified formissing_values.Permitted Values: named list of expressions, e.g.,
exprs(AVAL = -9999)
Details
When all records have same values within by_vars then this function will
retain those common values in the newly derived records. Otherwise new value
will be set to NA.
See also
get_summary_records(), derive_var_merged_summary()
BDS-Findings Functions for adding Parameters/Records:
default_qtc_paramcd(),
derive_expected_records(),
derive_extreme_event(),
derive_extreme_records(),
derive_locf_records(),
derive_param_bmi(),
derive_param_bsa(),
derive_param_computed(),
derive_param_doseint(),
derive_param_exist_flag(),
derive_param_exposure(),
derive_param_framingham(),
derive_param_map(),
derive_param_qtc(),
derive_param_rr(),
derive_param_wbc_abs()
Examples
library(tibble)
library(dplyr)
adeg <- tribble(
~USUBJID, ~EGSEQ, ~PARAM, ~AVISIT, ~EGDTC, ~AVAL, ~TRTA,
"XYZ-1001", 1, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:50", 385, NA_character_,
"XYZ-1001", 2, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:52", 399, NA_character_,
"XYZ-1001", 3, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:56", 396, NA_character_,
"XYZ-1001", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:45", 384, "Placebo",
"XYZ-1001", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:48", 393, "Placebo",
"XYZ-1001", 6, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:51", 388, "Placebo",
"XYZ-1001", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:45", 385, "Placebo",
"XYZ-1001", 8, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:48", 394, "Placebo",
"XYZ-1001", 9, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:51", 402, "Placebo",
"XYZ-1002", 1, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 399, NA_character_,
"XYZ-1002", 2, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 410, NA_character_,
"XYZ-1002", 3, "QTcF Int. (msec)", "Baseline", "2016-02-22T08:01", 392, NA_character_,
"XYZ-1002", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:50", 401, "Active 20mg",
"XYZ-1002", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:53", 407, "Active 20mg",
"XYZ-1002", 6, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:56", 400, "Active 20mg",
"XYZ-1002", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:50", 412, "Active 20mg",
"XYZ-1002", 8, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:53", 414, "Active 20mg",
"XYZ-1002", 9, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:56", 402, "Active 20mg"
) %>%
mutate(
ADTM = convert_dtc_to_dtm(EGDTC)
)
# Summarize the average of the triplicate ECG interval values (AVAL)
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
set_values_to = exprs(
AVAL = mean(AVAL, na.rm = TRUE),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 24 × 9
#> USUBJID EGSEQ PARAM AVISIT EGDTC AVAL TRTA ADTM DTYPE
#> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dttm> <chr>
#> 1 XYZ-1001 1 QTcF Int. … Basel… 2016… 385 NA 2016-02-24 07:50:00 NA
#> 2 XYZ-1001 2 QTcF Int. … Basel… 2016… 399 NA 2016-02-24 07:52:00 NA
#> 3 XYZ-1001 3 QTcF Int. … Basel… 2016… 396 NA 2016-02-24 07:56:00 NA
#> 4 XYZ-1001 NA QTcF Int. … Basel… NA 393. NA NA AVER…
#> 5 XYZ-1001 4 QTcF Int. … Visit… 2016… 384 Plac… 2016-03-08 09:45:00 NA
#> 6 XYZ-1001 5 QTcF Int. … Visit… 2016… 393 Plac… 2016-03-08 09:48:00 NA
#> 7 XYZ-1001 6 QTcF Int. … Visit… 2016… 388 Plac… 2016-03-08 09:51:00 NA
#> 8 XYZ-1001 NA QTcF Int. … Visit… NA 388. NA NA AVER…
#> 9 XYZ-1001 7 QTcF Int. … Visit… 2016… 385 Plac… 2016-03-22 10:45:00 NA
#> 10 XYZ-1001 8 QTcF Int. … Visit… 2016… 394 Plac… 2016-03-22 10:48:00 NA
#> # ℹ 14 more rows
# Derive more than one summary variable
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
set_values_to = exprs(
AVAL = mean(AVAL),
ADTM = max(ADTM),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT) %>%
select(-EGSEQ, -TRTA)
#> # A tibble: 24 × 7
#> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE
#> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr>
#> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 NA
#> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 NA
#> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 NA
#> 4 XYZ-1001 QTcF Int. (msec) Baseline NA 393. 2016-02-24 07:56:00 AVER…
#> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 384 2016-03-08 09:45:00 NA
#> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 NA
#> 7 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 NA
#> 8 XYZ-1001 QTcF Int. (msec) Visit 2 NA 388. 2016-03-08 09:51:00 AVER…
#> 9 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 385 2016-03-22 10:45:00 NA
#> 10 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 NA
#> # ℹ 14 more rows
# Sample ADEG dataset with triplicate record for only AVISIT = 'Baseline'
adeg <- tribble(
~USUBJID, ~EGSEQ, ~PARAM, ~AVISIT, ~EGDTC, ~AVAL, ~TRTA,
"XYZ-1001", 1, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:50", 385, NA_character_,
"XYZ-1001", 2, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:52", 399, NA_character_,
"XYZ-1001", 3, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:56", 396, NA_character_,
"XYZ-1001", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:48", 393, "Placebo",
"XYZ-1001", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:51", 388, "Placebo",
"XYZ-1001", 6, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:48", 394, "Placebo",
"XYZ-1001", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:51", 402, "Placebo",
"XYZ-1002", 1, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 399, NA_character_,
"XYZ-1002", 2, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 410, NA_character_,
"XYZ-1002", 3, "QTcF Int. (msec)", "Baseline", "2016-02-22T08:01", 392, NA_character_,
"XYZ-1002", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:53", 407, "Active 20mg",
"XYZ-1002", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:56", 400, "Active 20mg",
"XYZ-1002", 6, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:53", 414, "Active 20mg",
"XYZ-1002", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:56", 402, "Active 20mg"
)
# Compute the average of AVAL only if there are more than 2 records within the
# by group
derive_summary_records(
adeg,
dataset_add = adeg,
by_vars = exprs(USUBJID, PARAM, AVISIT),
filter_add = n() > 2,
set_values_to = exprs(
AVAL = mean(AVAL, na.rm = TRUE),
DTYPE = "AVERAGE"
)
) %>%
arrange(USUBJID, AVISIT)
#> # A tibble: 16 × 8
#> USUBJID EGSEQ PARAM AVISIT EGDTC AVAL TRTA DTYPE
#> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 XYZ-1001 1 QTcF Int. (msec) Baseline 2016-02-24T07:50 385 NA NA
#> 2 XYZ-1001 2 QTcF Int. (msec) Baseline 2016-02-24T07:52 399 NA NA
#> 3 XYZ-1001 3 QTcF Int. (msec) Baseline 2016-02-24T07:56 396 NA NA
#> 4 XYZ-1001 NA QTcF Int. (msec) Baseline NA 393. NA AVER…
#> 5 XYZ-1001 4 QTcF Int. (msec) Visit 2 2016-03-08T09:48 393 Placebo NA
#> 6 XYZ-1001 5 QTcF Int. (msec) Visit 2 2016-03-08T09:51 388 Placebo NA
#> 7 XYZ-1001 6 QTcF Int. (msec) Visit 3 2016-03-22T10:48 394 Placebo NA
#> 8 XYZ-1001 7 QTcF Int. (msec) Visit 3 2016-03-22T10:51 402 Placebo NA
#> 9 XYZ-1002 1 QTcF Int. (msec) Baseline 2016-02-22T07:58 399 NA NA
#> 10 XYZ-1002 2 QTcF Int. (msec) Baseline 2016-02-22T07:58 410 NA NA
#> 11 XYZ-1002 3 QTcF Int. (msec) Baseline 2016-02-22T08:01 392 NA NA
#> 12 XYZ-1002 NA QTcF Int. (msec) Baseline NA 400. NA AVER…
#> 13 XYZ-1002 4 QTcF Int. (msec) Visit 2 2016-03-06T09:53 407 Active… NA
#> 14 XYZ-1002 5 QTcF Int. (msec) Visit 2 2016-03-06T09:56 400 Active… NA
#> 15 XYZ-1002 6 QTcF Int. (msec) Visit 3 2016-03-24T10:53 414 Active… NA
#> 16 XYZ-1002 7 QTcF Int. (msec) Visit 3 2016-03-24T10:56 402 Active… NA