Adds LOCF (last observation carried forward) records as new observations for each 'by group' when the dataset does not contain observations for missed visits/time points or when the analysis value is missing.

Usage

derive_locf_records(
  dataset,
  dataset_ref,
  by_vars,
  id_vars_ref = NULL,
  analysis_var = AVAL,
  imputation = "add",
  order,
  keep_vars = NULL
)

Arguments

dataset

Input dataset

The variables specified by the by_vars, analysis_var, order, and keep_vars arguments are expected to be in the dataset.

Default value

none

dataset_ref

Expected observations dataset

Data frame with all the combinations of PARAMCD, PARAM, AVISIT, AVISITN, ... that are expected in the dataset.

Default value

none

by_vars

Grouping variables

For each group defined by by_vars, an observation from dataset_ref is added to the output dataset if it has no corresponding observation in the input dataset, or if analysis_var is NA for the corresponding observation in the input dataset.

Default value

none

id_vars_ref

Grouping variables in expected observations dataset

The variables to group by in dataset_ref when determining which observations should be added to the input dataset.

Default value

All the variables in dataset_ref

analysis_var

Analysis variable

Permitted values

a variable

Default value

AVAL

imputation

Select the mode of imputation:

add: Keep all original records and add imputed records for missing timepoints and missing analysis_var values from dataset_ref.

update: Update records with missing analysis_var and add imputed records for missing timepoints from dataset_ref.

update_add: Keep all original records, update records with missing analysis_var and add imputed records for missing timepoints from dataset_ref.

Permitted values

One of these 3 values: "add", "update", "update_add"

Default value

"add"
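
To make the difference between the modes concrete, the following sketch mimics the "update" and "update_add" behaviour for a record whose analysis value is missing. It uses dplyr and tidyr directly rather than derive_locf_records(), and the toy data and object names (obs, updated, update_added) are assumptions made for illustration only; it is not the function's actual implementation.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data: the WEEK 4 record (AVISITN = 4) exists but AVAL is NA.
obs <- tribble(
  ~PARAMCD, ~AVISITN, ~AVAL,
  "DIABP",         0,    79,
  "DIABP",         2,    80,
  "DIABP",         4,    NA
)

# "update"-like behaviour: the record with missing AVAL is overwritten in
# place with the last non-missing value and flagged with DTYPE = "LOCF".
updated <- obs |>
  arrange(PARAMCD, AVISITN) |>
  group_by(PARAMCD) |>
  mutate(DTYPE = if_else(is.na(AVAL), "LOCF", NA_character_)) |>
  fill(AVAL, .direction = "down") |>
  ungroup()

# "update_add"-like behaviour: same as above, but the original record with
# the missing AVAL is kept alongside the imputed one.
update_added <- updated |>
  bind_rows(filter(obs, is.na(AVAL))) |>
  arrange(PARAMCD, AVISITN, DTYPE)

nrow(updated)       # 3 records, none with missing AVAL
nrow(update_added)  # 4 records, the original NA record is retained
```

In this sketch "update" leaves the record count unchanged, while "update_add" grows the dataset by one record per missing analysis value.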

order

Sort order

The dataset is sorted by order within each by_vars group before the last observation (e.g. AVAL) is carried forward.

For handling of NAs in sorting variables see Sort Order.

Default value

none

keep_vars

Additional variables to carry forward

Variables other than analysis_var whose last observed value should also be carried forward (e.g., PARAMN, VISITNUM). If NULL (the default), only variables specified in by_vars and analysis_var will be populated in the newly created records.

Default value

NULL

Value

The input dataset with the new "LOCF" observations added for each by_vars group, according to the value passed to the imputation argument.

Details

For each group (with respect to the variables specified for the by_vars parameter) those observations from dataset_ref are added to the output dataset

  • which do not have a corresponding observation in the input dataset or

  • for which analysis_var is NA for the corresponding observation in the input dataset.

For the new observations, analysis_var is set to the non-missing analysis_var of the previous observation in the input dataset (when sorted by order) and DTYPE is set to "LOCF".

The imputation argument decides whether to update the existing observation when analysis_var is NA ("update" and "update_add"), or to add a new observation from dataset_ref instead ("add").
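
The steps described above can be approximated with plain dplyr and tidyr. The sketch below is a conceptual illustration of the "add" mode under simplifying assumptions (a single subject, one parameter, hypothetical object names such as obs, expected, to_impute and result); it is not how derive_locf_records() is implemented internally.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data: visit 4 is absent and visit 6 has a missing AVAL.
obs <- tribble(
  ~USUBJID, ~PARAMCD, ~AVISITN, ~AVAL,
  "01",     "SYSBP",         0,   130,
  "01",     "SYSBP",         2,   132,
  "01",     "SYSBP",         6,    NA
)

# Expected visits per parameter (plays the role of `dataset_ref`).
expected <- tribble(
  ~PARAMCD, ~AVISITN,
  "SYSBP",         0,
  "SYSBP",         2,
  "SYSBP",         4,
  "SYSBP",         6
)

# Step 1: expected observations for each by group (USUBJID, PARAMCD).
expected_by_group <- obs |>
  distinct(USUBJID, PARAMCD) |>
  inner_join(expected, by = "PARAMCD")

# Step 2: expected observations with no non-missing AVAL in the input.
to_impute <- expected_by_group |>
  anti_join(
    filter(obs, !is.na(AVAL)),
    by = c("USUBJID", "PARAMCD", "AVISITN")
  ) |>
  mutate(AVAL = NA_real_, DTYPE = "LOCF")

# Step 3: carry the last non-missing AVAL forward into the new records.
imputed <- obs |>
  filter(!is.na(AVAL)) |>
  mutate(DTYPE = NA_character_) |>
  bind_rows(to_impute) |>
  arrange(USUBJID, PARAMCD, AVISITN) |>
  group_by(USUBJID, PARAMCD) |>
  fill(AVAL, .direction = "down") |>
  ungroup() |>
  filter(DTYPE == "LOCF")

# Step 4 ("add" mode): keep every original record and append the new ones.
result <- obs |>
  mutate(DTYPE = NA_character_) |>
  bind_rows(imputed) |>
  arrange(USUBJID, PARAMCD, AVISITN, DTYPE)

result
```

Here the three original records are kept and two records with DTYPE "LOCF" are added (visits 4 and 6), each carrying the value 132 from visit 2.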

Author

G Gayatri

Examples


library(admiral) # provides derive_locf_records() and exprs()
library(dplyr)
library(tibble)

advs <- tribble(
  ~STUDYID,  ~USUBJID,      ~VSSEQ, ~PARAMCD, ~PARAMN, ~AVAL, ~AVISITN, ~AVISIT,
  "CDISC01", "01-701-1015",      1, "PULSE",        1,    65,        0, "BASELINE",
  "CDISC01", "01-701-1015",      2, "DIABP",        2,    79,        0, "BASELINE",
  "CDISC01", "01-701-1015",      3, "DIABP",        2,    80,        2, "WEEK 2",
  "CDISC01", "01-701-1015",      4, "DIABP",        2,    NA,        4, "WEEK 4",
  "CDISC01", "01-701-1015",      5, "DIABP",        2,    NA,        6, "WEEK 6",
  "CDISC01", "01-701-1015",      6, "SYSBP",        3,   130,        0, "BASELINE",
  "CDISC01", "01-701-1015",      7, "SYSBP",        3,   132,        2, "WEEK 2"
)

# A dataset with all the combinations of PARAMCD, PARAM, AVISIT, AVISITN, ...
# which are expected.
advs_expected_obsv <- tribble(
  ~PARAMCD, ~AVISITN, ~AVISIT,
  "PULSE",         0, "BASELINE",
  "PULSE",         6, "WEEK 6",
  "DIABP",         0, "BASELINE",
  "DIABP",         2, "WEEK 2",
  "DIABP",         4, "WEEK 4",
  "DIABP",         6, "WEEK 6",
  "SYSBP",         0, "BASELINE",
  "SYSBP",         2, "WEEK 2",
  "SYSBP",         4, "WEEK 4",
  "SYSBP",         6, "WEEK 6"
)

# Example 1: Add imputed records for missing timepoints and for missing
#            `analysis_var` values (from `dataset_ref`), keeping all the original records.
derive_locf_records(
  dataset = advs,
  dataset_ref = advs_expected_obsv,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD),
  imputation = "add",
  order = exprs(AVISITN, AVISIT),
  keep_vars = exprs(PARAMN)
) |>
  arrange(USUBJID, PARAMCD, AVISIT)
#> # A tibble: 12 × 9
#>    STUDYID USUBJID     VSSEQ PARAMCD PARAMN  AVAL AVISITN AVISIT   DTYPE
#>    <chr>   <chr>       <dbl> <chr>    <dbl> <dbl>   <dbl> <chr>    <chr>
#>  1 CDISC01 01-701-1015     2 DIABP        2    79       0 BASELINE NA   
#>  2 CDISC01 01-701-1015     3 DIABP        2    80       2 WEEK 2   NA   
#>  3 CDISC01 01-701-1015    NA DIABP        2    80       4 WEEK 4   LOCF 
#>  4 CDISC01 01-701-1015     4 DIABP        2    NA       4 WEEK 4   NA   
#>  5 CDISC01 01-701-1015    NA DIABP        2    80       6 WEEK 6   LOCF 
#>  6 CDISC01 01-701-1015     5 DIABP        2    NA       6 WEEK 6   NA   
#>  7 CDISC01 01-701-1015     1 PULSE        1    65       0 BASELINE NA   
#>  8 CDISC01 01-701-1015    NA PULSE        1    65       6 WEEK 6   LOCF 
#>  9 CDISC01 01-701-1015     6 SYSBP        3   130       0 BASELINE NA   
#> 10 CDISC01 01-701-1015     7 SYSBP        3   132       2 WEEK 2   NA   
#> 11 CDISC01 01-701-1015    NA SYSBP        3   132       4 WEEK 4   LOCF 
#> 12 CDISC01 01-701-1015    NA SYSBP        3   132       6 WEEK 6   LOCF 


# Example 2: Add imputed records for missing timepoints (from `dataset_ref`)
#            and update missing `analysis_var` values.
derive_locf_records(
  dataset = advs,
  dataset_ref = advs_expected_obsv,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD),
  imputation = "update",
  order = exprs(AVISITN, AVISIT)
) |>
  arrange(USUBJID, PARAMCD, AVISIT)
#> # A tibble: 10 × 9
#>    STUDYID USUBJID     VSSEQ PARAMCD PARAMN  AVAL AVISITN AVISIT   DTYPE
#>    <chr>   <chr>       <dbl> <chr>    <dbl> <dbl>   <dbl> <chr>    <chr>
#>  1 CDISC01 01-701-1015     2 DIABP        2    79       0 BASELINE NA   
#>  2 CDISC01 01-701-1015     3 DIABP        2    80       2 WEEK 2   NA   
#>  3 CDISC01 01-701-1015     4 DIABP        2    80       4 WEEK 4   LOCF 
#>  4 CDISC01 01-701-1015     5 DIABP        2    80       6 WEEK 6   LOCF 
#>  5 CDISC01 01-701-1015     1 PULSE        1    65       0 BASELINE NA   
#>  6 CDISC01 01-701-1015    NA PULSE       NA    65       6 WEEK 6   LOCF 
#>  7 CDISC01 01-701-1015     6 SYSBP        3   130       0 BASELINE NA   
#>  8 CDISC01 01-701-1015     7 SYSBP        3   132       2 WEEK 2   NA   
#>  9 CDISC01 01-701-1015    NA SYSBP       NA   132       4 WEEK 4   LOCF 
#> 10 CDISC01 01-701-1015    NA SYSBP       NA   132       6 WEEK 6   LOCF 


# Example 3: Add imputed records for missing timepoints (from `dataset_ref`) and
#            update missing `analysis_var` values, keeping all the original records.
derive_locf_records(
  dataset = advs,
  dataset_ref = advs_expected_obsv,
  by_vars = exprs(STUDYID, USUBJID, PARAMCD),
  imputation = "update_add",
  order = exprs(AVISITN, AVISIT)
) |>
  arrange(USUBJID, PARAMCD, AVISIT)
#> # A tibble: 12 × 9
#>    STUDYID USUBJID     VSSEQ PARAMCD PARAMN  AVAL AVISITN AVISIT   DTYPE
#>    <chr>   <chr>       <dbl> <chr>    <dbl> <dbl>   <dbl> <chr>    <chr>
#>  1 CDISC01 01-701-1015     2 DIABP        2    79       0 BASELINE NA   
#>  2 CDISC01 01-701-1015     3 DIABP        2    80       2 WEEK 2   NA   
#>  3 CDISC01 01-701-1015     4 DIABP        2    80       4 WEEK 4   LOCF 
#>  4 CDISC01 01-701-1015     4 DIABP        2    NA       4 WEEK 4   NA   
#>  5 CDISC01 01-701-1015     5 DIABP        2    80       6 WEEK 6   LOCF 
#>  6 CDISC01 01-701-1015     5 DIABP        2    NA       6 WEEK 6   NA   
#>  7 CDISC01 01-701-1015     1 PULSE        1    65       0 BASELINE NA   
#>  8 CDISC01 01-701-1015    NA PULSE       NA    65       6 WEEK 6   LOCF 
#>  9 CDISC01 01-701-1015     6 SYSBP        3   130       0 BASELINE NA   
#> 10 CDISC01 01-701-1015     7 SYSBP        3   132       2 WEEK 2   NA   
#> 11 CDISC01 01-701-1015    NA SYSBP       NA   132       4 WEEK 4   LOCF 
#> 12 CDISC01 01-701-1015    NA SYSBP       NA   132       6 WEEK 6   LOCF