Skip to contents

Derive Categorization Variables Like AVALCATy and AVALCAyN

Usage

derive_vars_cat(dataset, definition, by_vars = NULL)

Arguments

dataset

Input dataset

The variables specified by the by_vars and definition arguments are expected to be in the dataset.

definition

List of expressions created by exprs(). Must be in rectangular format and specified using the same syntax as when creating a tibble using the tribble() function. The definition object will be converted to a tibble using tribble() inside this function.

Must contain:

  • the column condition which will be converted to a logical expression and will be used on the dataset input.

  • at least one additional column with the new column name and the category value(s) used by the logical expression.

  • the column specified in by_vars (if by_vars is specified)

e.g. if by_vars is not specified:

exprs(~condition,   ~AVALCAT1, ~AVALCA1N,
      AVAL >= 140, ">=140 cm",         1,
      AVAL < 140,   "<140 cm",         2)

e.g. if by_vars is specified as exprs(VSTEST):

exprs(~VSTEST,   ~condition,  ~AVALCAT1, ~AVALCA1N,
      "Height", AVAL >= 140, ">=140 cm",         1,
      "Height",  AVAL < 140,  "<140 cm",         2)

by_vars

list of expressions with one element. NULL by default. Allows for specifying by groups, e.g. exprs(PARAMCD). Variable must be present in both dataset and definition. The conditions in definition are applied only to those records that match by_vars. The categorization variables are set to NA for records not matching any of the by groups in definition.

Value

The input dataset with the new variables defined in definition added

Details

If conditions are overlapping, the row order of definitions must be carefully considered. The first match will determine the category. i.e. if

AVAL = 155

and the definition is:

definition <- exprs(
  ~VSTEST,   ~condition,  ~AVALCAT1, ~AVALCA1N,
  "Height",  AVAL > 170,  ">170 cm",         1,
  "Height", AVAL <= 170, "<=170 cm",         2,
  "Height", AVAL <= 160, "<=160 cm",         3
)

then AVALCAT1 will be "<=170 cm", as this is the first match for AVAL. If you specify:

definition <- exprs(
  ~VSTEST,   ~condition,  ~AVALCAT1, ~AVALCA1N,
  "Height", AVAL <= 160, "<=160 cm",         3,
  "Height", AVAL <= 170, "<=170 cm",         2,
  "Height",  AVAL > 170,  ">170 cm",         1
)

Then AVAL <= 160 will lead to AVALCAT1 == "<=160 cm", AVAL in-between 160 and 170 will lead to AVALCAT1 == "<=170 cm", and AVAL <= 170 will lead to AVALCAT1 == ">170 cm".

However, we suggest to be more explicit when defining the condition, to avoid overlap. In this case, the middle condition should be: AVAL <= 170 & AVAL > 160

Examples

library(dplyr)
library(tibble)

advs <- tibble::tribble(
  ~USUBJID,       ~VSTEST,  ~AVAL,
  "01-701-1015", "Height", 147.32,
  "01-701-1015", "Weight",  53.98,
  "01-701-1023", "Height", 162.56,
  "01-701-1023", "Weight",     NA,
  "01-701-1028", "Height",     NA,
  "01-701-1028", "Weight",     NA,
  "01-701-1033", "Height", 175.26,
  "01-701-1033", "Weight",  88.45
)

definition <- exprs(
  ~condition,                        ~AVALCAT1, ~AVALCA1N,  ~NEWCOL,
  VSTEST == "Height" & AVAL > 160,   ">160 cm",         1, "extra1",
  VSTEST == "Height" & AVAL <= 160, "<=160 cm",         2, "extra2"
)
derive_vars_cat(
  dataset = advs,
  definition = definition
)
#> # A tibble: 8 × 6
#>   USUBJID     VSTEST  AVAL AVALCAT1 AVALCA1N NEWCOL
#>   <chr>       <chr>  <dbl> <chr>       <dbl> <chr> 
#> 1 01-701-1015 Height 147.  <=160 cm        2 extra2
#> 2 01-701-1015 Weight  54.0 NA             NA NA    
#> 3 01-701-1023 Height 163.  >160 cm         1 extra1
#> 4 01-701-1023 Weight  NA   NA             NA NA    
#> 5 01-701-1028 Height  NA   NA             NA NA    
#> 6 01-701-1028 Weight  NA   NA             NA NA    
#> 7 01-701-1033 Height 175.  >160 cm         1 extra1
#> 8 01-701-1033 Weight  88.4 NA             NA NA    

# Using by_vars:
definition2 <- exprs(
  ~VSTEST,   ~condition,  ~AVALCAT1, ~AVALCA1N,
  "Height",  AVAL > 160,  ">160 cm",         1,
  "Height", AVAL <= 160, "<=160 cm",         2,
  "Weight",   AVAL > 70,   ">70 kg",         1,
  "Weight",  AVAL <= 70,  "<=70 kg",         2
)

derive_vars_cat(
  dataset = advs,
  definition = definition2,
  by_vars = exprs(VSTEST)
)
#> # A tibble: 8 × 5
#>   USUBJID     VSTEST  AVAL AVALCAT1 AVALCA1N
#>   <chr>       <chr>  <dbl> <chr>       <dbl>
#> 1 01-701-1015 Height 147.  <=160 cm        2
#> 2 01-701-1015 Weight  54.0 <=70 kg         2
#> 3 01-701-1023 Height 163.  >160 cm         1
#> 4 01-701-1023 Weight  NA   NA             NA
#> 5 01-701-1028 Height  NA   NA             NA
#> 6 01-701-1028 Weight  NA   NA             NA
#> 7 01-701-1033 Height 175.  >160 cm         1
#> 8 01-701-1033 Weight  88.4 >70 kg          1

# With three conditions:
definition3 <- exprs(
  ~VSTEST,                ~condition,  ~AVALCAT1, ~AVALCA1N,
  "Height",               AVAL > 170,  ">170 cm",         1,
  "Height", AVAL <= 170 & AVAL > 160, "<=170 cm",         2,
  "Height",              AVAL <= 160, "<=160 cm",         3
)

derive_vars_cat(
  dataset = advs,
  definition = definition3,
  by_vars = exprs(VSTEST)
)
#> # A tibble: 8 × 5
#>   USUBJID     VSTEST  AVAL AVALCAT1 AVALCA1N
#>   <chr>       <chr>  <dbl> <chr>       <dbl>
#> 1 01-701-1015 Height 147.  <=160 cm        3
#> 2 01-701-1015 Weight  54.0 NA             NA
#> 3 01-701-1023 Height 163.  <=170 cm        2
#> 4 01-701-1023 Weight  NA   NA             NA
#> 5 01-701-1028 Height  NA   NA             NA
#> 6 01-701-1028 Weight  NA   NA             NA
#> 7 01-701-1033 Height 175.  >170 cm         1
#> 8 01-701-1033 Weight  88.4 NA             NA

# Let's derive both the MCRITyML and the MCRITyMN variables
adlb <- tibble::tribble(
  ~USUBJID,     ~PARAM, ~AVAL, ~AVALU,  ~ANRHI,
  "01-701-1015", "ALT",   150,  "U/L",      40,
  "01-701-1023", "ALT",    70,  "U/L",      40,
  "01-701-1036", "ALT",   130,  "U/L",      40,
  "01-701-1048", "ALT",    30,  "U/L",      40,
  "01-701-1015", "AST",    50,  "U/L",      35
)

definition_mcrit <- exprs(
  ~PARAM,                      ~condition,    ~MCRIT1ML, ~MCRIT1MN,
  "ALT",                    AVAL <= ANRHI,    "<=ANRHI",         1,
  "ALT", ANRHI < AVAL & AVAL <= 3 * ANRHI, ">1-3*ANRHI",         2,
  "ALT",                 3 * ANRHI < AVAL,   ">3*ANRHI",         3
)

adlb %>%
  derive_vars_cat(
    definition = definition_mcrit,
    by_vars = exprs(PARAM)
  )
#> # A tibble: 5 × 7
#>   USUBJID     PARAM  AVAL AVALU ANRHI MCRIT1ML   MCRIT1MN
#>   <chr>       <chr> <dbl> <chr> <dbl> <chr>         <dbl>
#> 1 01-701-1015 ALT     150 U/L      40 >3*ANRHI          3
#> 2 01-701-1023 ALT      70 U/L      40 >1-3*ANRHI        2
#> 3 01-701-1036 ALT     130 U/L      40 >3*ANRHI          3
#> 4 01-701-1048 ALT      30 U/L      40 <=ANRHI           1
#> 5 01-701-1015 AST      50 U/L      35 NA               NA