Believe in a higher order!

A brief foray into the higher order functions in the {admiral} package.
ADaM
Technical
Author

Edoardo Mancini

Published

November 27, 2023

Introduction

Picture the following scenario:

You, a budding {admiral} programmer, are finding your groove chaining together modular code blocks to derive variables and parameters in a drive to construct your favorite ADaM dataset, ADAE. Suddenly you notice that one of the flags you are deriving should only use records on or after study day 1. In a moment of mild annoyance, you get to work modifying what was originally a simple call to derive_var_extreme_flag() by first subsetting ADAE to records where AESTDY > 1, then deriving the flag only for the subsetted ADAE, and finally binding the two portions of ADAE back together before continuing on with your program. Miffed by this interruption, you think to yourself: “I wish there was a neater, faster way to do this in stride, that didn’t break my code modularity…”

If the above could never be you, then you’ll probably be alright never reading this blog post. However, if you want to learn more about the tools that {admiral} provides to make your life easier in cases like this one, then you are in the right place, since this blog post will highlight how higher order functions can solve such issues.

A higher order function is a function that takes another function as input. By introducing these higher order functions, {admiral} intends to give the user greater power over derivations, whilst trying to negate the need for both adding additional {admiral} functions/arguments, and the user needing many separate steps.

The functions covered in this post are:

  • restrict_derivation(): Allows the user to execute a single derivation on a subset of the input dataset.
  • call_derivation(): Allows the user to call a single derivation multiple times with some arguments being fixed across iterations and others varying.
  • slice_derivation(): Allows the user to split the input dataset into slices (subsets) and for each slice a single derivation is called separately. Some or all arguments of the derivation may vary depending on the slice.

Required Packages

The examples in this blog post require the following packages.

library(admiral)
library(pharmaversesdtm)
library(dplyr, warn.conflicts = FALSE)

For example purpose, the ADSL dataset - which is included in {admiral} - and the SDTM datasets from {pharmaversesdtm} are used.

data("admiral_adsl")
data("ae")
data("vs")
adsl <- admiral_adsl
ae <- convert_blanks_to_na(ae)
vs <- convert_blanks_to_na(vs)

The following code creates a minimally viable ADAE dataset to be used where needed in the following examples.

adae <- ae %>%
  left_join(adsl, by = c("STUDYID", "USUBJID")) %>%
  derive_vars_dt(
    new_vars_prefix = "AST",
    dtc = AESTDTC,
    highest_imputation = "M"
  ) %>%
  mutate(
    TRTEMFL = if_else(ASTDT >= TRTSDT, "Y", NA_character_),
    TEMP_AESEVN = as.integer(factor(AESEV, levels = c("SEVERE", "MODERATE", "MILD")))
  )

Restrict Derivation

The idea behind restrict_derivation() is largely to solve the problem outlined in the introduction: sometimes one may want to easily apply a derivation only for certain records from the input dataset. restrict_derivation() gives the users the ability to achieve this across any {admiral} function, without each function needing to have such an argument to allow for this.

Putting this into practice with an example: suppose the user has some code flagging the first occurring AE with the highest severity for each patient:

adae_ahsevfl <- adae %>%
  derive_var_extreme_flag(
    new_var = AHSEVFL,
    by_vars = exprs(USUBJID),
    order = exprs(TEMP_AESEVN, AESTDY, AESEQ),
    mode = "first"
  )

To derive AHSEVFL for records occurring on or after study day 1, the user could try to split the dataset before applying derive_var_extreme_flag(), and then re-join everything at the end…

adae_pre_stdy1 <- adae %>% filter(AESTDY >= 1)
adae_post_stdy1 <- adae %>% filter(!(AESTDY >= 1))

adae_pre_stdy1_flag <- adae_pre_stdy1 %>%
  derive_var_extreme_flag(
    new_var = AHSEVFL,
    by_vars = exprs(USUBJID),
    order = exprs(TEMP_AESEVN, AESTDY, AESEQ),
    mode = "first"
  )

adae_ahsevfl <- adae_post_stdy1 %>%
  mutate(AHSEVFL = NA_character_) %>% # need to make AHSEVFL in this dataset too, to enable binding below
  rbind(adae_pre_stdy1_flag)

..or, restrict_derivation() could be wrapped around derive_var_extreme_flag(), using the following structure:

  • The function to restrict, derive_var_extreme_flag() is passed to restrict_derivation() through the derivation argument;
  • The arguments to derive_var_extreme_flag() are passed using a call to params();
  • The restriction criterion is provided using the filter argument.
adae_ahsevfl <- adae %>%
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      new_var = AHSEVFL,
      by_vars = exprs(USUBJID),
      order = exprs(TEMP_AESEVN, AESTDY, AESEQ),
      mode = "first"
    ),
    filter = AESTDY >= 1
  )
USUBJID AEDECOD AESTDY AESEQ AESEV AHSEVFL
01-701-1111 LOCALISED INFECTION -61 3 MODERATE NA
01-701-1111 ERYTHEMA -5 1 MILD NA
01-701-1111 PRURITUS -5 2 MILD NA
01-701-1111 ERYTHEMA -5 4 MILD NA
01-701-1111 PRURITUS -5 5 MILD NA
01-701-1111 MICTURITION URGENCY 1 6 MILD NA
01-701-1111 ARTHRALGIA 7 7 MODERATE Y
01-701-1111 CELLULITIS 7 8 MODERATE NA
01-705-1393 PRURITUS -277 2 MILD NA
01-705-1393 PRURITUS -277 4 MILD NA

Though the ultimate result is the same, the second approach is often preferable as it allows everything to be achieved within one code block, meaning one doesn’t necessarily need to break the rhythm achieved when chaining multiple blocks together due to the requirement to “preprocess” the ADaM dataset by only keeping records relevant for the derivation.

Call Derivation

call_derivation() is a function that exists purely for convenience: it saves the user repeating numerous similar derivation function calls. It is best used when multiple derived variables have very similar specifications with only slight variations.

As an example, imagine the case where all the parameters in a BDS ADaM require both a highest value flag and a lowest value flag.

Here is an example of how to achieve this without using call_derivation():

vs_ahilofl <- vs %>%
  derive_var_extreme_flag(
    by_vars = exprs(USUBJID, VSTESTCD),
    order = exprs(VSORRES, VSSEQ),
    new_var = AHIFL,
    mode = "last"
  ) %>%
  derive_var_extreme_flag(
    by_vars = exprs(USUBJID, VSTESTCD),
    order = exprs(VSORRES, VSSEQ),
    new_var = ALOFL,
    mode = "first"
  )

Conversely, here is how to achieve the same objective by using call_derivation(). Any arguments differing across runs (such as the name of the new variable) are passed using params(), and again the function that needs to be repeatedly called is passed through the derivation argument.

vs_ahilofl <- vs %>%
  call_derivation(
    derivation = derive_var_extreme_flag,
    variable_params = list(
      params(new_var = AHIFL, mode = "last"),
      params(new_var = ALOFL, mode = "first")
    ),
    by_vars = exprs(USUBJID, VSTESTCD),
    order = exprs(VSORRES, VSSEQ)
  )
USUBJID VSTESTCD VSORRES ALOFL AHIFL
01-701-1015 TEMP 96.9 NA NA
01-701-1015 TEMP 97.0 NA NA
01-701-1015 TEMP 97.2 NA NA
01-701-1015 TEMP 96.6 Y NA
01-701-1015 TEMP 97.7 NA NA
01-701-1015 TEMP 97.0 NA NA
01-701-1015 TEMP 97.5 NA NA
01-701-1015 TEMP 97.4 NA NA
01-701-1015 TEMP 98.0 NA Y
01-701-1015 TEMP 97.4 NA NA

Notice that any arguments that stay the same across iterations (here, by_vars and order) are instead passed outside of variable_params. However, it is important to observe that although the arguments outside variable_params are invariant across derivation calls, if any such argument is also specified inside variable_params then this selection overrides the outside selection. This can be useful in cases where for most derivation calls, the set of invariant arguments is constant, but for one or two calls a small modification is required.

Clearly, the advantage of using call_derivation() instead of duplicating code blocks only grows as the number of variable derivations with similar needs also grows.

Slice Derivation

This function is essentially a combination of call_derivation() and restrict_derivation(), since it allows a single derivation to be applied with different arguments for different slices (subsets) of records from the input dataset. One could do this with separate restrict_derivation() calls for each different set of records, but slice_derivation() allows to achieve this in one call.

For instance, consider the case where one wanted to achieve a similar derivation to that in the restrict_derivation() example (flagging AE with the highest severity for each patient) but while for records occurring on or after study day 1 the intent remains to flag the first occurring AE, for pre-treatment AEs one instead targets the last occurring AE.

slice_derivation() comes to the rescue!

  • Once again, the function to restrict is passed through the derivation argument;
  • The arguments that remain constant across slices are passed in the args selection using a call to params();
  • The user passes derivation_slice’s to the function detailing the filter condition for the slice in the filter argument and what differs across runs in the args call.

Note: observations that match with more than one slice are only considered for the first matching slice. Moreover, observations with no match to any of the slices are included in the output dataset but the derivation is not called for them.

adae_ahsev2fl <- adae %>%
  slice_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      new_var = AHSEV2FL,
      by_vars = exprs(USUBJID)
    ),
    derivation_slice(
      filter = AESTDY >= 1,
      args = params(order = exprs(TEMP_AESEVN, AESTDY, AESEQ), mode = "first")
    ),
    derivation_slice(
      filter = TRUE,
      args = params(order = exprs(AESTDY, AESEQ), mode = "last")
    )
  )
USUBJID AEDECOD AESTDY AESEQ AESEV AHSEV2FL
01-701-1111 LOCALISED INFECTION -61 3 MODERATE NA
01-701-1111 ERYTHEMA -5 1 MILD NA
01-701-1111 PRURITUS -5 2 MILD NA
01-701-1111 ERYTHEMA -5 4 MILD NA
01-701-1111 PRURITUS -5 5 MILD Y
01-701-1111 MICTURITION URGENCY 1 6 MILD NA
01-701-1111 ARTHRALGIA 7 7 MODERATE Y
01-701-1111 CELLULITIS 7 8 MODERATE NA
01-705-1393 PRURITUS -277 2 MILD NA
01-705-1393 PRURITUS -277 4 MILD Y

Notice that the derivation_slice ordering is important. in the above examples, all the AEs on or after study day 1 were addressed first, and then the filter = TRUE option was employed to catch all remaining records (in this case pre-treatment AEs).

The ordering is perhaps shown even more when in the below example where three slices are taken. Remember that observations that match with more than one slice are only considered for the first matching slice. Thus, in this case the objective is to create a flag for each patient for the record with the first severe AE, and then the first moderate AE, and finally the last occurring AE which is neither severe or moderate.

adae_ahsev3fl <- adae %>%
  slice_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      new_var = AHSEV3FL,
      by_vars = exprs(USUBJID)
    ),
    derivation_slice(
      filter = AESEV == "SEVERE",
      args = params(order = exprs(AESTDY, AESEQ), mode = "first")
    ),
    derivation_slice(
      filter = AESEV == "MODERATE",
      args = params(order = exprs(AESTDY, AESEQ), mode = "first")
    ),
    derivation_slice(
      filter = TRUE,
      args = params(order = exprs(AESTDY, AESEQ), mode = "last")
    )
  )
USUBJID AEDECOD AESTDY AESEQ AESEV AHSEV3FL
01-701-1111 LOCALISED INFECTION -61 3 MODERATE Y
01-701-1111 ERYTHEMA -5 1 MILD NA
01-701-1111 PRURITUS -5 2 MILD NA
01-701-1111 ERYTHEMA -5 4 MILD NA
01-701-1111 PRURITUS -5 5 MILD NA
01-701-1111 MICTURITION URGENCY 1 6 MILD Y
01-701-1111 ARTHRALGIA 7 7 MODERATE NA
01-701-1111 CELLULITIS 7 8 MODERATE NA
01-705-1393 PRURITUS -277 2 MILD NA
01-705-1393 PRURITUS -277 4 MILD NA

The order is only important when the slices are not mutually exclusive, so in the above case the moderate AE slice could have been above the severe AE slice, for example, and there would have been no difference to the result. However the third slice had to come last to check all remaining (i.e. not severe or moderate) records only.

Conclusion

The three higher order functions available in {admiral} restrict_derivation(), call_derivation() and slice_derivation(), are a flexible toolset provided by {admiral} to streamline ADaM code. They are never the only way to achieve a derivation, but they are often the most efficient way to do so. When code becomes long or convoluted, it is often worth pausing to examine whether one of these could come to the rescue to make life simpler.

Last updated

2024-11-15 19:39:51.846749

Details

Reuse

Citation

BibTeX citation:
@online{mancini2023,
  author = {Mancini, Edoardo},
  title = {Believe in a Higher Order!},
  date = {2023-11-27},
  url = {https://pharmaverse.github.io/blog/posts/2023-11-27_higher_order/higher_order.html},
  langid = {en}
}
For attribution, please cite this work as:
Mancini, Edoardo. 2023. “Believe in a Higher Order!” November 27, 2023. https://pharmaverse.github.io/blog/posts/2023-11-27_higher_order/higher_order.html.