Date/Time Functions and Imputation in {admiral}

Dates, times and imputation can be a frustrating facet of any programming language. Here’s how {admiral} makes all of this easy!
ADaM
Technical
Author

Edoardo Mancini

Published

September 26, 2023

Introduction

Date and time is collected in SDTM as character values using the extended ISO 8601 format yyyy-dd-mmThh:mm:ss. This universal format allows missing parts date or time - e.g. the string"2019-10" represents a date where the day and the time are unknown. In contrast, ADaM timing variables like ADTM (Analysis Datetime) or ADY (Analysis Relative Day) are numeric variables, which can be derived only if the date or datetime is complete.

Most ADaM programmers have, at one point or another, encountered situations where missing dates, unexpected formats or confusing imputation functions rendered derivations of timing variables frustrating and time consuming. {admiral} aims to mitigate this (where possible!) by providing functions which automatically derive date/datetime variables for you, and fill in missing date or time parts according to well-defined imputation rules.

In this article, we first examine the arsenal of functions provided by{admiral} to aid in datetime imputation and timing variable derivation. We then observe everything in action through a number of selected typical examples.

Date/Datetime Derivation and Imputation Functions

{admiral} provides the following functions for date/datetime imputation:

  • Derivations for adding variables
    • derive_vars_dt(): Adds a date variable and a date imputation flag variable (optional) based on a –DTC variable and imputation rules.
    • derive_vars_dtm(): Adds a datetime variable, a date imputation flag variable, and a time imputation flag variable (both optional) based on a –DTC variable and imputation rules.
  • Computation functions
    • impute_dtc_dtm(): Returns a complete ISO 8601 datetime or NA based on a partial ISO 8601 datetime and imputation rules.
    • impute_dtc_dt(): Returns a complete ISO 8601 date (without time) or NA based on a partial ISO 8601 date(time) and imputation rules.
    • convert_dtc_to_dt(): Returns a date if the input ISO 8601 date is complete. Otherwise, NA is returned.
    • convert_dtc_to_dtm(): Returns a datetime if the input ISO 8601 date is complete (with missing time replaced by "00:00:00" as default). Otherwise, NA is returned.
    • compute_dtf(): Returns the date imputation flag.
    • compute_tmf(): Returns the time imputation flag.

From the point of view of a typical ADaM programmer, the functions impute_*, convert_* and compute_* above can be viewed as utilities for treating dates and/or imputation within any custom code. In contrast, their derive_* find their use in directly deriving new timing variables and/or carrying out imputation at an ADaM dataset scale.

For a detailed look at the Imputation rules applied by these {admiral} functions, please visit this vignette on the documentation website.

Simple Examples with Vectors

In the examples below, one can observe how some members of the class of utilities impute_*() and convert_*() can be employed to do the date-related heavy lifting.

Imputing a Partial Date Portion

It is easy impute dates to the first day/month if they are partial just by using the highest_imputation argument:

library(admiral)
library(lubridate)
library(tibble)
library(dplyr, warn.conflicts = FALSE)

dates <- c(
  "2019-07-18T15:25:40",
  "2019-07-18T15:25",
  "2019-07-18T15",
  "2019-07-18",
  "2019-02",
  "2019",
  "2019",
  "2019---07",
  ""
)

impute_dtc_dt(
  dtc = dates,
  highest_imputation = "M"
)
[1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" "2019-02-01"
[6] "2019-01-01" "2019-01-01" "2019-01-01" NA          

A simple modification using date_imputation = "mid" or date_imputation = "last" or enables the imputation to be made using the middle or last day/month instead:

# Impute to last day/month if date is partial
impute_dtc_dt(
  dtc = dates,
  highest_imputation = "M",
  date_imputation = "last",
)
[1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" "2019-02-28"
[6] "2019-12-31" "2019-12-31" "2019-12-31" NA          
# Impute to mid day/month if date is partial
impute_dtc_dt(
  dtc = dates,
  highest_imputation = "M",
  date_imputation = "mid"
)
[1] "2019-07-18" "2019-07-18" "2019-07-18" "2019-07-18" "2019-02-15"
[6] "2019-06-30" "2019-06-30" "2019-06-30" NA          

But what if there exist minimum dates that the imputed date cannot exceed? Here, the min_date argument comes to the rescue:

impute_dtc_dt(
  "2020-12",
  min_dates = list(
    ymd("2020-12-06"),
    ymd("2020-11-11")
  ),
  highest_imputation = "M"
)
[1] "2020-12-06"

Computing Date Imputation Flags

When it comes to carrying out an imputation, the twin task is to flag the type of imputation that was executed. Here, functions like compute_dtf() make this straightforward. For this function, all that needs to be done is to pass a date character date to the dtc argument, and the resulting imputed date to the dt argument. This will then return the right date imputation flag - see the examples below for some possible behaviors:

compute_dtf(dtc = "2019-07", dt = as.Date("2019-07-18"))
[1] "D"
compute_dtf(dtc = "2019", dt = as.Date("2019-07-18"))
[1] "M"
compute_dtf(dtc = "--06-01T00:00", dt = as.Date("2022-06-01"))
[1] "Y"

Action Examples

The derive_*() functions are essentially wrappers around the aforementioned impute_*() and compute_*() functions. In the following section, we explore examples where ADaM variables can be derived using this class of functions.

Creating an Imputed Datetime and Date Variable and Imputation Flag Variables

As described previously, derive_vars_dtm() derives an imputed datetime variable and the corresponding date and time imputation flags. The imputed date variable can then be derived by using derive_vars_dtm_to_dt(). It is not necessary and advisable to perform the imputation for the date variable if it was already done for the datetime variable. CDISC considers the datetime and the date variable as two representations of the same date. Thus the imputation must be the same and the imputation flags are valid for both the datetime and the date variable.

ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first",
    time_imputation = "first"
  ) %>%
  derive_vars_dtm_to_dt(exprs(ASTDTM))
AESTDTC ASTDTM ASTDTF ASTTMF ASTDT
2019-08-09T12:34:56 2019-08-09 12:34:56 NA NA 2019-08-09
2019-04-12 2019-04-12 00:00:00 NA H 2019-04-12
2010-09 2010-09-01 00:00:00 D H 2010-09-01
NA NA NA NA NA

Creating an Imputed Date Variable and Imputation Flag Variable

If an imputed date variable without a corresponding datetime variable is required, it can be derived by the derive_vars_dt() function.

ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dt(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first"
  )
AESTDTC ASTDT ASTDTF
2019-08-09T12:34:56 2019-08-09 NA
2019-04-12 2019-04-12 NA
2010-09 2010-09-01 D
NA NA NA

Imputing Time Without Imputing Date

If the time should be imputed but not the date, the highest_imputation argument should be set to "h". This results in NA if the date is partial. As no date is imputed the date imputation flag is not created.

ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "h",
    time_imputation = "first"
  )
AESTDTC ASTDTM ASTTMF
2019-08-09T12:34:56 2019-08-09 12:34:56 NA
2019-04-12 2019-04-12 00:00:00 H
2010-09 NA NA
NA NA NA

Avoiding Imputed Dates Before a Particular Date

Usually an adverse event start date is imputed as the earliest date of all possible dates when filling the missing parts. The result may be a date before treatment start date. This is not desirable because the adverse event would not be considered as treatment emergent and excluded from the adverse event summaries. This can be avoided by specifying the treatment start date variable (TRTSDTM) for the min_dates argument.

Importantly, TRTSDTM is used as imputed date only if the non missing date and time parts of AESTDTC coincide with those of TRTSDTM. Therefore 2019-10 is not imputed as 2019-11-11 12:34:56. This ensures that collected information is not changed by the imputation.

ae <- tribble(
  ~AESTDTC,              ~TRTSDTM,
  "2019-08-09T12:34:56", ymd_hms("2019-11-11T12:34:56"),
  "2019-10",             ymd_hms("2019-11-11T12:34:56"),
  "2019-11",             ymd_hms("2019-11-11T12:34:56"),
  "2019-12-04",          ymd_hms("2019-11-11T12:34:56")
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first",
    time_imputation = "first",
    min_dates = exprs(TRTSDTM)
  )
AESTDTC TRTSDTM ASTDTM ASTDTF ASTTMF
2019-08-09T12:34:56 2019-11-11 12:34:56 2019-08-09 12:34:56 NA NA
2019-10 2019-11-11 12:34:56 2019-10-01 00:00:00 D H
2019-11 2019-11-11 12:34:56 2019-11-11 12:34:56 D H
2019-12-04 2019-11-11 12:34:56 2019-12-04 00:00:00 NA H

Avoiding Imputed Dates After a Particular Date

If a date is imputed as the latest date of all possible dates when filling the missing parts, it should not result in dates after data cut off or death. This can be achieved by specifying the dates for the max_dates argument.

Importantly, non missing date parts are not changed. Thus 2019-12-04 is imputed as 2019-12-04 23:59:59 although it is after the data cut off date. It may make sense to replace it by the data cut off date but this is not part of the imputation. It should be done in a separate data cleaning or data cut off step.

ae <- tribble(
  ~AEENDTC,              ~DTHDT,            ~DCUTDT,
  "2019-08-09T12:34:56", ymd("2019-11-11"), ymd("2019-12-02"),
  "2019-11",             ymd("2019-11-11"), ymd("2019-12-02"),
  "2019-12",             NA,                ymd("2019-12-02"),
  "2019-12-04",          NA,                ymd("2019-12-02")
) %>%
  derive_vars_dtm(
    dtc = AEENDTC,
    new_vars_prefix = "AEN",
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last",
    max_dates = exprs(DTHDT, DCUTDT)
  )
AEENDTC DTHDT DCUTDT AENDTM AENDTF AENTMF
2019-08-09T12:34:56 2019-11-11 2019-12-02 2019-08-09 12:34:56 NA NA
2019-11 2019-11-11 2019-12-02 2019-11-11 23:59:59 D H
2019-12 NA 2019-12-02 2019-12-02 23:59:59 D H
2019-12-04 NA 2019-12-02 2019-12-04 23:59:59 NA H

Imputation Without Creating a New Variable

If imputation is required without creating a new variable the convert_dtc_to_dt() function can be called to obtain a vector of imputed dates. It can be used for example here:

mh <- tribble(
  ~MHSTDTC,     ~TRTSDT,
  "2019-04",    ymd("2019-04-15"),
  "2019-04-01", ymd("2019-04-15"),
  "2019-05",    ymd("2019-04-15"),
  "2019-06-21", ymd("2019-04-15")
) %>%
  filter(
    convert_dtc_to_dt(
      MHSTDTC,
      highest_imputation = "M",
      date_imputation = "first"
    ) < TRTSDT
  )
MHSTDTC TRTSDT
2019-04 2019-04-15
2019-04-01 2019-04-15

Using More Than One Imputation Rule for a Variable

Using different imputation rules depending on the observation can be done by using the higher-order function slice_derivation(), which applies a derivation function differently (by varying its arguments) in different subsections of a dataset. For example, consider this Vital Signs case where pre-dose records require a different treatment to other records:

vs <- tribble(
  ~VSDTC,                ~VSTPT,
  "2019-08-09T12:34:56", NA,
  "2019-10-12",          "PRE-DOSE",
  "2019-11-10",          NA,
  "2019-12-04",          NA
) %>%
  slice_derivation(
    derivation = derive_vars_dtm,
    args = params(
      dtc = VSDTC,
      new_vars_prefix = "A"
    ),
    derivation_slice(
      filter = VSTPT == "PRE-DOSE",
      args = params(time_imputation = "first")
    ),
    derivation_slice(
      filter = TRUE,
      args = params(time_imputation = "last")
    )
  )
VSDTC VSTPT ADTM ATMF
2019-08-09T12:34:56 NA 2019-08-09 12:34:56 NA
2019-11-10 NA 2019-11-10 23:59:59 H
2019-12-04 NA 2019-12-04 23:59:59 H
2019-10-12 PRE-DOSE 2019-10-12 00:00:00 H

Conclusion

Deriving timing variables and carrying out imputations is tricky at the best of times, but hopefully this blog post can shed some light on how make this all easier using the {admiral} package! As {admiral} developers we are always interested in knowing how users are employing the package for their ADaM needs, so if you have any comments or feedback related to this topic, don’t be afraid to leave a comment on our Slack channel or on the Github repository, either as an issue or as a discussion.

For an even more detailed treatment of this topic, users are once again invited to read the corresponding vignette on the documentation website, from which this article was adapted.

Reuse

Citation

BibTeX citation:
@online{mancini2023,
  author = {Mancini, Edoardo},
  title = {Date/Time {Functions} and {Imputation} in \{Admiral\}},
  date = {2023-09-26},
  url = {https://pharmaverse.github.io/blog/posts/2023-09-26_date_functions_and_imputation/date_functions_and_imputation.html},
  langid = {en}
}
For attribution, please cite this work as:
Mancini, Edoardo. 2023. “Date/Time Functions and Imputation in {Admiral} .” September 26, 2023. https://pharmaverse.github.io/blog/posts/2023-09-26_date_functions_and_imputation/date_functions_and_imputation.html.