ADAE

Introduction

This article provides a step-by-step explanation for creating an ADaM ADAE (Adverse Events) dataset using key pharmaverse packages along with tidyverse components.

It covers the most common derivations for an AE analysis dataset and some of the most useful functions for them.

For the purpose of this example, we will use the ADSL dataset from {pharmaverseadam} and the ae and ex domains from {pharmaversesdtm}.

Programming Flow

  • Load Data and Required pharmaverse Packages
  • Load Specifications for Metacore
  • Select ADSL Variables
  • Start Building Derivations
  • Derive Analysis Dates
  • Derive Duration
  • Derive Date of Last Dose
  • Derive Analysis Flags
  • Derive Occurrence Flags
  • Derive Query Variables
  • Add ADSL Variables
  • Apply Metadata to Create an eSub XPT and Perform Associated Checks

Load Data and Required pharmaverse Packages

The first step is to load the required packages and input data.

library(metacore)
library(metatools)
library(pharmaversesdtm)
library(pharmaverseadam)
library(admiral)
library(xportr)
library(dplyr)
library(lubridate)
library(stringr)
library(reactable)

# Read in input data
adsl <- pharmaverseadam::adsl
ae <- pharmaversesdtm::ae
ex <- pharmaversesdtm::ex

# When SAS datasets are imported into R using haven::read_sas(), missing
# character values from SAS appear as "" characters in R, instead of appearing
# as NA values. Further details can be obtained via the following link:
# https://pharmaverse.github.io/admiral/articles/admiral.html#handling-of-missing-values
ae <- convert_blanks_to_na(ae)
ex <- convert_blanks_to_na(ex)
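
To make the effect of this step concrete, here is a minimal base R sketch of the same idea on a toy data frame. It is an illustration only and not the {admiral} implementation; the helper name blanks_to_na and all data values are hypothetical.

```r
# Toy stand-in for an SDTM domain imported from SAS (values are made up)
ae_toy <- data.frame(
  USUBJID = c("01", "02", "03"),
  AEOUT = c("RECOVERED", "", "FATAL"),
  AESEQ = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace "" with NA in character columns only
blanks_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    x[!is.na(x) & x == ""] <- NA_character_
    x
  })
  df
}

ae_clean <- blanks_to_na(ae_toy)

# Before the conversion the blank is not seen as missing; afterwards it is
sum(is.na(ae_toy$AEOUT))
sum(is.na(ae_clean$AEOUT))
```

Without the conversion, checks such as is.na() would silently miss the blank values, which is why the step is applied before any derivation.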

Load Specifications for Metacore

We have saved our specifications in an Excel file and will load them into {metacore} with the metacore::spec_to_metacore() function.

When you run this in your own R script, {metacore} may issue warnings that help you improve your specifications, for example about specification columns that contain only missing values or about unused codelists. Once you have reviewed them, these warnings can be suppressed, for instance with suppressWarnings().

# ---- Load Specs for Metacore ----
metacore <- spec_to_metacore(
  path = "./metadata/safety_specs.xlsx",
  # All datasets are described in the same sheet
  where_sep_sheet = FALSE
) %>%
  select_dataset("ADAE")
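
One possible way to follow the "review first, then suppress" advice is sketched below in base R. The function load_specs_toy() is a hypothetical stand-in for the real spec loader, and its warning message is made up; only withCallingHandlers() and suppressWarnings() are real base R functions.

```r
# Hypothetical stand-in for a spec loader that warns about the specification
load_specs_toy <- function() {
  warning("codelist 'XYZ' is defined but never used")
  list(dataset = "ADAE")
}

# Step 1: capture the warnings so that they can be reviewed
spec_warnings <- character(0)
specs <- withCallingHandlers(
  load_specs_toy(),
  warning = function(w) {
    spec_warnings <<- c(spec_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(spec_warnings)

# Step 2: once the messages have been checked, simply wrap the call
specs <- suppressWarnings(load_specs_toy())
```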

Select ADSL Variables

Some variables from the ADSL dataset required for the derivations are merged into the AE domain using the admiral::derive_vars_merged() function. Find other {admiral} functions and related variables by searching admiraldiscovery. The rest of the relevant ADSL variables will be added later.

# Select required ADSL variables
adsl_vars <- exprs(TRTSDT, TRTEDT, DTHDT)

# Join ADSL variables with AE
adae <- ae %>%
  derive_vars_merged(
    dataset_add = adsl,
    new_vars = adsl_vars,
    by_vars = exprs(STUDYID, USUBJID)
  )
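
Conceptually, this merge is a left join that keeps every AE record and adds only the selected ADSL variables. The base R sketch below shows that idea on toy data; it is not how {admiral} implements the merge, and the data values are hypothetical.

```r
ae_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("01", "01", "02"),
  AESEQ = c(1, 2, 1),
  stringsAsFactors = FALSE
)
adsl_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("01", "02"),
  TRTSDT = as.Date(c("2020-01-10", "2020-02-01")),
  AGE = c(64, 58),
  stringsAsFactors = FALSE
)

# Keep the keys plus only the ADSL variable needed for the derivations
adsl_sel <- adsl_toy[, c("STUDYID", "USUBJID", "TRTSDT")]

# Left join: every AE record is kept, ADSL values are repeated per subject
adae_toy <- merge(ae_toy, adsl_sel, by = c("STUDYID", "USUBJID"), all.x = TRUE)
adae_toy
```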

Start Building Derivations

Derive Analysis Dates

The first derivation step computes the analysis dates and the relative analysis days, using the variables merged from ADSL together with the admiral::derive_vars_dt() and admiral::derive_vars_dy() functions.

The AE end date AENDT is derived first, imputing partial dates that are missing the day or the month to the last possible day (date_imputation = "last"). The AE start date ASTDT is then imputed to the same level (missing day or month), with the additional constraint that the imputed date must not fall before the treatment start date or after the AE end date.

These example calls generally omit arguments that are left at their defaults, so check the full function reference pages to learn which other arguments are available for finer control over the derivations.

# Derive ASTDT/ASTDTF/ASTDY and AENDT/AENDTF/AENDY
adae <- adae %>%
  derive_vars_dt(
    new_vars_prefix = "AEN",
    dtc = AEENDTC,
    date_imputation = "last",
    highest_imputation = "M", # imputation is performed on missing days or months
    flag_imputation = "auto" # to automatically create AENDTF variable
  ) %>%
  derive_vars_dt(
    new_vars_prefix = "AST",
    dtc = AESTDTC,
    highest_imputation = "M", # imputation is performed on missing days or months
    flag_imputation = "auto", # to automatically create ASTDTF variable
    min_dates = exprs(TRTSDT), # apply a minimum date for the imputation
    max_dates = exprs(AENDT) # apply a maximum date for the imputation
  ) %>%
  derive_vars_dy(
    reference_date = TRTSDT,
    source_vars = exprs(ASTDT, AENDT)
  )
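
The imputation and relative-day rules can be illustrated with a small base R sketch. This is a simplified illustration under stated assumptions, not the {admiral} logic: the helpers impute_partial_date() and study_day() are hypothetical, only YYYY and YYYY-MM partial dates are handled, and the minimum-date rule is reduced to a plain max().

```r
# Hypothetical helper: impute a partial ISO 8601 date (YYYY or YYYY-MM)
impute_partial_date <- function(dtc, rule = c("first", "last")) {
  rule <- match.arg(rule)
  if (is.na(dtc)) return(as.Date(NA))
  n <- nchar(dtc)
  if (n == 10) return(as.Date(dtc)) # complete date, nothing to impute
  if (n == 4) { # month and day missing
    suffix <- if (rule == "first") "-01-01" else "-12-31"
    return(as.Date(paste0(dtc, suffix)))
  }
  if (n == 7) { # day missing
    first_day <- as.Date(paste0(dtc, "-01"))
    if (rule == "first") return(first_day)
    # last day of the month = first day of the next month minus one day
    return(seq(first_day, by = "month", length.out = 2)[2] - 1)
  }
  as.Date(NA)
}

# Hypothetical helper: relative day with no day 0
study_day <- function(date, reference_date) {
  diff <- as.integer(date - reference_date)
  ifelse(diff >= 0, diff + 1L, diff)
}

trtsdt <- as.Date("2014-01-15")
aendt <- impute_partial_date("2014-02", rule = "last")
astdt <- impute_partial_date("2014-01", rule = "first")
# The imputed start falls before treatment start, so move it up to TRTSDT
astdt <- max(astdt, trtsdt)

astdy <- study_day(astdt, trtsdt)
aendy <- study_day(aendt, trtsdt)
```

Note that the relative day has no day 0: the reference date itself is day 1 and the day before it is day -1.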

Derive Duration

Now that we have these date variables, we can derive the AE duration using the admiral::derive_vars_duration() function. By default the function adds 1 day to the difference between the end and start dates, so an AE that starts and ends on the same day has a duration of 1 day; this behavior can be controlled with the add_one argument.

# Derive ADURN/ADURU
adae <- adae %>%
  derive_vars_duration(
    new_var = ADURN,
    new_var_unit = ADURU,
    start_date = ASTDT,
    end_date = AENDT
  )
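
The arithmetic behind the default duration can be sketched in base R as follows. The data values are hypothetical, and the unit value "DAYS" is an assumption made for illustration.

```r
astdt <- as.Date(c("2020-03-01", "2020-03-01", NA))
aendt <- as.Date(c("2020-03-01", "2020-03-05", "2020-03-07"))

# Difference in days plus one, so both the start day and the end day count
adurn <- as.integer(aendt - astdt) + 1L

# The unit is only meaningful where a duration could be computed
aduru <- ifelse(is.na(adurn), NA_character_, "DAYS")

data.frame(ASTDT = astdt, AENDT = aendt, ADURN = adurn, ADURU = aduru)
```

A missing start or end date leaves the duration missing rather than producing a misleading value.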

Derive Date of Last Dose

You might also need exposure information from ex, for example to derive the date of the last dose before each AE.

In the example below this variable is called LDOSEDT, and only exposure records with a valid dose or placebo are considered. The admiral::derive_vars_joined() function is used because it supports more complex joins.

# Derive LDOSEDT
# In our ex data the EXDOSFRQ (frequency) is "QD" which stands for once daily
# If this was not the case then we would need to use the admiral::create_single_dose_dataset() function
# to generate single doses from aggregate dose information
# Refer to https://pharmaverse.github.io/admiral/reference/create_single_dose_dataset.html
ex <- ex %>%
  derive_vars_dt(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN"
  )

adae <- adae %>%
  derive_vars_joined(
    dataset_add = ex,
    by_vars = exprs(STUDYID, USUBJID),
    order = exprs(EXENDT),
    new_vars = exprs(LDOSEDT = EXENDT),
    join_vars = exprs(EXENDT),
    join_type = "all",
    filter_add = (EXDOSE > 0 | (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO"))) & !is.na(EXENDT),
    filter_join = EXENDT <= ASTDT,
    mode = "last"
  )
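
The join above can be read as: for each AE, take the latest valid dose end date that is on or before the AE start date. The base R sketch below spells out that rule on toy data; it is not the {admiral} implementation, and the helper last_dose_before() and all data values are hypothetical.

```r
ex_toy <- data.frame(
  USUBJID = c("01", "01", "01", "02"),
  EXTRT = c("DRUG A", "DRUG A", "DRUG A", "PLACEBO"),
  EXDOSE = c(50, 50, 0, 0),
  EXENDT = as.Date(c("2020-01-01", "2020-01-10", "2020-01-20", "2020-01-05")),
  stringsAsFactors = FALSE
)
ae_toy <- data.frame(
  USUBJID = c("01", "01", "02", "02"),
  ASTDT = as.Date(c("2020-01-12", "2019-12-30", "2020-01-06", NA)),
  stringsAsFactors = FALSE
)

# Keep records with an actual dose, or a zero dose of placebo
valid <- (ex_toy$EXDOSE > 0 |
  (ex_toy$EXDOSE == 0 & grepl("PLACEBO", ex_toy$EXTRT))) &
  !is.na(ex_toy$EXENDT)
ex_valid <- ex_toy[valid, ]

# Hypothetical helper: latest valid dose date on or before the AE start
last_dose_before <- function(usubjid, astdt) {
  dates <- ex_valid$EXENDT[ex_valid$USUBJID == usubjid]
  dates <- dates[!is.na(astdt) & dates <= astdt]
  if (length(dates) == 0) as.Date(NA) else max(dates)
}

ae_toy$LDOSEDT <- as.Date(NA)
for (i in seq_len(nrow(ae_toy))) {
  ae_toy$LDOSEDT[i] <- last_dose_before(ae_toy$USUBJID[i], ae_toy$ASTDT[i])
}
ae_toy
```

AEs that start before the first dose, or that have no start date, are left with a missing LDOSEDT in this sketch.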

Derive Analysis Flags

Next we look at common analysis flags, such as the treatment-emergent and on-treatment flags, using the admiral::derive_var_trtemfl() and admiral::derive_var_ontrtfl() functions. For the on-treatment flag we only want to include AEs that start no later than 30 days after treatment end.

# Derive TRTEMFL and ONTRTFL
adae <- adae %>%
  derive_var_trtemfl(
    start_date = ASTDT,
    end_date = AENDT,
    trt_start_date = TRTSDT,
    trt_end_date = TRTEDT
  ) %>%
  derive_var_ontrtfl(
    start_date = ASTDT,
    ref_start_date = TRTSDT,
    ref_end_date = TRTEDT,
    ref_end_window = 30
  )

At first these two functions may appear similar, but each offers additional arguments that give you the flexibility to apply more complex analysis rules. For example, for the treatment-emergent flag you could use the intensity arguments to also flag AEs that start before treatment start and end after treatment start with worsened intensity (i.e. the most extreme intensity is greater than the initial intensity).
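
To show how the two flags differ in their simplest form, here is a base R sketch of the date rules used in the calls above. It assumes complete, non-missing dates and ignores the extra arguments mentioned in the previous paragraph, so it is a simplification rather than the {admiral} logic; the data values are hypothetical.

```r
adae_toy <- data.frame(
  ASTDT = as.Date(c("2020-01-05", "2020-01-20", "2020-03-15", "2020-04-15")),
  TRTSDT = as.Date("2020-01-10"),
  TRTEDT = as.Date("2020-03-10")
)

# Treatment emergent: the AE starts on or after treatment start
adae_toy$TRTEMFL <- ifelse(
  adae_toy$ASTDT >= adae_toy$TRTSDT,
  "Y", NA_character_
)

# On treatment: the AE starts between treatment start and
# treatment end plus a 30-day window
window <- 30
adae_toy$ONTRTFL <- ifelse(
  adae_toy$ASTDT >= adae_toy$TRTSDT &
    adae_toy$ASTDT <= adae_toy$TRTEDT + window,
  "Y", NA_character_
)
adae_toy
```

In this toy data the last AE starts more than 30 days after treatment end, so it is treatment emergent but not on treatment.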

Derive Occurrence Flags

Some flags need to be derived from AE occurrences, such as flagging the first occurrence of the maximum severity per patient, which is called AOCCIFL in the example below. Here we use the admiral::derive_var_extreme_flag() function.

# Derive AOCCIFL
adae <- adae %>%
  # create temporary numeric TEMP_AESEVN for sorting purposes (1 = SEVERE, so the most severe sorts first)
  mutate(TEMP_AESEVN = as.integer(factor(AESEV, levels = c("SEVERE", "MODERATE", "MILD")))) %>%
  derive_var_extreme_flag(
    new_var = AOCCIFL,
    by_vars = exprs(STUDYID, USUBJID),
    order = exprs(TEMP_AESEVN, ASTDT, AESEQ),
    mode = "first"
  )
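
The sorting logic behind this flag can be sketched in base R: rank the severities so that the most severe comes first, sort within subject, and flag the first record. This illustrates the rule on hypothetical data and is not the {admiral} implementation.

```r
adae_toy <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  AESEQ = c(1, 2, 3, 1, 2),
  AESEV = c("MILD", "SEVERE", "SEVERE", "MODERATE", "MILD"),
  ASTDT = as.Date(c(
    "2020-01-05", "2020-02-10", "2020-01-20", "2020-03-01", "2020-03-02"
  )),
  stringsAsFactors = FALSE
)

# Rank severities so that the most severe sorts first (SEVERE = 1)
sev_rank <- as.integer(
  factor(adae_toy$AESEV, levels = c("SEVERE", "MODERATE", "MILD"))
)

# Sort within subject by severity rank, then start date, then sequence
ord <- order(adae_toy$USUBJID, sev_rank, adae_toy$ASTDT, adae_toy$AESEQ)
sorted <- adae_toy[ord, ]

# Flag the first record of each subject in that order
sorted$AOCCIFL <- ifelse(!duplicated(sorted$USUBJID), "Y", NA_character_)

# Restore the original record order
result <- sorted[order(sorted$USUBJID, sorted$AESEQ), ]
result
```

Subject "01" has two SEVERE events, and the earlier one (AESEQ 3) receives the flag because the start date breaks the tie.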

Derive Query Variables

The final set of derived variables flags AEs that belong to queries (baskets) of AE terms, e.g. Standardized MedDRA Queries (SMQs) or Custom Queries (CQs).

Before reading this section, you should read the Queries Dataset Vignette from {admiral}.

On a study you would need to create your own version of this queries dataset with the AE queries you need, but for the purpose of this article we use the example provided as part of {admiral} and we filter it to just one CQ and one SMQ. Then you can derive the required variables using the admiral::derive_vars_query() function.

queries <- admiral::queries %>%
  filter(PREFIX %in% c("CQ01", "SMQ02"))

# Derive CQ01NAM and SMQ02NAM
adae <- adae %>%
  derive_vars_query(dataset_queries = queries)
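
The core idea of a queries dataset is a lookup from AE terms to a named query. The base R sketch below shows that idea with a toy lookup; the column layout (PREFIX, GRPNAME, SRCVAR, TERMCHAR) is an assumption modelled on the queries dataset described in the vignette, the terms and query names are hypothetical, and the exact-match rule is a simplification of what admiral::derive_vars_query() does.

```r
# Toy queries dataset: one row per query term (all values are made up)
queries_toy <- data.frame(
  PREFIX = c("CQ01", "CQ01", "SMQ02"),
  GRPNAME = c("Dermatologic events", "Dermatologic events", "Hypothetical SMQ"),
  SRCVAR = "AEDECOD",
  TERMCHAR = c("RASH", "PRURITUS", "HEADACHE"),
  stringsAsFactors = FALSE
)
adae_toy <- data.frame(
  USUBJID = c("01", "02", "03"),
  AEDECOD = c("RASH", "HEADACHE", "NAUSEA"),
  stringsAsFactors = FALSE
)

# For each query prefix, create <PREFIX>NAM holding the query name
# on records whose source variable matches one of the query terms
for (prefix in unique(queries_toy$PREFIX)) {
  q <- queries_toy[queries_toy$PREFIX == prefix, ]
  hit <- adae_toy[[q$SRCVAR[1]]] %in% q$TERMCHAR
  adae_toy[[paste0(prefix, "NAM")]] <- ifelse(hit, q$GRPNAME[1], NA_character_)
}
adae_toy
```

Records whose term is not part of a query keep a missing value in the corresponding variable.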

Add ADSL Variables

If needed, the remaining ADSL variables can now be added. The ADSL variables that have already been merged are held in the list adsl_vars, so they are excluded from this second merge.

adae <- adae %>%
  derive_vars_merged(
    dataset_add = select(adsl, !!!negate_vars(adsl_vars)),
    by_vars = exprs(STUDYID, USUBJID)
  )
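
The reason for excluding the already merged variables is to avoid duplicated columns in the result. A base R sketch of the same idea, with hypothetical data and setdiff() standing in for the negation of adsl_vars, could look like this:

```r
adsl_toy <- data.frame(
  STUDYID = "S1", USUBJID = c("01", "02"),
  TRTSDT = as.Date(c("2020-01-10", "2020-02-01")),
  AGE = c(64, 58), SEX = c("F", "M"),
  stringsAsFactors = FALSE
)
# ADAE already contains TRTSDT from the earlier merge
adae_toy <- data.frame(
  STUDYID = "S1", USUBJID = c("01", "01", "02"), AESEQ = c(1, 2, 1),
  TRTSDT = as.Date(c("2020-01-10", "2020-01-10", "2020-02-01")),
  stringsAsFactors = FALSE
)

# Drop the variables that were merged earlier, keep everything else
already_merged <- c("TRTSDT")
adsl_rest <- adsl_toy[, setdiff(names(adsl_toy), already_merged)]

adae_full <- merge(
  adae_toy, adsl_rest,
  by = c("STUDYID", "USUBJID"), all.x = TRUE
)
names(adae_full)
```

Had TRTSDT been left in the second data frame, the merge would have produced two suffixed copies of it.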

Apply Metadata to Create an eSub XPT and Perform Associated Checks

We use {metatools} and {xportr} to perform checks, apply metadata such as types, lengths, labels, and write the dataset to an XPT file.

dir <- tempdir() # Specify the directory for saving the XPT file

adae %>%
  drop_unspec_vars(metacore) %>% # Drop unspecified variables from specs
  check_variables(metacore) %>% # Check all variables specified are present and no more
  check_ct_data(metacore, na_acceptable = TRUE) %>% # Checks all variables with CT only contain values within the CT
  order_cols(metacore) %>% # Orders the columns according to the spec
  sort_by_key(metacore) %>% # Sorts the rows by the sort keys
  xportr_type(metacore, domain = "ADAE") %>% # Coerce variable type to match spec
  xportr_length(metacore) %>% # Assigns SAS length from a variable level metadata
  xportr_label(metacore) %>% # Assigns variable label from metacore specifications
  xportr_df_label(metacore) %>%  # Assigns dataset label from metacore specifications
  xportr_write(file.path(dir, "adae.xpt"), metadata = metacore, domain = "ADAE")
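
As a rough illustration of what a controlled terminology check looks for, the base R sketch below reports values that are not part of a codelist. The helper check_ct(), the codelist, and the data are hypothetical; this is not the {metatools} implementation, only the underlying idea.

```r
# Hypothetical codelist and data for AESEV
ct_aesev <- c("MILD", "MODERATE", "SEVERE")
aesev <- c("MILD", "SEVERE", NA, "Severe")

# Hypothetical helper: return the distinct values outside the codelist;
# missing values are tolerated when na_acceptable is TRUE
check_ct <- function(x, codelist, na_acceptable = TRUE) {
  bad <- !(x %in% codelist)
  if (na_acceptable) bad <- bad & !is.na(x)
  unique(x[bad])
}

check_ct(aesev, ct_aesev)
check_ct(aesev, ct_aesev, na_acceptable = FALSE)
```

The mixed-case value is reported in both calls, which is the kind of issue such a check is meant to surface before the XPT file is written.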