Demographic Table

Introduction

This guide shows how pharmaverse packages, along with some from the tidyverse, can be used to create a demographic table, using the {pharmaverseadam} ADSL data as input.

In the examples below, we illustrate three general approaches for creating a demographics table. The first and third utilize Analysis Results Datasets, which are part of the emerging CDISC Analysis Results Standard. The second is the classic method of creating summary tables directly from a data set.

Multiple Approaches Shown

This example includes:

  • ARD-based approaches: {cards} + {gtsummary} or {tfrmt} - Creates structured Analysis Results Datasets following CDISC standards, then formats them into tables
  • Classic approach: {rtables} + {tern} - Direct table creation from datasets using mature, stable packages

Data preprocessing

First, we pre-process the data to create a few formatted variables that are ready for display in the table.

library(dplyr)

# Create categorical variables, remove screen failures, and assign column labels
adsl <- pharmaverseadam::adsl |>
  filter(!ACTARM %in% "Screen Failure") |>
  mutate(
    SEX = case_match(SEX, "M" ~ "MALE", "F" ~ "FEMALE"),
    AGEGR1 =
      case_when(
        between(AGE, 18, 40) ~ "18-40",
        between(AGE, 41, 64) ~ "41-64",
        AGE > 64 ~ ">=65"
      ) |>
        factor(levels = c("18-40", "41-64", ">=65"))
  ) |>
  labelled::set_variable_labels(
    AGE = "Age (yr)",
    AGEGR1 = "Age group",
    SEX = "Sex",
    RACE = "Race"
  )
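
The age grouping above can be cross-checked on a few boundary values. The following is a minimal base R sketch, not part of the original example: the toy `age` vector is hypothetical, and `cut()` is swapped in for `case_when()` with `between()`. For whole-number ages the two give the same groups. Ages below 18 fall outside every group and become NA in both versions.

```r
# Hypothetical boundary ages, not taken from ADSL
age <- c(17, 18, 40, 41, 64, 65, 89)

# Left-closed intervals [18, 41), [41, 65), [65, Inf) reproduce the
# inclusive integer ranges 18-40, 41-64 and >=65 used in the pipeline
agegr1 <- cut(
  age,
  breaks = c(18, 41, 65, Inf),
  labels = c("18-40", "41-64", ">=65"),
  right = FALSE
)

data.frame(age, agegr1)
```

One difference is worth keeping in mind: a non-integer age such as 40.5 matches none of the `case_when()` conditions and would become NA there, whereas `cut()` places it in "18-40".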

{gtsummary} & {cards}

In the example below, we will use the {gtsummary} and {cards} packages to create a demographics table.

  • The {cards} package creates Analysis Results Datasets (ARDs, which are a part of the CDISC Analysis Results Standard).
  • The {gtsummary} package uses ARDs to create tables.

ARD ➡ Table

In the example below, we first build an ARD with the needed summary statistics using {cards}. Then, we use the ARD to build the demographics table with {gtsummary}.

library(cards)
library(gtsummary)
theme_gtsummary_compact() # reduce default padding and font size for a gt table

# build the ARD with the needed summary statistics using {cards}
ard <-
  ard_stack(
    adsl,
    ard_continuous(variables = AGE),
    ard_categorical(variables = c(AGEGR1, SEX, RACE)),
    .by = ACTARM, # split results by treatment arm
    .attributes = TRUE # optionally include column labels in the ARD
  )

# use the ARD to create a demographics table using {gtsummary}
tbl_ard_summary(
  cards = ard,
  by = ACTARM,
  include = c(AGE, AGEGR1, SEX, RACE),
  type = AGE ~ "continuous2",
  statistic = AGE ~ c("{N}", "{mean} ({sd})", "{median} ({p25}, {p75})", "{min}, {max}")
) |>
  bold_labels() |>
  modify_header(all_stat_cols() ~ "**{level}**  \nN = {n}") |> # add Ns to header
  modify_footnote(everything() ~ NA) # remove default footnote
Characteristic                          Placebo              Xanomeline High Dose   Xanomeline Low Dose
                                        N = 86               N = 72                 N = 96
Age (yr)
    N                                   86                   72                     96
    Mean (SD)                           75.2 (8.6)           73.8 (7.9)             76.0 (8.1)
    Median (Q1, Q3)                     76.0 (69.0, 82.0)    75.5 (70.0, 79.0)      78.0 (71.0, 82.0)
    Min, Max                            52.0, 89.0           56.0, 88.0             51.0, 88.0
Age group
    18-40                               0 (0.0%)             0 (0.0%)               0 (0.0%)
    41-64                               14 (16.3%)           11 (15.3%)             8 (8.3%)
    >=65                                72 (83.7%)           61 (84.7%)             88 (91.7%)
Sex
    FEMALE                              53 (61.6%)           35 (48.6%)             55 (57.3%)
    MALE                                33 (38.4%)           37 (51.4%)             41 (42.7%)
Race
    AMERICAN INDIAN OR ALASKA NATIVE    0 (0.0%)             1 (1.4%)               0 (0.0%)
    BLACK OR AFRICAN AMERICAN           8 (9.3%)             9 (12.5%)              6 (6.3%)
    WHITE                               78 (90.7%)           62 (86.1%)             90 (93.8%)
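
The `statistic` argument above uses templates such as "{mean} ({sd})", where each name in braces refers to a statistic stored in the ARD. As a rough illustration of that idea only, here is a base R sketch that fills such a template from named values. The helper `fill_template()` is hypothetical and is not how {gtsummary} is implemented; the numbers are the already-rounded Placebo values read from the table above.

```r
# Rounded Placebo statistics for AGE, copied from the table above
stats <- c(N = 86, mean = 75.2, sd = 8.6)

# Hypothetical helper: replace each {name} placeholder with its value
fill_template <- function(template, stats) {
  for (nm in names(stats)) {
    template <- gsub(
      paste0("{", nm, "}"),
      format(stats[[nm]]),
      template,
      fixed = TRUE # treat the braces literally, not as a regex
    )
  }
  template
}

fill_template("{N}", stats)
fill_template("{mean} ({sd})", stats)
```

The sketch does no rounding of its own; in the real table the number of decimals is decided by {gtsummary}.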

Table ➡ ARD

One may also build the demographics table in the classic way, using gtsummary::tbl_summary() on a data frame, and then extract the ARD from the table object.

# build demographics table directly from a data frame
tbl <- adsl |> tbl_summary(by = ACTARM, include = c(AGE, AGEGR1, SEX, RACE))

# extract ARD from table object
gather_ard(tbl)[[1]] |> select(-gts_column) # removing column so ARD fits on page
{cards} data frame: 162 x 11
   group1 group1_level variable variable_level stat_name stat_label  stat
1  ACTARM      Placebo      SEX         FEMALE         n          n    53
2  ACTARM      Placebo      SEX         FEMALE         N          N    86
3  ACTARM      Placebo      SEX         FEMALE         p          % 0.616
4  ACTARM      Placebo      SEX           MALE         n          n    33
5  ACTARM      Placebo      SEX           MALE         N          N    86
6  ACTARM      Placebo      SEX           MALE         p          % 0.384
7  ACTARM      Placebo     RACE      AMERICAN…         n          n     0
8  ACTARM      Placebo     RACE      AMERICAN…         N          N    86
9  ACTARM      Placebo     RACE      AMERICAN…         p          %     0
10 ACTARM      Placebo     RACE      BLACK OR…         n          n     8
ℹ 152 more rows
ℹ Use `print(n = ...)` to see more rows
ℹ 4 more variables: context, fmt_fun, warning, error
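
The printout above shows the long layout of an ARD: one row per grouping level, variable level and statistic. To make that layout concrete, here is a small base R sketch that builds the same kind of rows for a toy data frame. The data values are hypothetical and the code is not how {cards} works internally. In a real {cards} ARD the `stat` column is a list column and further columns (context, fmt_fun, warning, error) are present.

```r
# Toy stand-in for adsl (hypothetical values)
toy <- data.frame(
  ACTARM = c("Placebo", "Placebo", "Placebo",
             "Xanomeline Low Dose", "Xanomeline Low Dose"),
  SEX = c("FEMALE", "MALE", "FEMALE", "MALE", "MALE")
)

# For each arm and each SEX level, emit three rows: n, N and p
mini_ard <- do.call(rbind, lapply(split(toy, toy$ACTARM), function(d) {
  do.call(rbind, lapply(sort(unique(toy$SEX)), function(lvl) {
    n <- sum(d$SEX == lvl) # count in this level
    N <- nrow(d)           # denominator: subjects in this arm
    data.frame(
      group1 = "ACTARM",
      group1_level = d$ACTARM[1],
      variable = "SEX",
      variable_level = lvl,
      stat_name = c("n", "N", "p"),
      stat = c(n, N, n / N)
    )
  }))
}))
rownames(mini_ard) <- NULL

mini_ard
```

Because every statistic sits in its own row, the same data frame can feed different display tools, which is the pattern the ARD-based approaches on this page rely on.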

{rtables} & {tern}

The packages used, with a brief description of each, are:

  • {rtables}: designed to create and display complex tables with R.
  • {tern}: contains analysis functions to create tables and graphs used for clinical trial reporting.

After installing the packages, the first step is to load them along with the input data. Here, we explicitly encode the missing entries in the data frame adsl, so that they are counted as their own category instead of being silently dropped.

Note that {tern} depends on {rtables} so the latter is automatically attached.

library(tern)

adsl2 <- adsl |>
  df_explicit_na()
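
To illustrate what encoding missing entries explicitly means, here is a base R sketch for a single vector. It is an approximation under stated assumptions, not the {tern} implementation: the helper `to_explicit_na()` is hypothetical, it only handles NA values, and the "<Missing>" label is assumed to match the default level used by `df_explicit_na()`.

```r
# Hypothetical helper: turn NA into its own factor level
to_explicit_na <- function(x, na_level = "<Missing>") {
  # keep existing factor levels, or derive them from the observed values
  lv <- if (is.factor(x)) levels(x) else sort(unique(x[!is.na(x)]))
  x <- as.character(x)
  has_na <- anyNA(x)
  x[is.na(x)] <- na_level
  # add the missing level last, and only when it is needed
  factor(x, levels = c(lv, if (has_na) na_level))
}

race <- c("WHITE", NA, "ASIAN") # toy values
race_explicit <- to_explicit_na(race)

table(race_explicit)
```

With the explicit level in place, a frequency table reports the missing entry as a row of its own rather than omitting it.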

Now we create the demographic table.

vars <- c("AGE", "AGEGR1", "SEX", "RACE")
var_labels <- c(
  "Age (yr)",
  "Age group",
  "Sex",
  "Race"
)

lyt <- basic_table(show_colcounts = TRUE) |>
  split_cols_by(var = "ACTARM") |>
  add_overall_col("All Patients") |>
  analyze_vars(
    vars = vars,
    var_labels = var_labels
  )

result <- build_table(lyt, adsl2)

result
                                       Placebo     Xanomeline High Dose   Xanomeline Low Dose   All Patients
                                       (N=86)             (N=72)                (N=96)            (N=254)   
————————————————————————————————————————————————————————————————————————————————————————————————————————————
Age (yr)                                                                                                    
  n                                      86                 72                    96                254     
  Mean (SD)                          75.2 (8.6)         73.8 (7.9)            76.0 (8.1)         75.1 (8.2) 
  Median                                76.0               75.5                  78.0               77.0    
  Min - Max                          52.0 - 89.0       56.0 - 88.0            51.0 - 88.0       51.0 - 89.0 
Age group                                                                                                   
  n                                      86                 72                    96                254     
  18-40                                   0                 0                      0                 0      
  41-64                              14 (16.3%)         11 (15.3%)             8 (8.3%)           33 (13%)  
  >=65                               72 (83.7%)         61 (84.7%)            88 (91.7%)         221 (87%)  
Sex                                                                                                         
  n                                      86                 72                    96                254     
  FEMALE                             53 (61.6%)         35 (48.6%)            55 (57.3%)        143 (56.3%) 
  MALE                               33 (38.4%)         37 (51.4%)            41 (42.7%)        111 (43.7%) 
Race                                                                                                        
  n                                      86                 72                    96                254     
  AMERICAN INDIAN OR ALASKA NATIVE        0              1 (1.4%)                  0              1 (0.4%)  
  BLACK OR AFRICAN AMERICAN           8 (9.3%)          9 (12.5%)              6 (6.2%)          23 (9.1%)  
  WHITE                              78 (90.7%)         62 (86.1%)            90 (93.8%)        230 (90.6%) 
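
One cell differs from the {gtsummary} table: BLACK OR AFRICAN AMERICAN in the Xanomeline Low Dose arm reads 6 (6.2%) here and 6 (6.3%) there, although both come from 6/96 = 6.25%. A plausible explanation, offered as an assumption and not something this page documents, is that the two toolchains resolve exact ties differently when rounding. The base R sketch below contrasts a hypothetical round-half-up helper with base `round()`.

```r
p <- 100 * 6 / 96 # exactly 6.25, an exact tie at one decimal place

# Hypothetical helper: round half up (valid for non-negative values only)
round_half_up <- function(x, digits = 0) {
  floor(x * 10^digits + 0.5) / 10^digits
}

half_up <- round_half_up(p, 1)
base_round <- round(p, 1) # base R follows the IEC 60559 convention for ties

c(half_up = half_up, base_round = base_round)
```

When tables from different packages must agree to the last digit, it is worth checking which rounding rule each one applies.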

{tfrmt} & {cards}

In the example below, we will use the {tfrmt} and {cards} packages to create a demographics table.

  • The {cards} package creates Analysis Results Datasets (ARDs, which are a part of the CDISC Analysis Results Standard).
  • The {tfrmt} package uses ARDs to create tables.

In the example below, we first build an ARD with the needed summary statistics using {cards}. Then, we use the ARD to build the demographics table with {tfrmt}.

library(cards)
library(forcats)
library(tfrmt)

# build the ARD with the needed summary statistics using {cards}
ard <-
  ard_stack(
    adsl,
    ard_continuous(
      variables = AGE,
      statistic = ~ continuous_summary_fns(c("N", "mean", "sd", "min", "max"))
    ),
    ard_categorical(variables = c(AGEGR1, SEX, RACE)),
    .by = ACTARM, # split results by treatment arm
    .overall = TRUE,
    .total_n = TRUE
  )

# tidy the ARD for use in {tfrmt}
ard_tbl <-
  ard |>
  # reshape the data
  shuffle_card(fill_overall = "Total") |>
  # transform group-level freqs/pcts into a singular "bigN" row
  prep_big_n(vars = "ACTARM") |>
  # consolidate vars into a single variable column
  prep_combine_vars(vars = c("AGE", "AGEGR1", "SEX", "RACE")) |>
  # coalesce categorical levels + continuous stats into a "label"
  prep_label() |>
  group_by(ACTARM, stat_variable) |>
  mutate(across(c(variable_level, label), ~ ifelse(stat_name == "N", "n", .x))) |>
  ungroup() |>
  unique() |>
  # sorting
  mutate(
    ord1 = fct_inorder(stat_variable) |> fct_relevel("SEX", after = 0) |> as.numeric(),
    ord2 = ifelse(label == "n", 1, 2)
  ) |>
  # relabel the variables
  mutate(stat_variable = case_when(
    stat_variable == "AGE" ~ "Age (YEARS) at First Dose",
    stat_variable == "AGEGR1" ~ "Age Group (YEARS) at First Dose",
    stat_variable == "SEX" ~ "Sex",
    stat_variable == "RACE" ~ "High Level Race",
    .default = stat_variable
  )) |>
  # drop variables not needed
  select(ACTARM, stat_variable, label, stat_name, stat, ord1, ord2) |>
  # remove duplicates (extra denominators per variable level)
  unique()

# create a demographics table using {tfrmt}
DM_T01 <- tfrmt(
  group = stat_variable,
  label = label,
  param = stat_name,
  value = stat,
  column = ACTARM,
  sorting_cols = c(ord1, ord2),
  body_plan = body_plan(
    frmt_structure(group_val = ".default", label_val = ".default", frmt("xxx")),
    frmt_structure(
      group_val = ".default", label_val = ".default",
      frmt_combine("{n} ({p}%)",
        n = frmt("xxx"),
        p = frmt("xx", transform = ~ . * 100)
      )
    )
  ),
  big_n = big_n_structure(param_val = "bigN", n_frmt = frmt(" (N=xx)")),
  col_plan = col_plan(
    -starts_with("ord")
  ),
  col_style_plan = col_style_plan(
    col_style_structure(col = c("Placebo", "Xanomeline High Dose", "Xanomeline Low Dose", "Total"), align = "left")
  ),
  row_grp_plan = row_grp_plan(
    row_grp_structure(group_val = ".default", element_block(post_space = " "))
  )
) |>
  print_to_gt(ard_tbl)

DM_T01
                                    Placebo (N=86)   Xanomeline High Dose (N=72)   Xanomeline Low Dose (N=96)   Total (N=254)
Sex
  n                                  86               72                            96                          254
  FEMALE                             53 (62%)         35 (49%)                      55 (57%)                    143 (56%)
  MALE                               33 (38%)         37 (51%)                      41 (43%)                    111 (44%)

Age (YEARS) at First Dose
  n                                  86               72                            96                          254
  Mean                               75               74                            76                           75
  SD                                  9                8                             8                            8
  Min                                52               56                            51                           51
  Max                                89               88                            88                           89

Age Group (YEARS) at First Dose
  n                                  86               72                            96                          254
  18-40                               0 ( 0%)          0 ( 0%)                       0 ( 0%)                      0 ( 0%)
  41-64                              14 (16%)         11 (15%)                       8 ( 8%)                     33 (13%)
  >=65                               72 (84%)         61 (85%)                      88 (92%)                    221 (87%)

High Level Race
  n                                  86               72                            96                          254
  AMERICAN INDIAN OR ALASKA NATIVE    0 ( 0%)          1 ( 1%)                       0 ( 0%)                      1 ( 0%)
  BLACK OR AFRICAN AMERICAN           8 ( 9%)          9 (12%)                       6 ( 6%)                     23 ( 9%)
  WHITE                              78 (91%)         62 (86%)                      90 (94%)                    230 (91%)
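
Two steps in the code above shape this output: the `ord1` sorting key, which moves Sex ahead of the other variables, and the "{n} ({p}%)" format, which multiplies each proportion by 100. The base R sketch below mimics both on toy inputs. It is an illustration under assumptions, not {forcats} or {tfrmt} code: the `stat_variable` vector is hypothetical, and `sprintf()` stands in for `frmt("xxx")` and `frmt("xx")`.

```r
# Variable names in order of first appearance (hypothetical toy vector)
stat_variable <- c("AGE", "AGE", "AGEGR1", "SEX", "RACE")

# Levels by first appearance (like fct_inorder()), then SEX moved to the
# front (like fct_relevel("SEX", after = 0)), then converted to integers
lv <- unique(stat_variable)
lv <- c("SEX", setdiff(lv, "SEX"))
ord1 <- as.integer(factor(stat_variable, levels = lv))

# Count padded to width 3, proportion scaled to a width-2 percentage
fmt_n_pct <- function(n, p) {
  sprintf("%3d (%2.0f%%)", as.integer(n), p * 100)
}
cells <- fmt_n_pct(n = c(53, 0), p = c(0.616, 0))

ord1
cells
```

The resulting key puts SEX rows first and keeps AGE, AGEGR1 and RACE in their original order, which matches the group order in the table above.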
                                           