Oncology Survival

Introduction

This guide demonstrates how pharmaverse packages, along with tools from the tidyverse, can be used to create standard oncology efficacy Tables, Listings, and Graphs (TLGs) using the {pharmaverseadam} ADTTE_ONCO and ADRS_ONCO datasets as input.

The packages used, with a brief description of their purpose, are as follows:

  • {pharmaverseadam}: provides CDISC ADaM example datasets for use in pharmaverse examples and tests.
  • {ggsurvfit}: eases the creation of publication-ready time-to-event (survival) figures built on {ggplot2}. Includes Surv_CNSR() to handle CDISC ADTTE censoring conventions natively.
  • {gtsummary}: creates publication-ready summary and analytical tables. Used here for survival and tumor response summaries.
  • {dplyr}: provides data manipulation functions used to prepare and filter the ADaM data.
  • {forcats}: provides factor manipulation utilities used to order RECIST response categories for table display.
  • {broom}: extracts tidy model output from Cox regression fits — used to build the subgroup forest plot.
  • {survival}: provides coxph() for fitting Cox proportional hazards models.

The outputs produced in this example are:

  1. Kaplan-Meier plot with confidence intervals, risk table, and median guideline — a standard figure for oncology efficacy reporting.
  2. Risk table displayed beneath the KM plot showing number at risk, events, and censored counts at each time point.
  3. Median survival table with 95% confidence interval, suitable for inclusion in a clinical study report (CSR).
  4. Survival probability table at selected time points.
  5. Best Overall Response table with category counts, percentages, and inline ORR (CR + PR) summary from ADRS_ONCO.
  6. Stratified KM plot using the built-in ggsurvfit::adtte four-arm trial dataset to illustrate multi-arm analyses.
  7. Subgroup forest plot showing PFS hazard ratios by hormone receptor status and prior radiotherapy history using ggsurvfit::adtte.

Setup

We load the required packages and read ADTTE_ONCO and ADRS_ONCO from {pharmaverseadam}. ADTTE_ONCO is the oncology-specific TTE dataset generated from the {admiralonco} template — it contains three endpoints (OS, PFS, RSD) with treatment arm variables already merged in. ADRS_ONCO is the tumor response dataset containing per-visit and summary response parameters (BOR, CBOR, RSP) derived from RECIST 1.1 assessments.

Because the CDISC ADTTE censoring variable CNSR is coded 1 = censored / 0 = event (the reverse of the survival::Surv() convention, where 1 marks an event), we use Surv_CNSR() from {ggsurvfit} throughout to avoid mis-coding errors.

library(pharmaverseadam)
library(ggsurvfit)
library(ggplot2)
library(gtsummary)
library(dplyr)
library(forcats)
library(broom)
library(survival)

# ── Read data ──────────────────────────────────────────────────────────────────
adtte_onco <- pharmaverseadam::adtte_onco

# ── ADRS_ONCO: tumor response data ───────────────────────────────────────────
adrs_onco <- pharmaverseadam::adrs_onco |>
  filter(ARMCD != "Scrnfail")

# Overview of available endpoints and their event rates
adtte_onco |>
  group_by(PARAMCD, PARAM) |>
  summarise(
    N           = n(),
    N_events    = sum(CNSR == 0),
    Pct_events  = round(100 * mean(CNSR == 0), 1),
    Median_AVAL = round(median(AVAL), 2),
    .groups     = "drop"
  )
# A tibble: 3 × 6
  PARAMCD PARAM                         N N_events Pct_events Median_AVAL
  <chr>   <chr>                     <int>    <int>      <dbl>       <dbl>
1 OS      Overall Survival            254        3        1.2       140  
2 PFS     Progression Free Survival   254        6        2.4         1  
3 RSD     Duration of Response          4        1       25          32.5

Why PFS for this example? OS in early-phase oncology trials typically has few events (most subjects are still alive), producing a near-flat KM curve with a median that cannot be estimated — not ideal for illustration. PFS has more events and a shorter follow-up window, giving a more informative curve. The code below uses PARAMCD == "PFS"; swap to "OS" or "RSD" as needed. Note that {admiralonco} uses "RSD" (not "DOR") for Duration of Response.

Note

pharmaverseadam::adtte_onco is a small example dataset with limited events across all endpoints, so the resulting KM curves and tables are for illustration only and should not be interpreted clinically. For a richer example with more events and better-separated arms, see the Stratified KM Plot section below, which uses the ggsurvfit::adtte four-arm breast cancer trial dataset.

# ── PFS endpoint ────────────────────────────────────
adtte_pfs <- adtte_onco |>
  filter(PARAMCD == "PFS")

# Preview key variables
adtte_pfs |>
  select(USUBJID, PARAM, PARAMCD, AVAL, CNSR, EVNTDESC, CNSDTDSC) |>
  head(5)
# A tibble: 5 × 7
  USUBJID     PARAM                     PARAMCD  AVAL  CNSR EVNTDESC    CNSDTDSC
  <chr>       <chr>                     <chr>   <dbl> <int> <chr>       <chr>   
1 01-701-1015 Progression Free Survival PFS        64     1 Last Tumor… Last Tu…
2 01-701-1023 Progression Free Survival PFS         1     1 Randomizat… Randomi…
3 01-701-1028 Progression Free Survival PFS        43     0 Disease Pr… <NA>    
4 01-701-1033 Progression Free Survival PFS         1     1 Randomizat… Randomi…
5 01-701-1034 Progression Free Survival PFS        43     1 Last Tumor… Last Tu…

Note on CNSR coding: In CDISC ADTTE, CNSR = 0 indicates an event and CNSR = 1 indicates censoring. Surv_CNSR(AVAL, CNSR) handles this correctly without requiring manual recoding.


Kaplan-Meier Plot

The plot below uses survfit2() with Surv_CNSR() to fit the survival model. Because PARAM and PARAMCD are present in the data, {ggsurvfit} can use them to build default axis labels, a key CDISC-aware feature; the labs() call below then sets the title and axis labels explicitly. The fitted km_fit object is created once here and reused by the summary tables below.

# ── Fit the survival model ─────────────────────────────────────────────────────
# survfit2() is preferred over survfit() as it tracks the calling environment,
# enabling clean legend labels and p-value computation downstream.
km_fit <- survfit2(Surv_CNSR(AVAL, CNSR) ~ 1, data = adtte_pfs)

# ── Build the plot ─────────────────────────────────────────────────────────────
km_fit |>
  ggsurvfit(linewidth = 1) +
  # ── Explicit color scales prevent the site theme from stripping colour to B&W ─
  scale_color_manual(values = c("#2c7bb6")) +
  scale_fill_manual(values = c("#2c7bb6")) +
  add_confidence_interval() +
  add_risktable(
    risktable_stats = c("n.risk", "cum.event", "cum.censor"),
    risktable_group = "risktable_stats",
    stats_label = list(
      n.risk     = "At Risk",
      cum.event  = "Events (cum.)",
      cum.censor = "Censored (cum.)"
    ),
    theme = theme_risktable_default(axis.text.y.size = 9, plot.title.size = 9)
  ) +
  add_quantile(
    y_value   = 0.5, # median survival guideline
    color     = "gray40",
    linewidth = 0.75,
    linetype  = "dashed"
  ) +
  add_censor_mark(shape = 3, size = 2) +
  scale_ggsurvfit() +
  labs(
    title = paste0(unique(adtte_pfs$PARAM), "\nKaplan-Meier Estimate"),
    x = "Time (Days)",
    y = "Progression-Free Survival Probability",
    caption = paste0(
      "Analysis dataset: ADTTE_ONCO  |  PARAMCD: ", unique(adtte_pfs$PARAMCD),
      "\nCensored observations marked with '+'"
    )
  ) +
  theme_ggsurvfit_default() +
  theme(plot.caption = element_text(hjust = 0, size = 8))

Figure notes:

  • The shaded band shows the pointwise 95% confidence interval. survfit2() passes its arguments to survival::survfit(), whose default is conf.type = "log"; set conf.type = "log-log" explicitly if your analysis plan requires it.
  • The dashed horizontal line marks the 50% survival probability; its intersection with the curve gives the median survival time.
  • + marks indicate individual censored observations.
  • The risk table beneath the plot shows the number at risk and the cumulative event and censoring counts, aligned to the x-axis breaks.
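
The interval type can be checked on any fitted object. The sketch below is only an illustration: it uses the lung data that ships with {survival} (not one of the datasets in this guide) and plain survfit(), on the assumption that survfit2() forwards conf.type unchanged.

```r
library(survival)

# Illustrative data only: in lung, status is 1 = censored, 2 = dead
fit_default <- survfit(Surv(time, status) ~ 1, data = lung)
fit_loglog  <- survfit(Surv(time, status) ~ 1, data = lung, conf.type = "log-log")

# The transformation used for the interval is stored on the fitted object
fit_default$conf.type
fit_loglog$conf.type

# The median point estimate is the same; only the confidence limits can differ
quantile(fit_default, probs = 0.5)
quantile(fit_loglog, probs = 0.5)
```

Whichever transformation is chosen, it is worth stating it in the figure footnote so the plot and the tables can be reproduced.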

Summary Tables

{gtsummary}’s tbl_survfit() consumes the km_fit survfit2 object directly to produce publication-ready tables. The statistics are extracted and formatted in a single pipeline.

Median Survival Table

# ── Median survival with 95% CI ────────────────────────────────────────────────
tbl_survfit(
  km_fit,
  probs        = 0.5, # 50th percentile = median
  label_header = "**Median (95% CI)**"
) |>
  modify_caption(
    paste0(
      "**Table 1. Median Progression-Free Survival**",
      "\nADTTE_ONCO  |  PARAMCD: ", unique(adtte_pfs$PARAMCD)
    )
  ) |>
  bold_labels()
Table 1. Median Progression-Free Survival
ADTTE_ONCO | PARAMCD: PFS

Characteristic    Median (95% CI)
Overall           64 (61, —)
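
To make the median in a table like this less of a black box, the following sketch rebuilds a Kaplan-Meier estimate by hand on a small hypothetical dataset (the toy values are invented for illustration) and compares it with survfit(). It uses survival::Surv() with 1 - CNSR so that it depends only on {survival}.

```r
library(survival)

# Hypothetical toy data in CDISC coding: CNSR = 0 is an event, CNSR = 1 is censored
toy <- data.frame(
  AVAL = c(2, 3, 3, 5, 8, 9),
  CNSR = c(0, 0, 1, 0, 1, 0)
)

# Kaplan-Meier by hand: at each event time, multiply by (1 - events / number at risk)
event_times <- sort(unique(toy$AVAL[toy$CNSR == 0]))
step <- sapply(event_times, function(t) {
  1 - sum(toy$AVAL == t & toy$CNSR == 0) / sum(toy$AVAL >= t)
})
km_manual <- cumprod(step)

# Median = first event time at which the estimate is at or below 0.5
median_manual <- min(event_times[km_manual <= 0.5])

# Same quantities from survfit(); 1 - CNSR converts to the Surv() event coding
fit <- survfit(Surv(AVAL, 1 - CNSR) ~ 1, data = toy)
km_survfit     <- summary(fit)$surv
median_survfit <- unname(quantile(fit, probs = 0.5)$quantile)

data.frame(time = event_times, km_manual, km_survfit)
c(median_manual = median_manual, median_survfit = median_survfit)
```

If the curve sits exactly at 0.5 over an interval, survfit() reports the midpoint of that interval, so the simple rule above can differ from it in that edge case. An upper confidence limit shown as a dash means the upper confidence band never drops to 0.5 within the observed follow-up.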

Survival Probability Table

This table shows estimated PFS probabilities at clinically meaningful time points. Here we use 1, 2, 3, and 6 months (expressed in days, matching the units of AVAL in adtte_onco). Adjust times to match the units in your dataset.

# ── Survival probability at selected time points ───────────────────────────────
# AVAL in adtte_onco is in days. tbl_survfit() gives each time point its own
# column, so the time is written into the column header via {time}.
tbl_survfit(
  km_fit,
  times        = c(30, 60, 90, 180), # approx. 1, 2, 3, 6 months in days
  label_header = "**Day {time}**"
) |>
  modify_caption(
    paste0(
      "**Table 2. Progression-Free Survival Probability at Selected Time Points**",
      "\nADTTE_ONCO  |  PARAMCD: ", unique(adtte_pfs$PARAMCD)
    )
  ) |>
  bold_labels()
Table 2. Progression-Free Survival Probability at Selected Time Points
ADTTE_ONCO | PARAMCD: PFS

Characteristic    Day 30             Day 60             Day 90            Day 180
Overall           91% (75%, 100%)    81% (60%, 100%)    46% (23%, 94%)    0% (—, —)
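
The time-point estimates in a table like this are read off the fitted step function. As a minimal sketch of that lookup, the example below uses the lung data bundled with {survival} (an illustrative stand-in, not the data in this guide) and summary() with a times argument.

```r
library(survival)

# Illustrative data only: in lung, time is in days and status is 1 = censored, 2 = dead
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Evaluate the Kaplan-Meier step function at fixed time points
at_times <- summary(fit, times = c(90, 180, 365))

data.frame(
  time   = at_times$time,
  n_risk = at_times$n.risk,
  surv   = round(at_times$surv, 3),
  lower  = round(at_times$lower, 3),
  upper  = round(at_times$upper, 3)
)
```

Time points beyond the last observed time are dropped by default; summary() has an extend = TRUE argument for carrying the last estimate forward, which should be used with care when few subjects remain at risk.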
Note: ARD output

Every {gtsummary} table stores its underlying statistics in a .$cards slot as an Analysis Results Data (ARD) object conforming to the emerging CDISC Analysis Results Standard. You can also generate a standalone survival ARD for audit, downstream rendering, or cross-validation using cardx::ard_survival_survfit():


# Median survival ARD
adtte_pfs |>
  cardx::ard_survival_survfit(
    y     = "Surv_CNSR(AVAL, CNSR)",
    probs = 0.5
  )

# Time-point survival probability ARD
adtte_pfs |>
  cardx::ard_survival_survfit(
    y     = "Surv_CNSR(AVAL, CNSR)",
    times = c(30, 60, 90, 180) # days
  )

In the data frame method shown here, y is passed as a character string, which becomes the left-hand side of the formula given to survival::survfit(). See the {cardx} documentation for full details.


Best Overall Response Table

The Best Overall Response (BOR) table is a standard oncology efficacy output summarising each subject’s best RECIST response category across all post-baseline assessments. Using PARAMCD == "CBOR" (Confirmed Best Overall Response) aligns with the primary regulatory definition of ORR as confirmed CR + PR.

The {gtsummary} tbl_summary() function produces the category counts and percentages directly from AVALC. Response categories are ordered from best to worst (CR → PR → SD → NON-CR/NON-PD → PD → NE → MISSING) using a factor. The ORR (CR + PR) is then derived and reported separately in the inline chunk below.

# ── Filter to CBOR parameter ──────────────────────────────────
# ANL01FL == "Y" restricts to the primary analysis flag records.
adrs_bor <- adrs_onco |>
  filter(PARAMCD == "CBOR" & ANL01FL == "Y") |>
  mutate(
    # Order AVALC from best to worst response for table display.
    # fct_relevel() only reorders values that occur in the data: a category
    # with no records (here CR) raises an "unknown levels" warning and gets
    # no row in the table.
    AVALC = fct_relevel(
      AVALC,
      "CR", "PR", "SD", "NON-CR/NON-PD", "PD", "NE", "MISSING"
    )
  )

# ── Best Overall Response table by treatment arm ───────────────────────────────
adrs_bor |>
  tbl_summary(
    by = ARM,
    include = AVALC,
    label = list(AVALC = "Best Overall Response"),
    statistic = list(AVALC = "{n} ({p}%)"),
    digits = list(AVALC = list(0, 1))
  ) |>
  add_overall(last = TRUE) |>
  add_n() |>
  bold_labels() |>
  modify_header(label = "**Response**") |>
  modify_caption(
    paste0(
      "**Table 3. Confirmed Best Overall Response (RECIST 1.1)**",
      "\nADRS_ONCO  |  PARAMCD: CBOR  |  ANL01FL = Y"
    )
  )
Table 3. Confirmed Best Overall Response (RECIST 1.1)
ADRS_ONCO | PARAMCD: CBOR | ANL01FL = Y

Response                 N      Placebo        Xanomeline High Dose    Xanomeline Low Dose    Overall
                                N = 86 [1]     N = 84 [1]              N = 84 [1]             N = 254 [1]
Best Overall Response    254
    PR                          1 (1.2%)       0 (0.0%)                0 (0.0%)               1 (0.4%)
    SD                          2 (2.3%)       1 (1.2%)                1 (1.2%)               4 (1.6%)
    NON-CR/NON-PD               0 (0.0%)       1 (1.2%)                0 (0.0%)               1 (0.4%)
    PD                          0 (0.0%)       1 (1.2%)                0 (0.0%)               1 (0.4%)
    NE                          0 (0.0%)       0 (0.0%)                1 (1.2%)               1 (0.4%)
    MISSING                     83 (96.5%)     81 (96.4%)              82 (97.6%)             246 (96.9%)
[1] n (%)
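
The table has no CR row because no subject in this dataset has a confirmed CR, and a category that never occurs is not a factor level after fct_relevel(). If a zero-count row is wanted, one option is to set the full level list explicitly with base factor(). The sketch below uses hypothetical response values to show the effect.

```r
# Hypothetical response values; no subject has a CR
avalc <- c("PD", "SD", "PR", "MISSING", "SD")

# Full list of display categories, best to worst
resp_levels <- c("CR", "PR", "SD", "NON-CR/NON-PD", "PD", "NE", "MISSING")

# factor() with explicit levels keeps every category, observed or not
avalc_all <- factor(avalc, levels = resp_levels)

# Unobserved categories appear with a count of zero
table(avalc_all)
```

With AVALC defined this way, tbl_summary() would be expected to print the unobserved categories as rows with zero counts, though that is worth confirming against the installed {gtsummary} version.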

The ORR (overall response rate, CR + PR) is not directly produced by tbl_summary() as a combined row, but can be derived and appended using tbl_stack() or reported inline:

# ── ORR: proportion with CR or PR, by arm ─────────────────────────────────────
adrs_bor |>
  summarise(
    .by    = ARM,
    n_resp = sum(AVALC %in% c("CR", "PR"), na.rm = TRUE),
    n_tot  = n(),
    orr    = round(100 * n_resp / n_tot, 1)
  ) |>
  mutate(label = paste0(n_resp, "/", n_tot, " (", orr, "%)")) |>
  select(ARM, ORR = label) |>
  knitr::kable(caption = "Overall Response Rate (CR + PR) by Arm")
Overall Response Rate (CR + PR) by Arm

ARM                     ORR
Placebo                 1/86 (1.2%)
Xanomeline High Dose    0/84 (0%)
Xanomeline Low Dose     0/84 (0%)
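
Response rates are often reported with an exact confidence interval. As a hedged sketch (the choice of the Clopper-Pearson interval is an assumption here, not something this guide prescribes), base R's binom.test() can supply it. The counts are typed in from the ORR output above so that the block runs on its own.

```r
# Responder counts and denominators copied from the ORR output
orr <- data.frame(
  ARM    = c("Placebo", "Xanomeline High Dose", "Xanomeline Low Dose"),
  n_resp = c(1, 0, 0),
  n_tot  = c(86, 84, 84)
)

# binom.test() returns the exact (Clopper-Pearson) 95% interval in $conf.int;
# mapply() yields one column per arm, so transpose to one row per arm
ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int, orr$n_resp, orr$n_tot))

orr$orr_pct   <- round(100 * orr$n_resp / orr$n_tot, 1)
orr$lower_pct <- round(100 * ci[, 1], 1)
orr$upper_pct <- round(100 * ci[, 2], 1)
orr
```

In a real pipeline the counts would come from the summarise() step rather than being typed in by hand.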

Key Package Notes

Surv_CNSR() vs Surv()

CDISC ADTTE datasets use CNSR = 0 for events and CNSR = 1 for censoring — the opposite of survival::Surv(). Using Surv_CNSR(AVAL, CNSR) removes this error-prone manual recoding step, and it works identically in both the survfit2() call for plotting and the ard_survival_survfit() call for ARDs:

# ✗  Error-prone: requires manual recoding of CNSR
head(survival::Surv(adtte_pfs$AVAL, 1 - adtte_pfs$CNSR), 8)
[1] 64+  1+ 43   1+ 43+  1+ 22+  1+
# ✓  Correct CDISC-aware approach — identical result, no manual recoding needed
head(ggsurvfit::Surv_CNSR(adtte_pfs$AVAL, adtte_pfs$CNSR), 8)
[1] 64+  1+ 43   1+ 43+  1+ 22+  1+

survfit2() vs survfit()

survfit2() tracks the calling environment, which enables {ggsurvfit} to cleanly remove raw variable-name prefixes from figure legends (e.g. ARM=Placebo becomes Placebo) and to compute p-values via survfit2_p(). For ADTTE data, it also reads PARAM/PARAMCD automatically to populate axis labels. Use survfit2() for all plotting work.

Switching endpoints

The KM plot and the survival tables above are driven by the adtte_pfs object. To switch to OS or RSD, change the PARAMCD filter in the Setup section and update the hard-coded "Progression-Free" titles and axis labels to match:

# Overall Survival — expect few events; median may not be estimable
adtte_os <- adtte_onco |> filter(PARAMCD == "OS")

# Duration of Response — responders only; smaller N than OS/PFS
# Note: admiralonco uses PARAMCD = "RSD", not "DOR"
adtte_rsd <- adtte_onco |> filter(PARAMCD == "RSD")

Extending to multiple strata

Because adtte_onco already contains ARM and ARMCD from the {admiralonco} template, no ADSL join is needed to produce a stratified KM plot:

# With two or more groups in ARM, add_pvalue() computes and annotates a log-rank test p-value.
# Not applicable for single-arm fits (~ 1); only add it when comparing groups.
survfit2(Surv_CNSR(AVAL, CNSR) ~ ARM, data = adtte_pfs) |>
  ggsurvfit(linewidth = 1) +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  add_confidence_interval() +
  add_risktable(
    theme = theme_risktable_default(axis.text.y.size = 9, plot.title.size = 9)
  ) +
  add_pvalue(location = "annotation") +
  scale_ggsurvfit()

Stratified KM Plot: ggsurvfit::adtte

{ggsurvfit} ships its own example ADTTE dataset — a four-arm Phase III breast cancer trial (2,199 subjects, PFS endpoint, ~34% event rate) originally from the VIS-SIG Wonderful Wednesdays initiative. It provides a richer illustrative dataset than adtte_onco for showing a stratified KM plot, since it has four well-separated treatment arms with a good event rate and an estimable median.

The treatment variable is TRT01P (planned treatment at randomisation), which contains the four arm labels directly. STR01 is hormone receptor status — a stratification covariate, not the treatment assignment.

# ── ggsurvfit::adtte — four-arm breast cancer PFS trial ────────────────────────
# TRT01P = planned treatment; STR01 = hormone receptor status (not treatment)
survfit2(Surv_CNSR(AVAL, CNSR) ~ TRT01P, data = ggsurvfit::adtte) |>
  ggsurvfit(linewidth = 1) +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  add_confidence_interval() +
  add_risktable(
    risktable_stats = "n.risk",
    stats_label = list(n.risk = "At Risk"),
    theme = theme_risktable_default(axis.text.y.size = 9, plot.title.size = 9)
  ) +
  add_quantile(
    y_value   = 0.5,
    color     = "gray40",
    linewidth = 0.75,
    linetype  = "dashed"
  ) +
  add_censor_mark(shape = 3, size = 1.5) +
  add_pvalue(location = "annotation", x = 4.5) +
  scale_ggsurvfit() +
  labs(
    title = "Progression-Free Survival by Treatment Arm",
    x = "Time (Years)", # ggsurvfit::adtte AVAL is in years
    y = "Progression-Free Survival Probability",
    caption = paste0(
      "Dataset: ggsurvfit::adtte  |  HER2+ breast cancer Phase III trial",
      "\nCensored observations marked with '+'"
    )
  ) +
  theme_ggsurvfit_default() +
  theme(
    plot.caption  = element_text(hjust = 0, size = 8),
    legend.title  = element_blank()
  )

Figure notes:

  • The risk table shows only n.risk here (rather than cumulative events and censored) to keep the four-strata table readable at standard figure width.
  • The p-value is an overall log-rank test comparing survival across all four arms simultaneously. It is placed as an annotation (rather than in the caption) to remain visible when the figure is used standalone. For pairwise comparisons of each active arm vs. a reference, fit a Cox model with the reference arm as the baseline level, or run survdiff() on each two-arm subset of the data.
  • ggsurvfit::adtte contains only a single PFS endpoint per subject, so no filter(PARAMCD == ...) step is needed.
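
The difference between the overall test and a pairwise comparison can be sketched with {survival} alone. The example uses the veteran data bundled with {survival} as a stand-in four-group dataset (an assumption for illustration; its groups are cell types, not treatment arms), and the choice of reference group is arbitrary.

```r
library(survival)

# Illustrative data only: veteran has four celltype groups; status is 1 = event
overall   <- survdiff(Surv(time, status) ~ celltype, data = veteran)
p_overall <- pchisq(overall$chisq, df = length(overall$n) - 1, lower.tail = FALSE)

# Pairwise log-rank test: subset to the reference group and one comparator
pair       <- droplevels(subset(veteran, celltype %in% c("squamous", "adeno")))
pairwise   <- survdiff(Surv(time, status) ~ celltype, data = pair)
p_pairwise <- pchisq(pairwise$chisq, df = 1, lower.tail = FALSE)

# Cox model alternative: each coefficient is a contrast against the reference level
vet <- veteran
vet$celltype <- relevel(vet$celltype, ref = "squamous")
cox_fit <- coxph(Surv(time, status) ~ celltype, data = vet)

c(p_overall = p_overall, p_pairwise = p_pairwise)
exp(coef(cox_fit)) # hazard ratios vs. the reference group
```

Several pairwise tests against one reference raise a multiplicity question that the overall test does not, so any adjustment should follow the analysis plan.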

Subgroup Forest Plot

A subgroup forest plot displays the hazard ratio (HR) and 95% confidence interval from a Cox model for each subgroup level, giving a visual summary of whether the treatment effect is consistent across key patient characteristics.

Here we restrict ggsurvfit::adtte to two arms, tablemab x 52 weeks (reference) and vismab x 52 weeks (active), and fit separate Cox models within each level of STR01L (hormone receptor status) and STR02L (prior radiotherapy), plus an overall row.

# ── Restrict to two arms ───────────────────────────────────────────────────────
# Filter on the arm labels themselves so the subset always matches the factor
# levels set below, whatever numeric codes TRT01PN assigns to the arms.
arms_2 <- c("tablemab x 52 weeks", "vismab x 52 weeks") # first level = reference
adtte_2arm <- ggsurvfit::adtte |>
  filter(TRT01P %in% arms_2) |>
  mutate(TRT01P = factor(TRT01P, levels = arms_2))

# ── Helper: fit Cox and return a one-row HR summary ──────────────────────────
cox_hr <- function(data, subgroup, level) {
  fit <- coxph(Surv_CNSR(AVAL, CNSR) ~ TRT01P, data = data)
  tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
    mutate(
      subgroup = subgroup,
      level    = level,
      n        = nrow(data),
      n_events = sum(data$CNSR == 0)
    )
}

# ── Build rows: Overall + by STR01L + by STR02L ───────────────────────────────
str01_rows <- lapply(unique(adtte_2arm$STR01L), function(lv) {
  cox_hr(filter(adtte_2arm, STR01L == lv), "Hormone Receptor Status", lv)
})

str02_rows <- lapply(unique(adtte_2arm$STR02L), function(lv) {
  cox_hr(filter(adtte_2arm, STR02L == lv), "Prior Radiotherapy", lv)
})

forest_data <- bind_rows(
  cox_hr(adtte_2arm, "Overall", "Overall"),
  bind_rows(str01_rows),
  bind_rows(str02_rows)
) |>
  mutate(
    label = ifelse(level == "Overall", "Overall", paste0("  ", level)),
    label = factor(label, levels = rev(unique(label))),
    hr_label = sprintf("%.2f (%.2f\u2013%.2f)", estimate, conf.low, conf.high)
  )
ggplot(forest_data, aes(x = estimate, y = label)) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "gray50") +
  geom_pointrange(
    aes(xmin = conf.low, xmax = conf.high),
    color = "#2c7bb6",
    linewidth = 0.75,
    fatten = 4
  ) +
  geom_text(
    aes(x = 3.5, label = hr_label),
    hjust = 1, size = 3
  ) +
  scale_x_log10(
    limits = c(0.3, 4),
    breaks = c(0.5, 1, 2),
    labels = c("0.5", "1", "2")
  ) +
  facet_grid(subgroup ~ ., scales = "free_y", space = "free") +
  labs(
    title    = "Subgroup Forest Plot: PFS Hazard Ratio",
    subtitle = "vismab x 52 weeks vs. tablemab x 52 weeks (reference)",
    x        = "Hazard Ratio (log scale)  |  \u2190 Favours vismab   Favours tablemab \u2192",
    y        = NULL,
    caption  = "Dataset: ggsurvfit::adtte  |  Cox proportional hazards model  |  95% CI"
  ) +
  theme_bw() +
  theme(
    strip.background    = element_rect(fill = "gray90"),
    strip.text          = element_text(face = "bold"),
    panel.grid.minor    = element_blank(),
    panel.grid.major.y  = element_blank(),
    plot.caption        = element_text(hjust = 0, size = 8)
  )

Figure notes:

  • The HR compares vismab (active) with tablemab (reference): HR < 1 means a lower hazard of progression on vismab and so favours vismab; HR > 1 favours tablemab.
  • The dashed vertical line at HR = 1 represents no treatment difference.
  • Confidence intervals that cross 1 indicate no statistically significant difference in that subgroup at the 5% level.
  • This is for illustration only — the dataset uses anonymised arm names and should not be interpreted as a real-world efficacy comparison.
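
Because the direction of a hazard ratio depends entirely on which factor level is the reference, it can help to check it on a small model. The sketch below uses the veteran data from {survival} as an illustrative stand-in (its arms are labelled "standard" and "test" here, labels chosen for this example) and shows that swapping the reference level inverts the ratio.

```r
library(survival)

# Illustrative data only: in veteran, trt is 1 = standard, 2 = test
vet <- transform(
  veteran,
  trt = factor(trt, levels = c(1, 2), labels = c("standard", "test"))
)

# Reference = first factor level ("standard"), so the HR is test vs. standard
fit <- coxph(Surv(time, status) ~ trt, data = vet)
hr  <- unname(exp(coef(fit)))
ci  <- unname(exp(confint(fit)))

# Swapping the reference level inverts the ratio (standard vs. test)
vet$trt_rev <- relevel(vet$trt, ref = "test")
fit_rev <- coxph(Surv(time, status) ~ trt_rev, data = vet)
hr_rev  <- unname(exp(coef(fit_rev)))

c(hr = hr, lower = ci[1], upper = ci[2], hr_reversed = hr_rev)
```

The same check applies to the forest plot: the arm named first in the factor levels is the denominator of every HR shown.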

Customization Tips

You can easily tailor the appearance and output of your tables and plots:

  • Themes & Colors:
    • Use theme_ggsurvfit_default() or any ggplot2 theme (e.g., theme_minimal(), theme_bw()) to change the look of plots.
    • Adjust color palettes with scale_color_manual(), scale_fill_brewer(), or your own color vectors.
  • Fonts & Labels:
    • Modify axis titles, legend text, and captions using labs() and theme() arguments (e.g., axis.title = element_text(size = 14)).
  • Exporting:
    • Save plots with ggsave("plot.png", width = 8, height = 6).
    • Export tables to Word/HTML with as_flextable() or as_gt() from {gtsummary}.
  • Table Formatting:
    • Use {gtsummary} functions like bold_labels(), modify_caption(), and modify_header() for custom table styling.

See the documentation for each package for more advanced customization options.

Source Code
---
title: "Oncology Survival"
order: 4
---

```{r setup script, include=FALSE, purl=FALSE}
invisible_hook_purl <- function(before, options, ...) {
  knitr::hook_purl(before, options, ...)
  NULL
}
knitr::knit_hooks$set(purl = invisible_hook_purl)
```

## Introduction

This guide demonstrates how pharmaverse packages, along with tools from the
tidyverse, can be used to create standard oncology efficacy Tables, Listings,
and Graphs (TLGs) using the `{pharmaverseadam}` `ADTTE_ONCO` and `ADRS_ONCO`
datasets as input.

The packages used, with a brief description of their purpose, are as follows:

* [`{pharmaverseadam}`](https://pharmaverse.github.io/pharmaverseadam/): provides
  CDISC ADaM example datasets for use in pharmaverse examples and tests.
* [`{ggsurvfit}`](https://www.danieldsjoberg.com/ggsurvfit/): eases the creation
  of publication-ready time-to-event (survival) figures built on `{ggplot2}`.
  Includes `Surv_CNSR()` to handle CDISC ADTTE censoring conventions natively.
* [`{gtsummary}`](https://www.danieldsjoberg.com/gtsummary/): creates
  publication-ready summary and analytical tables. Used here for survival and
  tumor response summaries.
* [`{dplyr}`](https://dplyr.tidyverse.org/): provides data manipulation
  functions used to prepare and filter the ADaM data.
* [`{forcats}`](https://forcats.tidyverse.org/): provides factor manipulation
  utilities used to order RECIST response categories for table display.
* [`{broom}`](https://broom.tidymodels.org/): extracts tidy model output from
  Cox regression fits — used to build the subgroup forest plot.
* [`{survival}`](https://cran.r-project.org/package=survival): provides `coxph()`
  for fitting Cox proportional hazards models.

The outputs produced in this example are:

1. **Kaplan-Meier plot** with confidence intervals, risk table, and median
   guideline — a standard figure for oncology efficacy reporting.
2. **Risk table** displayed beneath the KM plot showing number at risk, events,
   and censored counts at each time point.
3. **Median survival table** with 95% confidence interval, suitable for
   inclusion in a clinical study report (CSR).
4. **Survival probability table** at selected time points.
5. **Best Overall Response table** with category counts, percentages, and
   inline ORR (CR + PR) summary from `ADRS_ONCO`.
6. **Stratified KM plot** using the built-in `ggsurvfit::adtte` four-arm trial
   dataset to illustrate multi-arm analyses.
7. **Subgroup forest plot** showing PFS hazard ratios by hormone receptor status
   and prior radiotherapy history using `ggsurvfit::adtte`.

---

## Setup

We load the required packages and read `ADTTE_ONCO` and `ADRS_ONCO` from
`{pharmaverseadam}`. `ADTTE_ONCO` is the oncology-specific TTE dataset generated
from the `{admiralonco}` template — it contains three endpoints (OS, PFS, RSD)
with treatment arm variables already merged in. `ADRS_ONCO` is the tumor response
dataset containing per-visit and summary response parameters (BOR, CBOR, RSP)
derived from RECIST 1.1 assessments.

Because the CDISC ADTTE censoring variable `CNSR` is coded `1 = censored /
0 = event` (the reverse of the `survival::Surv()` convention, where 1 marks an
event), we use `Surv_CNSR()` from `{ggsurvfit}` throughout to avoid mis-coding errors.

```{r setup}
#| message: false
#| warning: false
library(pharmaverseadam)
library(ggsurvfit)
library(ggplot2)
library(gtsummary)
library(dplyr)
library(forcats)
library(broom)
library(survival)

# ── Read data ──────────────────────────────────────────────────────────────────
adtte_onco <- pharmaverseadam::adtte_onco

# ── ADRS_ONCO: tumor response data ───────────────────────────────────────────
adrs_onco <- pharmaverseadam::adrs_onco |>
  filter(ARMCD != "Scrnfail")

# Overview of available endpoints and their event rates
adtte_onco |>
  group_by(PARAMCD, PARAM) |>
  summarise(
    N           = n(),
    N_events    = sum(CNSR == 0),
    Pct_events  = round(100 * mean(CNSR == 0), 1),
    Median_AVAL = round(median(AVAL), 2),
    .groups     = "drop"
  )
```

> **Why PFS for this example?** OS in early-phase oncology trials typically has
> few events (most subjects are still alive), producing a near-flat KM curve
> with a median that cannot be estimated — not ideal for illustration. PFS has
> more events and a shorter follow-up window, giving a more informative curve.
> The code below uses `PARAMCD == "PFS"`; swap to `"OS"` or `"RSD"` as needed.
> Note that `{admiralonco}` uses `"RSD"` (not `"DOR"`) for Duration of Response.

::: {.callout-note}
`pharmaverseadam::adtte_onco` is a small example dataset with limited events
across all endpoints, so the resulting KM curves and tables are for illustration
only and should not be interpreted clinically. For a richer example with more
events and better-separated arms, see the
[Stratified KM Plot](#stratified-km-plot-ggsurvfitadtte) section below, which
uses the `ggsurvfit::adtte` four-arm breast cancer trial dataset.
:::

```{r filter-pfs}
#| message: false
#| warning: false

# ── PFS endpoint ────────────────────────────────────
adtte_pfs <- adtte_onco |>
  filter(PARAMCD == "PFS")

# Preview key variables
adtte_pfs |>
  select(USUBJID, PARAM, PARAMCD, AVAL, CNSR, EVNTDESC, CNSDTDSC) |>
  head(5)
```

> **Note on CNSR coding:** In CDISC ADTTE, `CNSR = 0` indicates an **event**
> and `CNSR = 1` indicates **censoring**. `Surv_CNSR(AVAL, CNSR)` handles this
> correctly without requiring manual recoding.

---

## Kaplan-Meier Plot

The plot below uses `survfit2()` with `Surv_CNSR()` to fit the survival model.
Because `PARAM` and `PARAMCD` are present in the data, `{ggsurvfit}` can use
them to build default axis labels, a key CDISC-aware feature; the `labs()` call
below then sets the title and axis labels explicitly. The fitted `km_fit` object
is created once here and reused by the summary tables below.

```{r km-plot}
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 7

# ── Fit the survival model ─────────────────────────────────────────────────────
# survfit2() is preferred over survfit() as it tracks the calling environment,
# enabling clean legend labels and p-value computation downstream.
km_fit <- survfit2(Surv_CNSR(AVAL, CNSR) ~ 1, data = adtte_pfs)

# ── Build the plot ─────────────────────────────────────────────────────────────
km_fit |>
  ggsurvfit(linewidth = 1) +
  # ── Explicit color scales prevent the site theme from stripping colour to B&W ─
  scale_color_manual(values = c("#2c7bb6")) +
  scale_fill_manual(values = c("#2c7bb6")) +
  add_confidence_interval() +
  add_risktable(
    risktable_stats = c("n.risk", "cum.event", "cum.censor"),
    risktable_group = "risktable_stats",
    stats_label = list(
      n.risk     = "At Risk",
      cum.event  = "Events (cum.)",
      cum.censor = "Censored (cum.)"
    ),
    theme = theme_risktable_default(axis.text.y.size = 9, plot.title.size = 9)
  ) +
  add_quantile(
    y_value   = 0.5, # median survival guideline
    color     = "gray40",
    linewidth = 0.75,
    linetype  = "dashed"
  ) +
  add_censor_mark(shape = 3, size = 2) +
  scale_ggsurvfit() +
  labs(
    title = paste0(unique(adtte_pfs$PARAM), "\nKaplan-Meier Estimate"),
    x = "Time (Days)",
    y = "Progression-Free Survival Probability",
    caption = paste0(
      "Analysis dataset: ADTTE_ONCO  |  PARAMCD: ", unique(adtte_pfs$PARAMCD),
      "\nCensored observations marked with '+'"
    )
  ) +
  theme_ggsurvfit_default() +
  theme(plot.caption = element_text(hjust = 0, size = 8))
```

**Figure notes:**

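* The shaded band shows the pointwise 95% confidence interval. `survfit2()`
  passes its arguments to `survival::survfit()`, whose default is
  `conf.type = "log"`; set `conf.type = "log-log"` explicitly if your analysis
  plan requires it.
* The dashed horizontal line marks the 50% survival probability; its
  intersection with the curve gives the median survival time.
* `+` marks indicate individual censored observations.
* The risk table beneath the plot shows the number at risk and the cumulative
  event and censoring counts, aligned to the x-axis breaks.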

---

## Summary Tables

`{gtsummary}`'s `tbl_survfit()` consumes the `km_fit` `survfit2` object directly
to produce publication-ready tables. The statistics are extracted and formatted
in a single pipeline.

### Median Survival Table

```{r median-table}
#| message: false
#| warning: false

# ── Median survival with 95% CI ────────────────────────────────────────────────
tbl_survfit(
  km_fit,
  probs        = 0.5, # 50th percentile = median
  label_header = "**Median (95% CI)**"
) |>
  modify_caption(
    paste0(
      "**Table 1. Median Progression-Free Survival**",
      "\nADTTE_ONCO  |  PARAMCD: ", unique(adtte_pfs$PARAMCD)
    )
  ) |>
  bold_labels()
```

### Survival Probability Table

This table shows estimated PFS probabilities at clinically meaningful time
points. Here we use 1, 2, 3, and 6 months (expressed in days, matching the
units of `AVAL` in `adtte_onco`). Adjust `times` to match the units in your dataset.

```{r prob-table}
#| message: false
#| warning: false

# ── Survival probability at selected time points ───────────────────────────────
# AVAL in adtte_onco is in days. tbl_survfit() gives each time point its own
# column, so the time is written into the column header via {time}.
tbl_survfit(
  km_fit,
  times        = c(30, 60, 90, 180), # approx. 1, 2, 3, 6 months in days
  label_header = "**Day {time}**"
) |>
  modify_caption(
    paste0(
      "**Table 2. Progression-Free Survival Probability at Selected Time Points**",
      "\nADTTE_ONCO  |  PARAMCD: ", unique(adtte_pfs$PARAMCD)
    )
  ) |>
  bold_labels()
```
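
The median and the time-point probabilities reported in the two tables above can also be read straight from a `survfit` object, which is useful for cross-checking a formatted table. The sketch below is illustrative only: it uses `survival::lung` as a stand-in dataset (an assumption, since `km_fit` is built elsewhere in this guide) and base `{survival}` accessors rather than `{gtsummary}`.

```r
library(survival)

# Stand-in data: survival::lung (time in days; status: 1 = censored, 2 = dead)
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Median survival time with its 95% CI (the quantity shown in a median table)
median_est <- quantile(fit, probs = 0.5)
median_est

# Survival probability with 95% CI at days 30, 60, 90 and 180
tp <- summary(fit, times = c(30, 60, 90, 180))
data.frame(
  day   = tp$time,
  surv  = round(tp$surv, 3),
  lower = round(tp$lower, 3),
  upper = round(tp$upper, 3)
)
```

The `times` values must be in the same units as the time variable, which is days in this stand-in dataset.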


::: {.callout-note}
## ARD output

Every `{gtsummary}` table stores its underlying statistics in a `.$cards` slot
as an Analysis Results Data (ARD) object conforming to the emerging CDISC
Analysis Results Standard. You can also generate a standalone survival ARD for
audit, downstream rendering, or cross-validation using
`cardx::ard_survival_survfit()`:

```r

# Median survival ARD
adtte_pfs |>
  cardx::ard_survival_survfit(
    y     = "Surv_CNSR(AVAL, CNSR)",
    probs = 0.5
  )

# Time-point survival probability ARD
adtte_pfs |>
  cardx::ard_survival_survfit(
    y     = "Surv_CNSR(AVAL, CNSR)",
    times = c(30, 60, 90, 180) # days
  )
```

The `y` argument must be passed as a character string when using the data frame
method. See the [`{cardx}` documentation](https://insightsengineering.github.io/cardx/main/reference/ard_survival_survfit.html)
for full details.
:::

---

## Best Overall Response Table

The Best Overall Response (BOR) table is a standard oncology efficacy output
summarising each subject's best RECIST response category across all post-baseline
assessments. Using `PARAMCD == "CBOR"` (Confirmed Best Overall Response) aligns
with the primary regulatory definition of ORR as confirmed CR + PR.

The `{gtsummary}` `tbl_summary()` function produces the category counts and
percentages directly from `AVALC`. Response categories are ordered from best to
worst (CR → PR → SD → NON-CR/NON-PD → PD → NE → MISSING) using a factor.
The ORR (CR + PR) is then derived and reported separately in the inline chunk below.

```{r bor-setup}
#| message: false
#| warning: false

# ── Filter to CBOR parameter ──────────────────────────────────
# ANL01FL == "Y" restricts to the primary analysis flag records.
adrs_bor <- adrs_onco |>
  filter(PARAMCD == "CBOR" & ANL01FL == "Y") |>
  mutate(
    # Order AVALC from best to worst response for table display
    AVALC = fct_relevel(
      AVALC,
      "CR", "PR", "SD", "NON-CR/NON-PD", "PD", "NE", "MISSING"
    )
  )
```

```{r bor-table}
#| message: false
#| warning: false

# ── Best Overall Response table by treatment arm ───────────────────────────────
adrs_bor |>
  tbl_summary(
    by = ARM,
    include = AVALC,
    label = list(AVALC = "Best Overall Response"),
    statistic = list(AVALC = "{n} ({p}%)"),
    digits = list(AVALC = list(0, 1))
  ) |>
  add_overall(last = TRUE) |>
  add_n() |>
  bold_labels() |>
  modify_header(label = "**Response**") |>
  modify_caption(
    paste0(
      "**Table 3. Confirmed Best Overall Response (RECIST 1.1)**",
      "\nADRS_ONCO  |  PARAMCD: CBOR  |  ANL01FL = Y"
    )
  )
```

The ORR (overall response rate, CR + PR) is not directly produced by
`tbl_summary()` as a combined row, but can be derived and appended using
`tbl_stack()` or reported inline:

```{r orr-inline}
#| message: false
#| warning: false

# ── ORR: proportion with CR or PR, by arm ─────────────────────────────────────
adrs_bor |>
  summarise(
    .by    = ARM,
    n_resp = sum(AVALC %in% c("CR", "PR"), na.rm = TRUE),
    n_tot  = n(),
    orr    = round(100 * n_resp / n_tot, 1)
  ) |>
  mutate(label = paste0(n_resp, "/", n_tot, " (", orr, "%)")) |>
  select(ARM, ORR = label) |>
  knitr::kable(caption = "Overall Response Rate (CR + PR) by Arm")
```
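
An ORR is often reported with a confidence interval. The sketch below shows one way to add an exact (Clopper-Pearson) 95% interval with base R's `binom.test()`. The responder counts are hypothetical numbers chosen for illustration, not values from `adrs_onco`, and the choice of interval method is an assumption rather than something this guide prescribes.

```r
# Hypothetical responder counts per arm (illustrative numbers, not study data)
orr_counts <- data.frame(
  ARM    = c("Arm A", "Arm B"),
  n_resp = c(12, 20),
  n_tot  = c(50, 50)
)

# Exact (Clopper-Pearson) 95% CI from binom.test(), one arm at a time
orr_ci <- do.call(rbind, lapply(seq_len(nrow(orr_counts)), function(i) {
  bt <- binom.test(orr_counts$n_resp[i], orr_counts$n_tot[i])
  data.frame(
    ARM     = orr_counts$ARM[i],
    orr_pct = round(100 * unname(bt$estimate), 1),
    lower   = round(100 * bt$conf.int[1], 1),
    upper   = round(100 * bt$conf.int[2], 1)
  )
}))
orr_ci
```

In practice the `n_resp` and `n_tot` columns would come from the `summarise()` step above instead of being typed in by hand.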


---

## Key Package Notes

**`Surv_CNSR()` vs `Surv()`**

CDISC ADTTE datasets use `CNSR = 0` for events and `CNSR = 1` for censoring —
the opposite of `survival::Surv()`. Using `Surv_CNSR(AVAL, CNSR)` removes
this error-prone manual recoding step, and it works identically in both the
`survfit2()` call for plotting and the `ard_survival_survfit()` call for ARDs:

```{r cnsr-note}
#| output: true

# ✗  Error-prone: requires manual recoding of CNSR
head(survival::Surv(adtte_pfs$AVAL, 1 - adtte_pfs$CNSR), 8)

# ✓  Correct CDISC-aware approach — identical result, no manual recoding needed
head(ggsurvfit::Surv_CNSR(adtte_pfs$AVAL, adtte_pfs$CNSR), 8)
```

**`survfit2()` vs `survfit()`**

`survfit2()` tracks the calling environment, which enables `{ggsurvfit}` to
strip the variable-name prefix from stratum labels in figure legends (e.g.
`TRT01P=vismab x 52 weeks` becomes `vismab x 52 weeks`) and to compute p-values
via `survfit2_p()`. For ADTTE data, it also reads `PARAM`/`PARAMCD`
automatically to populate axis labels. Use `survfit2()` for all plotting work.

**Switching endpoints**

All outputs above are driven by the `adtte_pfs` object. To switch
to OS or RSD, simply change the filter at the top of the Setup section:

```{r endpoint-note}
#| eval: false

# Overall Survival — expect few events; median may not be estimable
adtte_os <- adtte_onco |> filter(PARAMCD == "OS")

# Duration of Response — responders only; smaller N than OS/PFS
# Note: admiralonco uses PARAMCD = "RSD", not "DOR"
adtte_rsd <- adtte_onco |> filter(PARAMCD == "RSD")
```

**Extending to multiple strata**

Because `adtte_onco` already contains `ARM` and `ARMCD` from the `{admiralonco}`
template, no ADSL join is needed to produce a stratified KM plot:

```{r strata-note}
#| eval: false

# With two arms (ARM), add_pvalue() computes and annotates a log-rank test p-value.
# Not applicable for single-arm fits (~ 1) — only add when comparing groups.
survfit2(Surv_CNSR(AVAL, CNSR) ~ ARM, data = adtte_pfs) |>
  ggsurvfit(linewidth = 1) +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  add_confidence_interval() +
  add_risktable(
    theme = theme_risktable_default(axis.text.y.size = 9, plot.title.size = 9)
  ) +
  add_pvalue(location = "annotation") +
  scale_ggsurvfit()
```

---

## Stratified KM Plot: `ggsurvfit::adtte`

`{ggsurvfit}` ships its own example ADTTE dataset — a four-arm Phase III breast
cancer trial (2,199 subjects, PFS endpoint, ~34% event rate) originally from the
[VIS-SIG Wonderful Wednesdays](https://github.com/VIS-SIG/Wonderful-Wednesdays/tree/master/data/2020/2020-04-08)
initiative. It provides a richer illustrative dataset than `adtte_onco` for
showing a **stratified KM plot**, since it has four well-separated treatment arms
with a good event rate and an estimable median.

The treatment variable is `TRT01P` (planned treatment at randomisation), which
contains the four arm labels directly. `STR01` is hormone receptor status — a
stratification covariate, not the treatment assignment.

```{r km-plot-adtte}
#| message: false
#| warning: false
#| fig-width: 11
#| fig-height: 8

# ── ggsurvfit::adtte — four-arm breast cancer PFS trial ────────────────────────
# TRT01P = planned treatment; STR01 = hormone receptor status (not treatment)
survfit2(Surv_CNSR(AVAL, CNSR) ~ TRT01P, data = ggsurvfit::adtte) |>
  ggsurvfit(linewidth = 1) +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  add_confidence_interval() +
  add_risktable(
    risktable_stats = "n.risk",
    stats_label = list(n.risk = "At Risk"),
    theme = theme_risktable_default(axis.text.y.size = 9, plot.title.size = 9)
  ) +
  add_quantile(
    y_value   = 0.5,
    color     = "gray40",
    linewidth = 0.75,
    linetype  = "dashed"
  ) +
  add_censor_mark(shape = 3, size = 1.5) +
  add_pvalue(location = "annotation", x = 4.5) +
  scale_ggsurvfit() +
  labs(
    title = "Progression-Free Survival by Treatment Arm",
    x = "Time (Years)", # ggsurvfit::adtte AVAL is in years
    y = "Progression-Free Survival Probability",
    caption = paste0(
      "Dataset: ggsurvfit::adtte  |  HER2+ breast cancer Phase III trial",
      "\nCensored observations marked with '+'"
    )
  ) +
  theme_ggsurvfit_default() +
  theme(
    plot.caption  = element_text(hjust = 0, size = 8),
    legend.title  = element_blank()
  )
```

**Figure notes:**

* The risk table shows only `n.risk` here (rather than cumulative events and
  censored) to keep the four-strata table readable at standard figure width.
* The p-value is an **overall log-rank test** comparing survival across all four
  arms simultaneously. It is placed as an annotation (rather than in the caption)
  to remain visible when the figure is used standalone. For pairwise comparisons
  of each active arm vs. a reference, fit a Cox model with the reference arm as
  the first factor level, or run `survdiff()` on each two-arm subset.
* `ggsurvfit::adtte` contains only a single PFS endpoint per subject, so no
  `filter(PARAMCD == ...)` step is needed.
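
Pairwise comparisons against a reference group can be sketched with `{survival}` alone. The example below uses `survival::veteran`, with its four cell types standing in for four treatment arms (an assumption for illustration; it is not the `ggsurvfit::adtte` data). The p-value is unadjusted for multiplicity.

```r
library(survival)

# Stand-in data: survival::veteran, four cell types playing the role of arms
# (status: 1 = dead, 0 = censored)
vet <- veteran
vet$celltype <- relevel(vet$celltype, ref = "squamous") # reference group

# Cox model: one hazard ratio per non-reference group vs. the reference
fit_cox <- coxph(Surv(time, status) ~ celltype, data = vet)
hr_table <- cbind(HR = exp(coef(fit_cox)), exp(confint(fit_cox)))
hr_table

# Pairwise log-rank test: restrict to the two groups being compared
pair <- droplevels(subset(vet, celltype %in% c("squamous", "adeno")))
lr <- survdiff(Surv(time, status) ~ celltype, data = pair)
p_pair <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
p_pair
```

The Cox model gives all comparisons against the reference in one fit, while the subset approach keeps each log-rank test restricted to the two groups of interest.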

---

## Subgroup Forest Plot

A subgroup forest plot displays the hazard ratio (HR) and 95% confidence interval
from a Cox model for each subgroup level, giving a visual summary of whether the
treatment effect is consistent across key patient characteristics.

Here we restrict `ggsurvfit::adtte` to two arms — `tablemab x 52 weeks`
(reference) and `vismab x 52 weeks` (active) — and fit separate Cox models
within each level of `STR01L` (hormone receptor status) and `STR02L`
(prior radiotherapy), plus an overall row.

```{r forest-setup}
#| message: false
#| warning: false

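# ── Restrict to two arms ───────────────────────────────────────────────────────
# Filter on the arm labels themselves so the subset cannot drift from the
# factor levels set below (the first level is the Cox reference arm).
arms_2 <- c("tablemab x 52 weeks", "vismab x 52 weeks")

adtte_2arm <- ggsurvfit::adtte |>
  filter(TRT01P %in% arms_2) |>
  mutate(TRT01P = factor(TRT01P, levels = arms_2))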

# ── Helper: fit Cox and return a one-row HR summary ──────────────────────────
cox_hr <- function(data, subgroup, level) {
  fit <- coxph(Surv_CNSR(AVAL, CNSR) ~ TRT01P, data = data)
  tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
    mutate(
      subgroup = subgroup,
      level    = level,
      n        = nrow(data),
      n_events = sum(data$CNSR == 0)
    )
}

# ── Build rows: Overall + by STR01L + by STR02L ───────────────────────────────
str01_rows <- lapply(unique(adtte_2arm$STR01L), function(lv) {
  cox_hr(filter(adtte_2arm, STR01L == lv), "Hormone Receptor Status", lv)
})

str02_rows <- lapply(unique(adtte_2arm$STR02L), function(lv) {
  cox_hr(filter(adtte_2arm, STR02L == lv), "Prior Radiotherapy", lv)
})

forest_data <- bind_rows(
  cox_hr(adtte_2arm, "Overall", "Overall"),
  bind_rows(str01_rows),
  bind_rows(str02_rows)
) |>
  mutate(
    label = ifelse(level == "Overall", "Overall", paste0("  ", level)),
    label = factor(label, levels = rev(unique(label))),
    hr_label = sprintf("%.2f (%.2f\u2013%.2f)", estimate, conf.low, conf.high)
  )
```

```{r forest-plot}
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 5

ggplot(forest_data, aes(x = estimate, y = label)) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "gray50") +
  geom_pointrange(
    aes(xmin = conf.low, xmax = conf.high),
    color = "#2c7bb6",
    linewidth = 0.75,
    fatten = 4
  ) +
  geom_text(
    aes(x = 3.5, label = hr_label),
    hjust = 1, size = 3
  ) +
  scale_x_log10(
    limits = c(0.3, 4),
    breaks = c(0.5, 1, 2),
    labels = c("0.5", "1", "2")
  ) +
  facet_grid(subgroup ~ ., scales = "free_y", space = "free") +
  labs(
    title    = "Subgroup Forest Plot: PFS Hazard Ratio",
    subtitle = "vismab x 52 weeks vs. tablemab x 52 weeks (reference)",
    x        = "Hazard Ratio (log scale)  |  \u2190 Favours vismab   Favours tablemab \u2192",
    y        = NULL,
    caption  = "Dataset: ggsurvfit::adtte  |  Cox proportional hazards model  |  95% CI"
  ) +
  theme_bw() +
  theme(
    strip.background    = element_rect(fill = "gray90"),
    strip.text          = element_text(face = "bold"),
    panel.grid.minor    = element_blank(),
    panel.grid.major.y  = element_blank(),
    plot.caption        = element_text(hjust = 0, size = 8)
  )
```

**Figure notes:**

* HR < 1 favours `vismab` (lower hazard of a PFS event than the reference arm,
  `tablemab`); HR > 1 favours `tablemab`.
* The dashed vertical line at HR = 1 represents no treatment difference.
* Confidence intervals that cross 1 indicate no statistically significant
  difference in that subgroup at the 5% level.
* This is for illustration only — the dataset uses anonymised arm names and
  should not be interpreted as a real-world efficacy comparison.
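
The direction of a Cox hazard ratio follows from the factor level order: the first level is the reference, and the HR describes the other level relative to it. A small simulation makes this concrete. Everything here is simulated under stated assumptions (exponential event times, no censoring, hypothetical arm names), so it is a sketch and not part of the analysis above.

```r
library(survival)

set.seed(123)
n <- 500 # subjects per arm (simulated, for illustration only)

sim <- data.frame(
  arm = factor(rep(c("reference", "active"), each = n),
    levels = c("reference", "active") # first level is the Cox reference
  ),
  # Exponential event times: the active arm has half the reference hazard
  time = c(rexp(n, rate = 0.10), rexp(n, rate = 0.05)),
  CNSR = 0 # CDISC convention: 0 = event; no censoring in this toy example
)

fit <- coxph(Surv(time, 1 - CNSR) ~ arm, data = sim)
hr <- exp(coef(fit))
hr # below 1: the active arm has the lower hazard
```

Reversing the factor levels would invert the estimate, so it helps to set the levels explicitly before fitting.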

---

## Customization Tips

You can easily tailor the appearance and output of your tables and plots:

- **Themes & Colors:**
  - Use `theme_ggsurvfit_default()` or any `ggplot2` theme (e.g., `theme_minimal()`, `theme_bw()`) to change the look of plots.
  - Adjust color palettes with `scale_color_manual()`, `scale_fill_brewer()`, or your own color vectors.
- **Fonts & Labels:**
  - Modify axis titles, legend text, and captions using `labs()` and `theme()` arguments (e.g., `axis.title = element_text(size = 14)`).
- **Exporting:**
  - Save plots with `ggsave("plot.png", width = 8, height = 6)`.
  - Export tables with `as_gt()` (HTML) or `as_flex_table()` (Word) from `{gtsummary}`.
- **Table Formatting:**
  - Use `{gtsummary}` functions like `bold_labels()`, `modify_caption()`, and `modify_header()` for custom table styling.
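
As a brief sketch of the export step, the example below saves a plot with `ggsave()` and a table with `gt::gtsave()`. The plot and table are toy objects built from datasets shipped with R and `{gtsummary}` (`mtcars` and `trial`), and the temporary file paths are placeholders, so treat the specifics as assumptions to adapt.

```r
library(ggplot2)
library(gtsummary)

# Toy plot and table built from datasets shipped with the packages
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_bw()

tbl <- tbl_summary(trial, include = c(age, grade), by = trt) |>
  bold_labels()

# Write to temporary files; replace with real output paths in a project
plot_file  <- tempfile(fileext = ".png")
table_file <- tempfile(fileext = ".html")

ggsave(plot_file, plot = p, width = 8, height = 6)
as_gt(tbl) |> gt::gtsave(filename = table_file)
```

Passing the plot object explicitly to `ggsave()` avoids relying on whichever plot was displayed last.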

See the documentation for each package for more advanced customization options.
 