Working with Clinical Trial Data? There’s a Pharmaverse Package for That

Looking for R packages to manage clinical trial data? Pharmaverse has tools for every stage from data collection to submission!
Categories: Technical, Community

Authors: Gift Kenneth, Sunil Gupta, APPSILON

Published: February 28, 2025

Working with clinical trial data is no small task. It needs to be precise, compliant, and efficient. Traditionally, this meant using proprietary tools and working within siloed systems, which often made the process more complicated and expensive than necessary. But we think there’s a better way.

The pharmaverse is an open-source ecosystem of R packages built specifically for clinical trials. These tools integrate seamlessly with the Tidyverse, making data management more flexible, efficient, and transparent.

Whether you’re collecting, validating, analyzing, or preparing data for regulatory submission, there’s a pharmaverse package designed to support your workflow and help you work smarter.

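This post covers:

  • The key stages of clinical reporting and the pharmaverse packages that support them
  • A worked example: creating an ADSL dataset with {admiral}
  • The key players in pharmaverse, and whether you need to use all of its packages
  • How pharmaverse differs from Tidyverse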

By the end, you’ll have a clear understanding of how pharmaverse supports clinical trial operations and how to apply these tools in your work.

Key Stages of Clinical Reporting

Managing clinical trial data involves multiple stages, each with its own challenges. Pharmaverse provides a range of R packages that support different parts of the process, sometimes even offering multiple options for the same task. This flexibility allows organizations to choose the best tools for their specific needs rather than sticking to a one-size-fits-all approach.

A metadata-driven approach helps ensure that clinical trial data is consistently structured and aligned with regulatory standards. The typical process follows this sequence:

Metadata → OAK → Admiral → Define.xml → TLGs → Submissions

Some examples of pharmaverse packages that support clinical reporting include:

  • {diffdf} – Tracking differences in datasets.
  • {metatools} – Metadata management and transformation.
  • {sdtm.oak} – The primary pharmaverse package for SDTM dataset creation.
  • {datacutr} – Performing data cuts.
  • {admiral} – Standardized data derivations.
  • {metacore} – Metadata-driven structures.
  • The pharmaverse provides multiple table-making packages, such as {chevron} (which builds on {rtables}), {Tplyr}, {pharmaRTF}, {gtsummary}, {cards}, {tfrmt}, and {tidytlg}. More tools are listed on the TLGs page.
  • {xportr} – CDISC-compliant dataset export.
  • {pkglite} – Packing R package source code into plain-text files for electronic submission.
  • {metacore} and {metatools} – For standardized metadata structures and validation.
  • {logrx} – For logging R scripts.

Pharmaverse packages are built on top of Tidyverse tools and integrate seamlessly with packages like {dplyr} for data manipulation and {ggplot2} for visualization.
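
As one small illustration of the packages listed above, the sketch below uses {diffdf} to compare two versions of a dataset, as you might when checking a production dataset against an independent QC copy. The data frames `prod` and `qc` and their values are made up for illustration; only `diffdf()` and `diffdf_has_issues()` come from the package.

```r
library(diffdf)

# Toy "production" and "QC" versions of the same dataset (illustrative values)
prod <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  AGE = c(63, 71, 58),
  stringsAsFactors = FALSE
)
qc <- prod
qc$AGE[2] <- 72 # introduce one deliberate discrepancy

# Compare the two versions, matching rows on the subject identifier
result <- diffdf(prod, qc, keys = "USUBJID", suppress_warnings = TRUE)

# TRUE when at least one difference was found
has_diff <- diffdf_has_issues(result)

# Printing the result lists the variables and rows that differ
print(result)
```

Matching on a key such as USUBJID rather than on row position keeps the comparison meaningful even if the two datasets are sorted differently.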

Note: This post highlights some key pharmaverse packages relevant to clinical reporting. For a full and up-to-date list, visit the Pharmaverse website. If there’s a package we missed that should be included, let us know, and we’d be happy to update this post.

By using these tools, organizations can optimize their data pipeline, ensuring clinical data is well-structured and ready for regulatory submission with ease.

Example: Creating ADSL

Building an ADSL dataset involves several key steps, from reading in data to deriving treatment variables and population flags. While these steps apply regardless of the tools used, pharmaverse packages like {admiral} simplify the process with functions designed for CDISC-compliant datasets.

This example is based on the ADSL template, which provides a structured approach to creating an ADSL dataset.

Step 1: Read in Data

To begin, the SDTM datasets needed for this example (DM, EX, and DS) are loaded. The {pharmaversesdtm} package provides sample CDISC SDTM datasets:

library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(stringr)

# Load sample data
data("dm", package = "pharmaversesdtm")
data("ex", package = "pharmaversesdtm")
data("ds", package = "pharmaversesdtm")

ADSL is typically built from the DM dataset, removing unnecessary columns and adding treatment variables in one step:

adsl <- dm %>%
  select(-DOMAIN) %>%
  mutate(
    TRT01P = ARM,
    TRT01A = ACTARM
  )

Step 2: Derive Treatment Variables

Using {admiral}, we convert the character start and end dates in the EX dataset (EXSTDTC, EXENDTC) into numeric date variables (EXSTDT, EXENDT):

ex_ext <- ex %>%
  filter(!is.na(USUBJID)) %>%
  derive_vars_dt(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dt(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN"
  )
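
To see what `derive_vars_dt()` does with complete and partial dates, here is a minimal sketch on made-up data. The data frame `ex_toy` and its values are illustrative assumptions; the imputation settings shown are one possible choice, not a recommendation from the ADSL template.

```r
library(admiral)

# Toy exposure records: one complete date, one with the day missing
ex_toy <- data.frame(
  USUBJID = c("01-001", "01-002"),
  EXSTDTC = c("2014-01-02", "2014-03"),
  stringsAsFactors = FALSE
)

# Default behaviour: no imputation, so the partial date becomes NA
no_imp <- derive_vars_dt(
  ex_toy,
  dtc = EXSTDTC,
  new_vars_prefix = "EXST"
)

# Allow imputation up to the month level: a missing day is set to the first
imp <- derive_vars_dt(
  ex_toy,
  dtc = EXSTDTC,
  new_vars_prefix = "EXST",
  highest_imputation = "M",
  date_imputation = "first"
)

no_imp$EXSTDT
imp$EXSTDT
```

When imputation is switched on, {admiral} also adds a date imputation flag (here EXSTDTF) so that imputed values stay traceable.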

Then merge these dates into ADSL:

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXSTDT),
    new_vars = exprs(TRTSDT = EXSTDT),
    order = exprs(EXSTDT, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXENDT),
    new_vars = exprs(TRTEDT = EXENDT),
    order = exprs(EXENDT, EXSEQ),
    mode = "last",
    by_vars = exprs(STUDYID, USUBJID)
  )
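
The interplay of `filter_add`, `order`, and `mode` is easier to follow on a few rows. The sketch below uses made-up data (`adsl_toy`, `ex_toy`) and a simplified filter (`EXDOSE > 0` only, without the placebo condition used above), so it illustrates the mechanics rather than reproducing the template.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# One row per subject
adsl_toy <- data.frame(STUDYID = "XYZ", USUBJID = c("01-001", "01-002"))

# Several exposure records per subject; the first record has a zero dose
ex_toy <- data.frame(
  STUDYID = "XYZ",
  USUBJID = c("01-001", "01-001", "01-001", "01-002"),
  EXSEQ = c(1, 2, 3, 1),
  EXDOSE = c(0, 54, 54, 81),
  EXSTDT = as.Date(c("2014-01-01", "2014-01-15", "2014-02-01", "2014-03-01")),
  EXENDT = as.Date(c("2014-01-14", "2014-01-31", "2014-02-28", "2014-03-31"))
)

adsl_toy <- adsl_toy %>%
  # Earliest start date among records with a positive dose
  derive_vars_merged(
    dataset_add = ex_toy,
    by_vars = exprs(STUDYID, USUBJID),
    filter_add = EXDOSE > 0,
    new_vars = exprs(TRTSDT = EXSTDT),
    order = exprs(EXSTDT, EXSEQ),
    mode = "first"
  ) %>%
  # Latest end date among records with a positive dose
  derive_vars_merged(
    dataset_add = ex_toy,
    by_vars = exprs(STUDYID, USUBJID),
    filter_add = EXDOSE > 0,
    new_vars = exprs(TRTEDT = EXENDT),
    order = exprs(EXENDT, EXSEQ),
    mode = "last"
  )

adsl_toy
```

For subject 01-001 the zero-dose record is filtered out first, so the treatment start date comes from the second exposure record rather than the first.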

Step 3: Derive End of Study (EOS) Status

The disposition dataset (DS) is used to determine when a patient exited the study, stored here as the end-of-study date (EOSDT):

ds_ext <- ds %>%
  filter(!is.na(DSSTDTC)) %>%
  derive_vars_dt(
    dtc = DSSTDTC,
    new_vars_prefix = "DSST"
  )

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ds_ext,
    by_vars = exprs(STUDYID, USUBJID),
    new_vars = exprs(EOSDT = DSSTDT),
    filter_add = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE"
  )
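
The code above derives the end-of-study date. A companion status variable (EOSSTT) can be derived with the same function, as in the following sketch on made-up data. The helper `format_eosstt()` is a hypothetical user-defined function, not part of {admiral}, and the mapping it applies is an assumption that each study would adapt to its own disposition terms.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (user-defined): map a disposition term to a status
format_eosstt <- function(x) {
  case_when(
    x == "COMPLETED" ~ "COMPLETED",
    x == "SCREEN FAILURE" ~ NA_character_,
    TRUE ~ "DISCONTINUED"
  )
}

adsl_toy <- data.frame(
  STUDYID = "XYZ",
  USUBJID = c("01-001", "01-002", "01-003")
)

# Subject 01-003 has no disposition record yet
ds_toy <- data.frame(
  STUDYID = "XYZ",
  USUBJID = c("01-001", "01-002"),
  DSCAT = "DISPOSITION EVENT",
  DSDECOD = c("COMPLETED", "ADVERSE EVENT")
)

adsl_toy <- adsl_toy %>%
  derive_vars_merged(
    dataset_add = ds_toy,
    by_vars = exprs(STUDYID, USUBJID),
    filter_add = DSCAT == "DISPOSITION EVENT",
    new_vars = exprs(EOSSTT = format_eosstt(DSDECOD)),
    # Subjects without a matching DS record are treated as still on study
    missing_values = exprs(EOSSTT = "ONGOING")
  )

adsl_toy
```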

Step 4: Assign Population Flags

For the safety population flag (SAFFL), we check whether the patient received at least one dose of treatment:

adsl <- adsl %>%
  derive_var_merged_exist_flag(
    dataset_add = ex,
    by_vars = exprs(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = EXDOSE > 0 | str_detect(EXTRT, "PLACEBO")
  )
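
By default, `derive_var_merged_exist_flag()` sets the flag to "Y" when the condition is met and leaves it missing otherwise. If your specification calls for an explicit "N", the `false_value` and `missing_value` arguments can be set, as in this sketch on made-up data (`adsl_toy`, `ex_toy`, and the simplified condition are illustrative assumptions):

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

adsl_toy <- data.frame(
  STUDYID = "XYZ",
  USUBJID = c("01-001", "01-002", "01-003")
)

# 01-001 received a dose, 01-002 has only a zero-dose record,
# and 01-003 has no exposure record at all
ex_toy <- data.frame(
  STUDYID = "XYZ",
  USUBJID = c("01-001", "01-002"),
  EXDOSE = c(54, 0)
)

flagged <- derive_var_merged_exist_flag(
  adsl_toy,
  dataset_add = ex_toy,
  by_vars = exprs(STUDYID, USUBJID),
  new_var = SAFFL,
  condition = EXDOSE > 0,
  false_value = "N",   # records exist, but none meets the condition
  missing_value = "N"  # no records in dataset_add for this subject
)

flagged
```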

Step 5: Generate and Save Results

Finally, we save the dataset as a CSV file and view the first rows of a few key columns:

# Save to a CSV file
write.csv(adsl, "adsl_output.csv", row.names = FALSE)

# Preview key columns for the first ten subjects
adsl %>% select(USUBJID, TRT01P, TRT01A, TRTSDT, TRTEDT, SAFFL) %>% head(10)

USUBJID      TRT01P                TRT01A                TRTSDT      TRTEDT      SAFFL
01-701-1015  Placebo               Placebo               2014-01-02  2014-07-02  Y
01-701-1023  Placebo               Placebo               2012-08-05  2012-09-01  Y
01-701-1028  Xanomeline High Dose  Xanomeline High Dose  2013-07-19  2014-01-14  Y
01-701-1033  Xanomeline Low Dose   Xanomeline Low Dose   2014-03-18  2014-03-31  Y
01-701-1034  Xanomeline High Dose  Xanomeline High Dose  2014-07-01  2014-12-30  Y
01-701-1047  Placebo               Placebo               2013-02-12  2013-03-09  Y
01-701-1057  Screen Failure        Screen Failure        NA          NA          NA
01-701-1097  Xanomeline Low Dose   Xanomeline Low Dose   2014-01-01  2014-07-09  Y
01-701-1111  Xanomeline Low Dose   Xanomeline Low Dose   2012-09-07  2012-09-16  Y
01-701-1115  Xanomeline Low Dose   Xanomeline Low Dose   2012-11-30  2013-01-23  Y
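
CSV is convenient for a quick look, but regulatory submissions typically expect SAS transport (XPT) files, which is where {xportr} from the list earlier comes in. The following is a minimal sketch on made-up data, assuming the {xportr} and {haven} packages are installed; a real workflow would usually also apply labels, types, and lengths from a metadata specification before writing.

```r
library(xportr)

# Toy subject-level dataset (illustrative values)
adsl_toy <- data.frame(
  STUDYID = "XYZ",
  USUBJID = c("01-001", "01-002"),
  TRT01P = c("Placebo", "Xanomeline Low Dose"),
  SAFFL = c("Y", "Y"),
  stringsAsFactors = FALSE
)

# Write a version 5 transport file to a temporary location
xpt_path <- file.path(tempdir(), "adsl.xpt")
xportr_write(adsl_toy, path = xpt_path)

# Read the file back to confirm that it round-trips
roundtrip <- haven::read_xpt(xpt_path)
roundtrip
```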

More Details on ADSL Creation

This is just a high-level example; the full process includes deriving death variables, grouping populations, and applying labels. For a deeper dive, check out the ADSL Implementation Guide.

Who Are the Key Players in Pharmaverse, and Do You Need to Use All Packages?

Key Players in pharmaverse

  • Pharmaverse Council and Community – A collaborative group of developers, industry experts, and contributors maintaining and expanding the ecosystem.
  • Open-Source Contributors – Individuals and organizations developing and refining pharmaverse packages.
  • Pharmaverse is part of PHUSE – PHUSE plays an active role in supporting and advancing the pharmaverse initiative.
  • The pharmaverse community collaborates with organizations like the FDA, EMA, R Consortium, and CDISC to align with industry standards and best practices for clinical data reporting.

Do You Need to Use All Pharmaverse Packages?

  • No, organizations can select only the packages that fit their needs.

  • Many packages are modular and independent, allowing selective integration.

  • Pharmaverse hosts multiple packages with similar aims, giving users the flexibility to choose what works best for them rather than prescribing a single approach.

  • Pharmaverse complements Tidyverse, allowing organizations to continue using familiar R workflows.

How Pharmaverse Differs from Tidyverse & How to Learn It Effectively

Differences Between pharmaverse and Tidyverse

  • Tidyverse provides general-purpose data science tools for tasks such as data wrangling and visualization.

  • Pharmaverse builds on Tidyverse functions and adds compliance, validation, and reporting features for pharma-specific data structuring, clinical reporting, and regulatory submissions.
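
To make the contrast concrete, the sketch below derives the same first-dose date twice on made-up data: once with plain {dplyr} verbs and once with {admiral}. The data frames and values are illustrative assumptions. Both approaches are valid; the point is only to show how the pharmaverse function names the clinical intent directly.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

adsl_toy <- data.frame(USUBJID = c("01-001", "01-002"))
ex_toy <- data.frame(
  USUBJID = c("01-001", "01-001", "01-002"),
  EXSTDT = as.Date(c("2014-02-01", "2014-01-15", "2014-03-01"))
)

# Plain {dplyr}: sort, keep the first record per subject, then join
first_dose <- ex_toy %>%
  arrange(USUBJID, EXSTDT) %>%
  group_by(USUBJID) %>%
  slice(1) %>%
  ungroup() %>%
  select(USUBJID, TRTSDT = EXSTDT)
via_dplyr <- left_join(adsl_toy, first_dose, by = "USUBJID")

# {admiral}: the same rule expressed in a single call
via_admiral <- derive_vars_merged(
  adsl_toy,
  dataset_add = ex_toy,
  by_vars = exprs(USUBJID),
  new_vars = exprs(TRTSDT = EXSTDT),
  order = exprs(EXSTDT),
  mode = "first"
)

# Both approaches give the same treatment start dates
all(via_dplyr$TRTSDT == via_admiral$TRTSDT)
```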

Getting Started with the Pharmaverse

Pharmaverse provides an open-source ecosystem for clinical reporting, extending Tidyverse with validation, compliance, and regulatory submission capabilities. By following a structured approach from raw data to ADaMs, organizations can enhance efficiency while maintaining data integrity.

Citation

BibTeX citation:
@online{kenneth2025,
  author = {Kenneth, Gift and Gupta, Sunil and {APPSILON}},
  title = {Working with {Clinical} {Trial} {Data?} {There’s} a
    {Pharmaverse} {Package} for {That}},
  date = {2025-02-28},
  url = {https://pharmaverse.github.io/blog/posts/2025-02-28_theres_a_pharmaverse_package_for_that/managing-clinical-trial-data.html},
  langid = {en}
}
For attribution, please cite this work as:
Kenneth, Gift, Sunil Gupta, and APPSILON. 2025. “Working with Clinical Trial Data? There’s a Pharmaverse Package for That.” February 28, 2025. https://pharmaverse.github.io/blog/posts/2025-02-28_theres_a_pharmaverse_package_for_that/managing-clinical-trial-data.html.