library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(stringr)
# Load sample data
data("dm", package = "pharmaversesdtm")
data("ex", package = "pharmaversesdtm")
data("ds", package = "pharmaversesdtm")
Working with clinical trial data is no small task. It needs to be precise, compliant, and efficient. Traditionally, this meant using proprietary tools and working within siloed systems, which often made the process more complicated and expensive than necessary. But we think there’s a better way.
The pharmaverse is an open-source ecosystem of R packages built specifically for clinical trials. These tools integrate seamlessly with the Tidyverse, making data management more flexible, efficient, and transparent.
Whether you’re collecting, validating, analyzing, or preparing data for regulatory submission, there’s a pharmaverse package designed to support your workflow and help you work smarter.
This post covers:
- Key stages of clinical trials and the R packages that support them
- Creating ADSL datasets and essential programming steps
- Key players in pharmaverse and whether you need all packages
- How pharmaverse compares to Tidyverse and how to learn it
By the end, you’ll have a clear understanding of how pharmaverse supports clinical trial operations and how to apply these tools in your work.
Key Stages of Clinical Reporting
Managing clinical trial data involves multiple stages, each with its own challenges. Pharmaverse provides a range of R packages that support different parts of the process, sometimes even offering multiple options for the same task. This flexibility allows organizations to choose the best tools for their specific needs rather than sticking to a one-size-fits-all approach.
A metadata-driven approach helps ensure that clinical trial data is consistently structured and aligned with regulatory standards. The typical process follows this sequence:
Metadata ➝ OAK ➝ Admiral ➝ Define.xml ➝ TLGs ➝ Submissions
Some examples of pharmaverse packages that support clinical reporting include:
- {diffdf} – Tracking differences in datasets.
- {metatools} – Metadata management and transformation.
- {sdtm.oak} – The primary pharmaverse package for SDTM dataset creation.
- {datacutr} – Performing data cuts.
- {admiral} – Standardized data derivations.
- {metacore} – Metadata-driven structures.
- The pharmaverse provides multiple table-making packages, such as {chevron} (which builds on {rtables}), {Tplyr}, {pharmaRTF}, {gtsummary}, {cards}, {tfrmt}, and {tidytlg}. More tools are listed on the TLGs page.
- {xportr} – CDISC-compliant dataset export.
- {pkglite} – Package management and tracking.
- {metacore} and {metatools} – For standardized metadata structures and validation.
- {logrx} – For logging R scripts.
Pharmaverse packages are built on top of Tidyverse tools and integrate seamlessly with packages like {dplyr} for data manipulation and {ggplot2} for visualization.
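As a small illustration of how these packages sit alongside {dplyr}, the sketch below uses {diffdf} to compare two versions of a dataset. The toy data and the object names `base_ds` and `compare_ds` are hypothetical and only stand in for real study datasets.

```r
library(dplyr, warn.conflicts = FALSE)
library(diffdf)

# Hypothetical toy data standing in for two versions of the same dataset
base_ds <- tibble::tibble(
  USUBJID = c("01", "02", "03"),
  AGE = c(63, 64, 71)
)

# Simulate a change introduced by a later program run
compare_ds <- base_ds %>%
  mutate(AGE = if_else(USUBJID == "02", 65, AGE))

# Compare the two versions, matching rows on USUBJID
result <- diffdf(base_ds, compare_ds, keys = "USUBJID", suppress_warnings = TRUE)
print(result)

# Returns TRUE when at least one difference was found
has_diffs <- diffdf_has_issues(result)
```

A check like `diffdf_has_issues()` can be useful in double-programming workflows, where a script should stop as soon as the production and QC datasets disagree.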
Note: This post highlights some key pharmaverse packages relevant to clinical reporting. For a full and up-to-date list, visit the Pharmaverse website. If there’s a package we missed that should be included, let us know, and we’d be happy to update this post.
By using these tools, organizations can optimize their data pipeline, ensuring clinical data is well-structured and ready for regulatory submission with ease.
Example: Creating ADSL
Building an ADSL dataset involves several key steps, from reading in data to deriving treatment variables and population flags. While these steps apply regardless of the tools used, pharmaverse packages like {admiral} simplify the process with functions designed for CDISC-compliant datasets.
This example is based on the ADSL template, which provides a structured approach to creating an ADSL dataset.
Step 1: Read in Data
To begin, clinical trial datasets such as DM, EX, DS, AE, and LB are loaded. The {pharmaversesdtm} package provides sample CDISC SDTM datasets; this example uses DM, EX, and DS, which are loaded in the setup code at the top of this post.
ADSL is typically built from the DM dataset, removing unnecessary columns and adding treatment variables in one step:
adsl <- dm %>%
  select(-DOMAIN) %>%
  mutate(
    TRT01P = ARM,
    TRT01A = ACTARM
  )
Step 2: Derive Treatment Variables
Using {admiral}, we extract and standardize treatment dates from the EX dataset:
ex_ext <- ex %>%
  filter(!is.na(USUBJID)) %>%
  derive_vars_dt(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dt(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN"
  )
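To see what `derive_vars_dt()` does with incomplete dates, here is a minimal sketch on hypothetical toy records (`toy_ex` is not part of the sample data). It assumes the function's default of no imputation, and then opts in to imputation with `highest_imputation = "M"`.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical exposure records: one complete date, one missing the day
toy_ex <- tibble::tibble(
  USUBJID = c("01", "02"),
  EXSTDTC = c("2014-01-02", "2014-03")
)

# Default behaviour: no imputation, so the partial date stays missing
toy_default <- toy_ex %>%
  derive_vars_dt(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  )

# Allow imputation of missing day and month; missing parts are set to the first
toy_imputed <- toy_ex %>%
  derive_vars_dt(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST",
    highest_imputation = "M"
  )

toy_default
toy_imputed
```

When imputation is enabled, the result should also carry an imputation flag variable (`EXSTDTF` here), so imputed values remain traceable.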
Then merge these dates into ADSL:
adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXSTDT),
    new_vars = exprs(TRTSDT = EXSTDT),
    order = exprs(EXSTDT, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXENDT),
    new_vars = exprs(TRTEDT = EXENDT),
    order = exprs(EXENDT, EXSEQ),
    mode = "last",
    by_vars = exprs(STUDYID, USUBJID)
  )
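The `filter_add` condition decides which exposure records count as valid doses. The sketch below applies the same condition with plain `dplyr::filter()` to hypothetical toy records, so it is easy to see which rows would be eligible for the merge.

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)

# Hypothetical exposure records covering the cases the condition distinguishes
toy_ex <- tibble::tibble(
  USUBJID = c("01", "01", "02", "03"),
  EXTRT   = c("XANOMELINE", "XANOMELINE", "PLACEBO", "XANOMELINE"),
  EXDOSE  = c(54, 0, 0, 54),
  EXSTDT  = as.Date(c("2014-01-02", "2014-02-01", "2014-03-10", NA))
)

# Keep active doses, zero-dose placebo records, and only rows with a start date
kept <- toy_ex %>%
  filter(
    (EXDOSE > 0 | (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO"))) &
      !is.na(EXSTDT)
  )

kept
```

In this toy data the zero-dose active record and the record without a start date are dropped, while the placebo record is kept even though its dose is zero.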
Step 3: Derive End of Study (EOS) Status
The disposition dataset (DS) is used to determine when a patient exited the study:
ds_ext <- ds %>%
  filter(!is.na(DSSTDTC)) %>%
  derive_vars_dt(
    dtc = DSSTDTC,
    new_vars_prefix = "DSST"
  )

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ds_ext,
    by_vars = exprs(STUDYID, USUBJID),
    new_vars = exprs(EOSDT = DSSTDT),
    filter_add = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE"
  )
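The code above derives the end-of-study date. A status variable such as EOSSTT can be derived in a similar way. The sketch below is one possible approach on hypothetical toy data, assuming a recent {admiral} release in which `derive_vars_merged()` accepts expressions in `new_vars` and a `missing_values` argument. The helper `format_eosstt()` and its mapping are illustrative, not a standard.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper mapping disposition terms to an end-of-study status
format_eosstt <- function(x) {
  case_when(
    x == "COMPLETED" ~ "COMPLETED",
    x == "SCREEN FAILURE" ~ NA_character_,
    TRUE ~ "DISCONTINUED"
  )
}

# Toy subject-level and disposition data (subject 03 has no disposition record)
toy_adsl <- tibble::tibble(STUDYID = "S1", USUBJID = c("01", "02", "03"))

toy_ds <- tibble::tibble(
  STUDYID = "S1",
  USUBJID = c("01", "02"),
  DSCAT   = "DISPOSITION EVENT",
  DSDECOD = c("COMPLETED", "ADVERSE EVENT")
)

# Subjects without a disposition event are treated as still ongoing
toy_adsl <- toy_adsl %>%
  derive_vars_merged(
    dataset_add = toy_ds,
    by_vars = exprs(STUDYID, USUBJID),
    filter_add = DSCAT == "DISPOSITION EVENT",
    new_vars = exprs(EOSSTT = format_eosstt(DSDECOD)),
    missing_values = exprs(EOSSTT = "ONGOING")
  )

toy_adsl
```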
Step 4: Assign Population Flags
For the safety population (SAFFL), we check if the patient received a treatment dose:
adsl <- adsl %>%
  derive_var_merged_exist_flag(
    dataset_add = ex,
    by_vars = exprs(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = EXDOSE > 0 | str_detect(EXTRT, "PLACEBO")
  )
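By default, subjects who do not meet the condition, or who have no exposure records at all, get a missing flag rather than "N". If your specifications call for an explicit "N", the `false_value` and `missing_value` arguments can be set, as in this sketch on hypothetical toy data:

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(stringr)

# Toy subject-level data; subject 03 has no exposure records
toy_adsl <- tibble::tibble(STUDYID = "S1", USUBJID = c("01", "02", "03"))

toy_ex <- tibble::tibble(
  STUDYID = "S1",
  USUBJID = c("01", "02"),
  EXTRT   = c("XANOMELINE", "PLACEBO"),
  EXDOSE  = c(54, 0)
)

# Flag dosed subjects "Y" and everyone else "N" instead of NA
toy_adsl <- toy_adsl %>%
  derive_var_merged_exist_flag(
    dataset_add = toy_ex,
    by_vars = exprs(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = EXDOSE > 0 | str_detect(EXTRT, "PLACEBO"),
    false_value = "N",
    missing_value = "N"
  )

toy_adsl
```

Whether missing or "N" is the right value depends on the study's ADaM specifications, so treat the choice here as an assumption.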
Step 5: Generate and Save Results
Finally, we save the dataset as a CSV file and view some of its columns:
# Save to a CSV file
write.csv(adsl, "adsl_output.csv", row.names = FALSE)
adsl
USUBJID | TRT01P | TRT01A | TRTSDT | TRTEDT | SAFFL |
---|---|---|---|---|---|
01-701-1015 | Placebo | Placebo | 2014-01-02 | 2014-07-02 | Y |
01-701-1023 | Placebo | Placebo | 2012-08-05 | 2012-09-01 | Y |
01-701-1028 | Xanomeline High Dose | Xanomeline High Dose | 2013-07-19 | 2014-01-14 | Y |
01-701-1033 | Xanomeline Low Dose | Xanomeline Low Dose | 2014-03-18 | 2014-03-31 | Y |
01-701-1034 | Xanomeline High Dose | Xanomeline High Dose | 2014-07-01 | 2014-12-30 | Y |
01-701-1047 | Placebo | Placebo | 2013-02-12 | 2013-03-09 | Y |
01-701-1057 | Screen Failure | Screen Failure | NA | NA | NA |
01-701-1097 | Xanomeline Low Dose | Xanomeline Low Dose | 2014-01-01 | 2014-07-09 | Y |
01-701-1111 | Xanomeline Low Dose | Xanomeline Low Dose | 2012-09-07 | 2012-09-16 | Y |
01-701-1115 | Xanomeline Low Dose | Xanomeline Low Dose | 2012-11-30 | 2013-01-23 | Y |
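One thing to keep in mind when saving to CSV is that column types are not stored in the file. The base R sketch below, on a hypothetical two-row stand-in for ADSL, shows date columns coming back as character strings unless their class is declared when reading.

```r
# Hypothetical two-row stand-in for the ADSL built above
toy_adsl <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1057"),
  TRTSDT = as.Date(c("2014-01-02", NA)),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the example leaves nothing behind
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_adsl, csv_path, row.names = FALSE)

# Reading back without hints returns the dates as character strings
plain <- read.csv(csv_path, stringsAsFactors = FALSE)
class(plain$TRTSDT)

# Declaring the column classes restores the Date type
typed <- read.csv(
  csv_path,
  colClasses = c(USUBJID = "character", TRTSDT = "Date")
)
class(typed$TRTSDT)
```

For regulatory deliverables, a transport format written with a package such as {xportr} (listed earlier) is usually more appropriate than CSV.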
More Details on ADSL Creation
This is just a high-level example; the full process includes deriving death variables, grouping populations, and applying labels. For a deeper dive, check out the ADSL Implementation Guide.
Who Are the Key Players in Pharmaverse, and Do You Need to Use All Packages?
Key Players in pharmaverse
- Pharmaverse Council and Community – A collaborative group of developers, industry experts, and contributors maintaining and expanding the ecosystem.
- Open-Source Contributors – Individuals and organizations developing and refining pharmaverse packages.
- Pharmaverse is part of PHUSE – PHUSE plays an active role in supporting and advancing the pharmaverse initiative.
- The pharmaverse community collaborates with organizations like the FDA, EMA, R Consortium, and CDISC to align with industry standards and best practices for clinical data reporting.
Do You Need to Use All Pharmaverse Packages?
No, organizations can select only the packages that fit their needs.
Many packages are modular and independent, allowing selective integration.
Pharmaverse hosts multiple packages with similar aims, giving users the flexibility to choose what works best for them rather than prescribing a single approach.
Pharmaverse complements Tidyverse, allowing organizations to continue using familiar R workflows.
How Pharmaverse Differs from Tidyverse & How to Learn It Effectively
Differences Between pharmaverse and Tidyverse
Tidyverse provides general-purpose data science tools for tasks such as data wrangling and visualization, whereas pharmaverse builds on Tidyverse functions and adds compliance, validation, and reporting features for structuring clinical data, producing reports, and preparing regulatory submissions.
Getting Started with the Pharmaverse
Pharmaverse provides an open-source ecosystem for clinical reporting, extending Tidyverse with validation, compliance, and regulatory submission capabilities. By following a structured approach from raw data to ADaMs, organizations can enhance efficiency while maintaining data integrity.
You can start with Pharmaverse Examples – A curated set of documentation and tutorials.
Attend Pharma Industry Webinars and Conferences – Stay updated on new developments through PHUSE events and webinars, R/Pharma conferences, CDISC events, and Shiny Gatherings x Pharmaverse webinars.
Engage with the Open-Source Community – Contribute to package improvements or discussions. You can join the pharmaverse community to get started.
Explore packages on the pharmaverse website.
Try implementing an ADSL dataset by following the ADSL Implementation Guide.
Refer to this grid for guidance on using Tidyverse or pharmaverse to complete tasks in the submission process.
Resources
This blog post was based on this presentation by Sunil Gupta: R and pharmaverse: The New Frontier for Today’s Statistical Programmers
Citation
@online{kenneth2025,
author = {Kenneth, Gift and Gupta, Sunil and {APPSILON}},
title = {Working with {Clinical} {Trial} {Data?} {There’s} a
{Pharmaverse} {Package} for {That}},
date = {2025-02-28},
url = {https://pharmaverse.github.io/blog/posts/2025-02-28_theres_a_pharmaverse_package_for_that/managing-clinical-trial-data.html},
langid = {en}
}