Raw data for domains in the pharmaversesdtm package

Background

The {pharmaversesdtm} and {pharmaverseadam} packages have been available for some time, providing reusable examples for SDTM and ADaM datasets, respectively. However, one critical piece of the workflow was missing: the raw datasets that serve as the starting point for these examples.

Why now?

With the recent release of the {sdtm.oak} package — an open-source package that enables SDTM programming in R — we now have an opportunity to complete the picture. The new {pharmaverseraw} package fills this gap by providing example raw datasets that can be used as input for {pharmaversesdtm} datasets generation with {sdtm.oak} .

What is in

{pharmaverseraw} ?

The {pharmaverseraw} package v0.1.0 is out on CRAN. It is also available from the pharmaverse site at: https://pharmaverse.org/e2eclinical/developers/. It includes raw datasets for the following SDTM domains:

- AE: Adverse Events

- DS: Subject Disposition

- DM: Demographics

- EC/EX: Exposure

These raw datasets in {pharmaverseraw} package are intentionally designed to be:

EDC agnostic: They are not tied to any specific Electronic Data Capture (EDC) system like Rave or Veeva.
Standards agnostic: Some variables follow CDASH (Clinical Data Acquisition Standards Harmonization), while others do not. This reflects real-world data standards variability across companies.

The annotated case report forms corresponding to the raw datasets are also present in the inst\acrf folder. These PDF files illustrate how each raw variable aligns with SDTM expectations, offering insight into the mapping logic used in {sdtm.oak} .

How are these datasets created?

The datasets in {pharmaverseraw} were created through reverse engineering - we started with the finalized SDTM datasets in {pharmaversesdtm} and worked backward to construct plausible raw datasets that could reasonably result in those SDTM outputs. This approach ensures data consistency while allowing us to demonstrate the flexibility of {sdtm.oak} in handling raw data in different formats.

From raw to SDTM

Using {sdtm.oak} , you can take raw datasets from {pharmaverseraw} and apply SDTM mapping functions to map the target SDTM variables. There will be new SDTM examples published using this data later at: https://pharmaverse.github.io/examples/sdtm/examples.html

Below is an example snippet that shows how to use a raw AE dataset from {pharmaverseraw} and generate SDTM AE variables with {sdtm.oak} :

library(pharmaverseraw)
library(sdtm.oak)
library(dplyr)

# Read in raw data
ae_raw <- pharmaverseraw::ae_raw

# Derive oak_id_vars
ae_raw <- ae_raw %>%
  generate_oak_id_vars(
    pat_var = "PATNUM",
    raw_src = "ae_raw"
  )

# Map AETERM and AESDTH variables for AE domain
ae <-
  # Derive topic variable
  # Map AETERM using assign_no_ct, raw_var=IT.AETERM, tgt_var=AETERM
  assign_no_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AETERM",
    tgt_var = "AETERM",
    id_vars = oak_id_vars()
  ) %>%
  # Map AESDTH using hardcode_no_ct and condition_add, raw_var=IT.AESDTH, tgt_var=AESDTH
  # If Yes then AESDTH = Y else Not submitted
  hardcode_no_ct(
    raw_dat = condition_add(ae_raw, IT.AESDTH == "Yes"),
    raw_var = "IT.AESDTH",
    tgt_var = "AESDTH",
    tgt_val = "Y",
    id_vars = oak_id_vars()
  ) %>%
  hardcode_no_ct(
    raw_dat = condition_add(ae_raw, IT.AESDTH != "Yes"),
    raw_var = "IT.AESDTH",
    tgt_var = "AESDTH",
    tgt_val = "Not Submitted",
    id_vars = oak_id_vars()
  )

Get involved

Similar to the other tools under pharmaverse umbrella, the {pharmaverseraw} package is open-source and community-driven. We welcome volunteers who are interested in contributing to the continued development and improvement of this package.

If you’d like to get involved, here are some ways you can help:

- Add new raw datasets: Take other SDTM domains from the {pharmaversesdtm} package and create corresponding raw datasets using R. This will help expand the coverage of the package.

- Create mock aCRFs for the raw datasets: Develop annotated case report forms (aCRFs) that illustrate how the raw variables are mapped to align with SDTM standards.

- Prepare documentation: For each raw dataset you create, include documentation that explains the data structure, variable definitions, and any relevant notes.

Whether you’re a programmer, CDISC expert, or clinical data manager, your contributions can make a meaningful impact. Visit the github repository to open up a issue or start a discussion.

Last updated

2025-08-13 16:59:15.550156

Details

Source, Session info

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{chen2025,
  author = {Chen, Shiyu},
  title = {Raw Data for Domains in the Pharmaversesdtm Package},
  date = {2025-07-07},
  url = {https://pharmaverse.github.io/blog/posts/2025-07-07_pharmaverse.../pharmaverseraw__package.html},
  langid = {en}
}

For attribution, please cite this work as:

Chen, Shiyu. 2025. “Raw Data for Domains in the Pharmaversesdtm Package.” July 7, 2025. https://pharmaverse.github.io/blog/posts/2025-07-07_pharmaverse.../pharmaverseraw__package.html.