SDTM Workshop using {sdtm.oak} - Condensed Version

Rammprasad Ganapathy (Roche/Genentech)

Presenters

Rammprasad Ganapathy (Ram)

  • Director, Data Science Acceleration, Data & Statistical Sciences at Roche/Genentech
  • Over 15 years of experience in EDC and Statistical Programming.
  • Based in the San Francisco Bay Area, CA.
  • Passionate about developing R packages, developing AI agents, vibe coding, creating innovative solutions, and automation.
  • Led the OAK team and the roak product from the initial vision of metadata-driven automation through PoC to production.
  • roak is now used to automate SDTM in more than 100 studies at Roche for regulatory reporting.
  • Product owner and one of the developers for sdtm.oak.

Objectives

  • By the end of this workshop you will have:
    • Gained an understanding of how to use sdtm.oak.
    • Gained an understanding of how to use resources for building SDTM in R.
    • Seen a demonstration of code generation using metadata.

Agenda

  • 🕙 20 minutes Introduction to sdtm.oak

  • 🕥 10 minutes Demo of the code generation app using Claude and metadata

Introduction to sdtm.oak

About the package

  • Sponsored by CDISC COSA and pharmaceutical companies, including Roche, Pfizer, GSK, Vertex, Atorus Research, Pattern Institute, and Transition Technologies Science.
  • Part of the pharmaverse group of packages.
  • Inspired by Roche's roak package.
  • Addresses a critical gap in the pharmaverse suite by enabling study programmers to create SDTM datasets in R, complementing the existing capabilities for ADaM, TLGs, eSubmission, etc.

Challenges in SDTM Programming

  • Although SDTM is simpler than ADaM, with less complex derivations, it presents unique challenges. Unlike ADaM, which uses SDTM datasets as its source with a well-defined structure, SDTM relies on raw datasets as input.
  • Raw data structure: different EDC systems produce data with different structures, variable names, dataset names, etc.
  • Varying data collection standards: although CDASH is available, companies can still develop differing eCRFs using CDASH standards.

sdtm.oak v0.2

  • v0.2 is available on CRAN.
  • EDC agnostic: sdtm.oak is designed to be highly versatile, accommodating varying raw data structures from different EDC systems and external vendors.
  • Data standards agnostic: it supports both CDISC-defined data collection standards (CDASH) and various proprietary data collection standards defined by pharmaceutical companies.
  • Provides a framework for modular programming, making it a valuable addition to the pharmaverse ecosystem.

Algorithms

Key concepts

  • The SDTM mappings that transform the collected source data into the target SDTM data model are grouped into algorithms.
  • These mapping algorithms form the backbone of {sdtm.oak}.
  • Algorithms can be re-used across multiple SDTM domains.
  • Programming language agnostic: this concept does not rely on a specific programming language for implementation.
  • sdtm.oak has R functions to represent each algorithm.

assign_no_ct

Algorithm or Function: sdtm.oak::assign_no_ct()
Description of the Algorithm: One-to-one mapping between the raw source and a target SDTM variable that has no controlled terminology restrictions. Just a simple assignment statement.
Example SDTM mapping: MH.MHTERM, AE.AETERM
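
Because the algorithm concept is language agnostic, the one-to-one assignment can be sketched outside R. The Python below is only a conceptual illustration and not the sdtm.oak implementation; the function name assign_value, the raw column MHTERM_RAW, and the sample rows are all hypothetical.

```python
# Conceptual sketch only; this is not the sdtm.oak implementation.
def assign_value(raw_rows, raw_var, tgt_var):
    """Copy the collected value of raw_var into tgt_var, keeping the row id."""
    return [{"oak_id": row["oak_id"], tgt_var: row[raw_var]} for row in raw_rows]


# Hypothetical raw medical history rows; MHTERM_RAW is an invented column name.
raw_mh = [
    {"oak_id": 1, "MHTERM_RAW": "Hypertension"},
    {"oak_id": 2, "MHTERM_RAW": "Asthma"},
]

mh = assign_value(raw_mh, "MHTERM_RAW", "MHTERM")
print(mh)
```

Carrying the row id along with the assigned value is what later allows qualifiers to be attached to the right record.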

assign_ct

Algorithm or Function: sdtm.oak::assign_ct()
Description of the Algorithm: One-to-one mapping between the raw source and a target SDTM variable that is subject to controlled terminology restrictions. A simple assignment followed by application of controlled terminology. This is used only if the SDTM variable has an associated controlled terminology.
Example SDTM mapping: VS.VSPOS, VS.VSLAT
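
As a rough illustration of assignment with controlled terminology, the Python sketch below recodes a collected value through a lookup table. It is not the sdtm.oak implementation: the function name, the raw column POSITION, the collected values, and the choice to pass unmatched values through unchanged are assumptions made for this example.

```python
# Conceptual sketch only; this is not the sdtm.oak implementation.
# Hypothetical study terminology: collected value -> submission value.
POSITION_CT = {
    "sitting": "SITTING",
    "standing": "STANDING",
    "lying down": "SUPINE",
}


def assign_with_ct(raw_rows, raw_var, tgt_var, codelist):
    """Assign raw_var to tgt_var, recoding through the codelist when a match exists."""
    out = []
    for row in raw_rows:
        collected = row[raw_var]
        # Assumption of this sketch: unmatched values pass through unchanged
        # so that they remain visible for review.
        mapped = codelist.get(collected.strip().lower(), collected)
        out.append({"oak_id": row["oak_id"], tgt_var: mapped})
    return out


raw_vs = [
    {"oak_id": 1, "POSITION": "Sitting"},
    {"oak_id": 2, "POSITION": "Lying down"},
    {"oak_id": 3, "POSITION": "Kneeling"},
]

vs = assign_with_ct(raw_vs, "POSITION", "VSPOS", POSITION_CT)
print([rec["VSPOS"] for rec in vs])
```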

assign_datetime

Algorithm or Function: sdtm.oak::assign_datetime()
Description of the Algorithm: One-to-one mapping between the raw source and a target that involves mapping a date, time, or datetime component. This mapping algorithm also handles unknown dates and converts them into ISO 8601 format.
Example SDTM mapping: MH.MHSTDTC, AE.AEENDTC
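
The handling of unknown date parts can be sketched as follows. This Python is a conceptual illustration only, not the sdtm.oak implementation. It assumes dates are collected as DD-MON-YYYY, that unknown parts are entered as "UN" or "UNK", and that the year is always known; all three are assumptions of this example.

```python
# Conceptual sketch only; this is not the sdtm.oak implementation.
MONTHS = {
    "JAN": "01", "FEB": "02", "MAR": "03", "APR": "04",
    "MAY": "05", "JUN": "06", "JUL": "07", "AUG": "08",
    "SEP": "09", "OCT": "10", "NOV": "11", "DEC": "12",
}


def to_iso8601(raw_date, unknown=("UN", "UNK")):
    """Convert DD-MON-YYYY to ISO 8601, dropping or dashing out unknown parts."""
    day, mon, year = (part.strip().upper() for part in raw_date.split("-"))
    month = None if mon in unknown else MONTHS[mon]
    dd = None if day in unknown else day.zfill(2)
    if dd is not None:
        # A known day after an unknown month keeps a single dash as placeholder.
        return f"{year}-{month or '-'}-{dd}"
    if month is not None:
        return f"{year}-{month}"
    return year


for raw in ("15-Jul-2014", "UN-Jul-2014", "UN-UNK-2014", "15-UNK-2014"):
    print(raw, "->", to_iso8601(raw))
```

Trailing unknown parts are truncated from the right, while an unknown part in the middle is represented by a single dash so that the known day is not lost.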

hardcode_ct

Algorithm or Function: sdtm.oak::hardcode_ct()
Description of the Algorithm: Mapping a hardcoded value to a target SDTM variable that is subject to terminology restrictions. This is used only if the SDTM variable has an associated controlled terminology.
Example SDTM mapping: MH.MHPRESP = ‘Y’, VS.VSORRESU = ‘mmHg’

hardcode_no_ct

Algorithm or Function: sdtm.oak::hardcode_no_ct()
Description of the Algorithm: Mapping a hardcoded value to a target SDTM variable that has no terminology restrictions.
Example SDTM mapping: CM.CMTRT = ‘FLUIDS’, VS.VSCAT = ‘VITAL SIGNS’
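
Hardcoding applies a fixed value only where the source contains data. The Python sketch below illustrates that idea and is not the sdtm.oak implementation; the function name and the raw column FLUIDS_GIVEN are hypothetical. A terminology-aware variant would additionally pass the hardcoded value through a codelist lookup.

```python
# Conceptual sketch only; this is not the sdtm.oak implementation.
def hardcode_value(raw_rows, raw_var, tgt_var, tgt_val):
    """Set tgt_var to tgt_val for rows where raw_var holds collected data."""
    out = []
    for row in raw_rows:
        populated = row.get(raw_var) not in (None, "")
        out.append({"oak_id": row["oak_id"], tgt_var: tgt_val if populated else None})
    return out


# Hypothetical raw rows; FLUIDS_GIVEN is an invented column name.
raw_cm = [
    {"oak_id": 1, "FLUIDS_GIVEN": "Yes"},
    {"oak_id": 2, "FLUIDS_GIVEN": ""},
]

cm = hardcode_value(raw_cm, "FLUIDS_GIVEN", "CMTRT", "FLUIDS")
print(cm)
```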

condition_add

Algorithm or Function: sdtm.oak::condition_add()
Description of the Algorithm: Filters the source data and/or target domain based on a condition. The mapping is applied only if the condition is met. It must be used in conjunction with other algorithms: if the condition is met, the mapping is performed using algorithms such as assign_ct, assign_no_ct, hardcode_ct, hardcode_no_ct, or assign_datetime.
Example SDTM mapping: If MDPRIOR == 1 then CM.CMSTRTPT = ‘BEFORE’; VS.VSMETHOD when VSTESTCD = ‘TEMP’
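
The two-stage idea, first tagging rows by a condition and then letting another algorithm act only on the tagged rows, can be sketched as below. This is a conceptual Python illustration, not the sdtm.oak implementation; the function names and the internal _cond flag are hypothetical.

```python
# Conceptual sketch only; this is not the sdtm.oak implementation.
def tag_condition(rows, predicate):
    """Return copies of the rows, each flagged with whether the condition holds."""
    return [dict(row, _cond=bool(predicate(row))) for row in rows]


def hardcode_where(tagged_rows, tgt_var, tgt_val):
    """Hardcode tgt_val only on rows whose condition flag is set."""
    return [
        {"oak_id": row["oak_id"], tgt_var: tgt_val if row["_cond"] else None}
        for row in tagged_rows
    ]


# Hypothetical raw concomitant medication rows.
raw_cm = [
    {"oak_id": 1, "MDPRIOR": 1},
    {"oak_id": 2, "MDPRIOR": 0},
]

tagged = tag_condition(raw_cm, lambda row: row["MDPRIOR"] == 1)
cm = hardcode_where(tagged, "CMSTRTPT", "BEFORE")
print(cm)
```

Keeping the condition separate from the mapping means the same condition step can be combined with any of the assign or hardcode algorithms.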

oak_cal_ref_dates

Algorithm or Function: sdtm.oak::oak_cal_ref_dates()
Description of the Algorithm: Derivation of reference dates in the DM domain.
Example SDTM mapping: DM.RFSTDTC, DM.RFPENDTC
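
One common way to derive a reference date is to take the earliest or latest of a set of candidate dates per subject. The Python sketch below shows only that idea and is not the sdtm.oak implementation; which raw dates feed each reference date is study specific, and the rule used here (earliest date for RFSTDTC, latest for RFPENDTC) and the sample dates are assumptions of this example.

```python
# Conceptual sketch only; this is not the sdtm.oak implementation.
def reference_dates(dated_events, pick):
    """Reduce (subject, ISO 8601 date) pairs to one date per subject using pick."""
    result = {}
    for subject, dtc in dated_events:
        result[subject] = dtc if subject not in result else pick(result[subject], dtc)
    return result


# Hypothetical candidate dates pooled from several raw datasets.
# Complete ISO 8601 dates sort correctly as plain strings.
events = [
    ("01-701-1034", "2014-07-15"),
    ("01-701-1034", "2014-07-02"),
    ("01-701-1034", "2014-12-30"),
]

rfstdtc = reference_dates(events, min)   # earliest candidate date
rfpendtc = reference_dates(events, max)  # latest candidate date
print(rfstdtc, rfpendtc)
```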

Reusable Algorithms

Algorithms compared to dplyr

  • sdtm.oak algorithms enhance dplyr functions by:
    • Allowing users to perform multiple actions within a single function call.
    • Applying if_else conditions and controlled terminology in a single function call, providing a simpler approach than case_when statements.
    • Mapping an SDTM variable only if the source contains data, which is particularly useful when hardcoding.
    • Handling unknown dates, as well as date and time collected in separate or the same raw variables.
    • Adding qualifiers to topic variables using oak id variables.
  • While all these can be achieved using dplyr, the algorithms in sdtm.oak provide a more elegant and efficient approach.

sdtm.oak Programming

Programming concepts

  • sdtm.oak programming is very close to the key SDTM concepts.
  • It provides a straightforward way to do step-by-step SDTM programming in R, that is, mapping a topic variable and its qualifiers.
  • Programming steps are generic across SDTM domain classes such as Events, Interventions, and Findings.

SDTM Concept

[Figure: SDTM concept]

Programming steps

  • Read Raw datasets
  • Create id vars in the raw dataset
  • Read study controlled terminology
  • Map Topic Variable
  • Map Rest of the variables
  • Repeat the topic variable mapping and the mapping of the remaining variables for every topic variable
  • Create SDTM derived variables
  • Add labels and attributes

oak id vars

  • Raw data can be in wide format, where each piece of collected data is represented as a column.
  • In SDTM mappings, transposing may be necessary to create multiple records from a single row in a raw dataset (e.g., HEIGHT and WEIGHT in the VS domain).
  • Alternatively, a single row in an SDTM domain can be created from one row of the raw dataset (e.g., AETERM from the adverse events raw dataset).
  • Qualifiers need to be mapped to their corresponding topic variables.
  • The OAK ID variables are a combination of patient number, row number of the raw dataset, and raw source name.
  • These id variables provide key linkage between the SDTM datasets and the raw datasets during programming.
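
The linkage role of the id variables can be sketched as follows. This Python is a conceptual illustration and not the sdtm.oak implementation; the function name, the id column names (oak_id, raw_source, patient_number), the raw column POSITION, and the second patient are assumptions made for this example.

```python
# Conceptual sketch only; this is not the sdtm.oak implementation.
ID_VARS = ("oak_id", "raw_source", "patient_number")


def add_id_vars(raw_rows, pat_var, raw_src):
    """Attach row number, raw source name, and patient number to each raw row."""
    return [
        dict(row, oak_id=n, raw_source=raw_src, patient_number=row[pat_var])
        for n, row in enumerate(raw_rows, start=1)
    ]


def key(row):
    """Build the linkage key from the id variables."""
    return tuple(row[v] for v in ID_VARS)


# Hypothetical raw vitals rows; POSITION is an invented column name.
raw_vitals = add_id_vars(
    [
        {"PATNUM": "701-1034", "SYS_BP": "183.0", "POSITION": "Sitting"},
        {"PATNUM": "701-1015", "SYS_BP": "120.0", "POSITION": "Standing"},
    ],
    pat_var="PATNUM",
    raw_src="vitals",
)

# Step 1: map the topic variable, keeping only the id variables alongside it.
topic = [
    dict({v: row[v] for v in ID_VARS}, VSTESTCD="SYSBP", VSORRES=row["SYS_BP"])
    for row in raw_vitals
]

# Step 2: attach a qualifier by looking up the same raw row through the id key.
raw_by_key = {key(row): row for row in raw_vitals}
for rec in topic:
    rec["VSPOS"] = raw_by_key[key(rec)]["POSITION"].upper()

print(topic)
```

Because every target record carries the key of the raw row it came from, qualifiers can be added one at a time without relying on row order.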

Code Generation Demo

Demo - Create VS domain

Review specs

Review aCRF

Demo - Code Generation using metadata

Run the app and explain to the users

Link to the app

Review a record in the Vitals raw dataset

  • PATNUM: 701-1034
  • OAK_ID: 213
  • INSTANCE: WEEK2
  • VTLD: 15-Jul-2014
  • TMPTC: after Lying Down for 5 Minutes
  • SYS_BP: 183.0
  • DIA_BP: 81.0

Review a record in VS

  • USUBJID: 01-701-1034
  • VSTESTCD: SYSBP and DIABP
  • VISIT: WEEK2
  • VSDTC: 2014-07-15
  • VSTPT: AFTER LYING DOWN FOR 5 MINUTES
  • VSORRES: 183.0 and 81.0
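
The relationship between the raw record and the two VS records above can be sketched as a transposition. The Python below is a conceptual illustration using the values shown on these slides; it is not the generated code from the demo nor the sdtm.oak implementation, and the "01" study prefix used to build USUBJID is an assumption inferred from the example values.

```python
# Conceptual sketch only; not the demo's generated code or sdtm.oak itself.
from datetime import datetime

raw_row = {
    "PATNUM": "701-1034",
    "OAK_ID": 213,
    "INSTANCE": "WEEK2",
    "VTLD": "15-Jul-2014",
    "TMPTC": "after Lying Down for 5 Minutes",
    "SYS_BP": "183.0",
    "DIA_BP": "81.0",
}

# Each raw result column becomes one VS record with its own test code.
TOPICS = {"SYS_BP": "SYSBP", "DIA_BP": "DIABP"}


def to_vs(row, study_prefix="01"):
    """Create one VS record per populated result column in the raw row."""
    shared = {
        "USUBJID": f"{study_prefix}-{row['PATNUM']}",  # prefix is an assumption
        "VISIT": row["INSTANCE"],
        "VSDTC": datetime.strptime(row["VTLD"], "%d-%b-%Y").date().isoformat(),
        "VSTPT": row["TMPTC"].upper(),
    }
    return [
        dict(shared, VSTESTCD=code, VSORRES=row[col])
        for col, code in TOPICS.items()
        if row.get(col) not in (None, "")
    ]


for rec in to_vs(raw_row):
    print(rec)
```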

Quiz - 1

When should a hardcode mapping algorithm be used?

    a) To assign a collected value on the eCRF
    b) To hardcode an SDTM variable that is not directly collected on the eCRF

Quiz - 1 Answer

  • Correct answer:
  • b) To hardcode a value, use the hardcode algorithms. To assign a collected value, use the assign algorithms.

Get Involved

Please try the package and provide us with your feedback, or get involved in the development of new features. We can be reached through any of the following means:

Slack: https://oakgarden.slack.com
GitHub: https://github.com/pharmaverse/sdtm.oak
CDISC Wiki: https://wiki.cdisc.org/display/oakgarden

Learning Resources

sdtm.oak Documentation

Pharmaverse Examples

sdtm.oak YouTube Video

sdtm.oak Pharmaverse Blog

SDTM Programming in R with sdtm.oak Poster

Thank you