Sponsored by CDISC COSA and by companies including Roche, Pfizer, GSK, Vertex, Atorus Research, Pattern Institute, and Transition Technologies Science.
Part of the pharmaverse group of packages.
Inspired by Roche’s roak package.
Addresses a critical gap in the pharmaverse suite by enabling study programmers to create SDTM datasets in R, complementing the existing capabilities for ADaM, TLGs, eSubmission, etc.
Challenges in SDTM Programming
Although SDTM derivations are simpler than those in ADaM, SDTM programming presents unique challenges. ADaM uses SDTM datasets as its source, and these have a well-defined structure. SDTM instead relies on raw datasets as input.
Raw data structure - Different EDC systems produce data with different structures, variable names, dataset names, etc.
Varying data collection standards - Although CDASH is available, companies can still design different eCRFs under CDASH standards.
sdtm.oak v0.2
v0.2 is available on CRAN.
EDC agnostic - sdtm.oak is designed to be highly versatile, accommodating varying raw data structures from different EDC systems and external vendors.
Data standards agnostic - It supports both CDISC-defined data collection standards (CDASH) and various proprietary data collection standards defined by pharmaceutical companies.
Provides a framework for modular programming, making it a valuable addition to the pharmaverse ecosystem.
Algorithms
Key concepts
The SDTM mappings that transform the collected source data into the target SDTM data model are grouped into algorithms.
These mapping algorithms form the backbone of sdtm.oak.
Algorithms can be re-used across multiple SDTM domains.
Programming language agnostic - This concept does not rely on a specific programming language for implementation.
sdtm.oak has R functions to represent each algorithm
assign_no_ct - One-to-one mapping between the raw source and a target SDTM variable that has no controlled terminology restrictions. Just a simple assignment statement.
assign_ct - One-to-one mapping between the raw source and a target SDTM variable that is subject to controlled terminology restrictions. A simple assignment statement followed by application of controlled terminology. Used only if the SDTM variable has an associated controlled terminology.
assign_datetime - One-to-one mapping between the raw source and a target that involves a date, time, or datetime component. This mapping algorithm also handles unknown dates and converts them into ISO 8601 format.
hardcode_ct - Mapping a hardcoded value to a target SDTM variable that is subject to terminology restrictions. Used only if the SDTM variable has an associated controlled terminology.
condition_add - Algorithm used to filter the source data and/or target domain based on a condition. The mapping is applied only if the condition is met. This algorithm has to be used in conjunction with other algorithms: if the condition is met, perform the mapping using algorithms like assign_ct, assign_no_ct, hardcode_ct, hardcode_no_ct, assign_datetime.
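A minimal sketch of how these algorithms combine, assuming the sdtm.oak v0.2 function signatures. The raw dataset below is a hypothetical three-row stand-in (its values and the hand-filled oak id columns are illustrative, not taken from the workshop data). It maps a topic variable with assign_no_ct(), then hardcodes a qualifier only where a condition holds by passing condition_add() as the raw dataset.

```r
library(sdtm.oak)
library(tibble)

# Hypothetical raw AE data; oak id columns are typed in by hand for brevity
ae_raw <- tribble(
  ~oak_id, ~raw_source, ~patient_number, ~IT.AETERM, ~IT.AESDTH,
  1L,      "ae_raw",    101L,            "HEADACHE", "No",
  2L,      "ae_raw",    102L,            "NAUSEA",   "Yes",
  3L,      "ae_raw",    103L,            "RASH",     "No"
)

ae <-
  # Plain assignment: AETERM has no controlled terminology
  assign_no_ct(
    raw_dat = ae_raw,
    raw_var = "IT.AETERM",
    tgt_var = "AETERM"
  ) |>
  # Hardcode "Y" only for records where the collected value is "Yes"
  hardcode_no_ct(
    raw_dat = condition_add(ae_raw, IT.AESDTH == "Yes"),
    raw_var = "IT.AESDTH",
    tgt_var = "AESDTH",
    tgt_val = "Y"
  )

print(ae)
```

Records that do not meet the condition keep a missing AESDTH. A variable with an associated codelist would use hardcode_ct() or assign_ct() instead, with the ct_spec and ct_clst arguments supplied.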
Allowing users to perform multiple actions within a single function call.
Applying if_else conditions and controlled terminology in a single function call, a simpler approach than case_when statements.
Mapping an SDTM variable only if the source contains data, which is particularly useful when hardcoding.
Handling unknown dates, as well as date and time collected in separate or the same raw variables.
Adding qualifiers to topic variables using oak id variables.
While all these can be achieved using dplyr, the algorithms in sdtm.oak provide a more elegant and efficient approach.
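As an illustration of the date handling mentioned above, here is a small sketch using assign_datetime(). The raw values and the variable name MDBDR are assumptions made for the example. The format string "d-m-y" describes the layout of the collected date, and the default raw_unk values ("UN", "UNK") mark unknown components.

```r
library(sdtm.oak)
library(tibble)

# Hypothetical raw CM data with one complete and one partially unknown date
cm_raw <- tribble(
  ~oak_id, ~raw_source, ~patient_number, ~MDBDR,
  1L,      "cm_raw",    101L,            "15-Sep-20",
  2L,      "cm_raw",    102L,            "UN-Oct-20"
)

cm <- assign_datetime(
  raw_dat = cm_raw,
  raw_var = "MDBDR",
  tgt_var = "CMSTDTC",
  raw_fmt = "d-m-y"   # day-month-year layout of the collected value
)

# The complete date is converted to ISO 8601; the record with an unknown
# day is expected to come back at reduced precision
print(cm)
```

When date and time are collected in separate raw variables, raw_var and raw_fmt can each be given as vectors with one entry per variable.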
sdtm.oak Programming
Programming concepts
Are very close to the key SDTM concepts.
Provide a straightforward way to do step-by-step SDTM programming in R, that is, mapping a topic variable and its qualifiers.
Programming steps are generic across SDTM domain classes like Events, Interventions, and Findings.
SDTM Concept
Programming steps
Read Raw datasets
Create id vars in the raw dataset
Read study controlled terminology
Map Topic Variable
Map Rest of the variables
Repeat Map topic and Map rest for every topic variable
Create SDTM derived variables
Add Labels and Attributes
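The steps above can be sketched end to end as follows. This is a hedged outline, not the workshop script. The raw dataset, its variable names (PATNUM, MDRAW, MDRTE), and the study identifier are hypothetical. The controlled terminology is the example specification shipped with sdtm.oak, assumed to contain the route codelist C66729. Derived variables are added with plain dplyr here, and labels and attributes are omitted.

```r
library(sdtm.oak)
library(dplyr)
library(tibble)

# Step 1: read raw datasets (hypothetical stand-in for an EDC extract)
cm_raw <- tribble(
  ~PATNUM, ~MDRAW,      ~MDRTE,
  101L,    "ASPIRIN",   "ORAL",
  102L,    "IBUPROFEN", "ORAL"
)

# Step 2: create id vars in the raw dataset
cm_raw <- generate_oak_id_vars(cm_raw, pat_var = "PATNUM", raw_src = "cm_raw")

# Step 3: read study controlled terminology (package example, assumed here)
study_ct <- read_ct_spec_example("ct-01-cm")

cm <-
  # Step 4: map the topic variable and keep only records that have one
  assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW", tgt_var = "CMTRT") |>
  filter(!is.na(CMTRT)) |>
  # Step 5: map a qualifier that is subject to controlled terminology
  assign_ct(
    raw_dat = cm_raw,
    raw_var = "MDRTE",
    tgt_var = "CMROUTE",
    ct_spec = study_ct,
    ct_clst = "C66729"
  ) |>
  # Step 6: create derived variables (illustrative values only)
  mutate(
    STUDYID = "STUDY01",
    DOMAIN = "CM",
    USUBJID = paste(STUDYID, patient_number, sep = "-")
  )

print(cm)
```

Each later mapping call receives the dataset built so far as its target, so qualifiers are attached to the topic records through the oak id variables.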
oak id vars
Raw data can be in wide format, where each piece of collected data is represented as a column.
In SDTM mappings, transposing may be necessary to create multiple records from a single row in a raw dataset (e.g., HEIGHT and WEIGHT in the VS domain).
Alternatively, a single row in an SDTM domain can be created from one row of the raw dataset (e.g., AETERM from the adverse events raw dataset).
Qualifiers need to be mapped to their corresponding topic variables.
The OAK ID variables are a combination of patient number, row number of the raw dataset, and raw source name.
These id variables provide key linkage between the SDTM datasets and the raw datasets during programming.
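A small sketch of this linkage, assuming a hypothetical wide VS raw dataset with HEIGHT and WEIGHT columns. The helper map_test() is introduced only for this illustration and is not part of sdtm.oak. Each raw row can yield one SDTM record per test, and the oak id variables tie every record and its qualifier back to the originating raw row.

```r
library(sdtm.oak)
library(dplyr)
library(tibble)

# Hypothetical wide raw data: one row per patient, one column per test
vs_raw <- tribble(
  ~PATNUM, ~HEIGHT, ~WEIGHT,
  101L,    "172",   "68",
  102L,    "165",   NA
) |>
  generate_oak_id_vars(pat_var = "PATNUM", raw_src = "vs_raw")

# Hypothetical helper: map one raw column to a topic variable and its result
map_test <- function(raw_var) {
  hardcode_no_ct(
    raw_dat = vs_raw,
    raw_var = raw_var,
    tgt_var = "VSTESTCD",
    tgt_val = raw_var
  ) |>
    # Hardcoding happens only where the source has data; drop the rest
    filter(!is.na(VSTESTCD)) |>
    # The qualifier is joined to its topic record via the oak id variables
    assign_no_ct(raw_dat = vs_raw, raw_var = raw_var, tgt_var = "VSORRES")
}

vs <- bind_rows(map_test("HEIGHT"), map_test("WEIGHT"))
print(vs)
```

The first raw row produces two VS records that share the same oak_id, while the second row produces only a HEIGHT record because WEIGHT was not collected.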
Workshop - Create CM domain
How we will “code” today
I will walk you through coding the CM, VS, DM, and AE domains
Discussion on each function and function arguments
Occasional Check-in Poll
Important to move along quickly
Please post questions to chat
Full scripts are available in scripts folder
Review specs
Review aCRF
Code Walkthrough
Run the code and explain to the users
Recap
Did we review the following in the code?
Function call for various Algorithms
assign_datetime function
How does condition_add work?
SDTM derived variables
Quiz - 1
Which function should be used to map CMROUTE?
a) assign_no_ct()
b) assign_ct()
Quiz - 1 Answer
Correct answer:
b) As CMROUTE has an associated codelist, we need to use assign_ct()
Users can create the majority of the SDTM domains.
Yet to be developed Domains
Trial Design Domains
SV (Subject Visits)
SE (Subject Elements)
RELREC (Related Records)
Associated Person domains.
Check-in Exercise
Open exercises/sdtm_exercises.R
library(sdtm.oak)
library(pharmaverseraw)
library(dplyr)

# AE aCRF - https://github.com/pharmaverse/pharmaverseraw/blob/main/vignettes/articles/aCRFs/AdverseEvent_aCRF.pdf

# Exercise 1: Map AETERM from raw_var=IT.AETERM, tgt_var=AETERM

# Exercise 2: Map AESER from raw_var=IT.AESER, tgt_var=AESER. Codelist code for AESER is C66742

# Exercise 3: Map AESDTH from raw_var=IT.AESDTH, tgt_var=AESDTH. Annotation text is
# If "Yes" then AESDTH = "Y" else Not Submitted. Codelist code for AESDTH is C66742
Please try the package and provide us with your feedback, or get involved in the development of new features. We can be reached through any of the following means: