pharmaverse examples
  1. eSubmission
  2. eSubmission
  • Introduction

  • SDTM
    • SDTM Examples
  • ADaM
    • ADSL
    • ADPC
    • ADPPK
    • ADRS
    • ADTTE
    • ADVS
    • ADAE
  • TLG
    • Demographic Table
    • Adverse Events
    • Pharmacokinetic
  • Interactive
    • teal applications
  • Logs
    • The Difference Between logr, logrx, and whirl
  • eSubmission
    • eSubmission

  • Session Info
  • Pharmaverse Home
  1. eSubmission
  2. eSubmission

eSubmission

Introduction

This article shares learnings from R and open source based submission experiences, and how these can be achieved using pharmaverse packages in conjunction.

When considering such a submission, it is important to discuss this early on with the regulatory agency during any pre-submission correspondence.

The main pharmaverse packages specifically supporting eSubmission are as follows:

  • {xportr}: delivers the SAS transport file (XPT) and eSub checks.
  • {pkglite}: enables exchange of closed source R packages via text files.
  • {datasetjson}: experimental package to deliver Dataset-JSON.

Note that a python equivalent of {pkglite} also exists, read more at pharmaverse python.

Of course, most of the pharmaverse packages contribute to eSubmission in some way, through their focus on enabling clinical reporting and analyses deliverables that form the backbone of any submission. There are many other examples on this site dedicated to these, so we won’t repeat here. One package we did want to call out here is {aNCA} which performs Non-Compartmental Analysis, as clinical pharmacology analyses are another important part of submissions.

R Submissions Working Group

Within the R Consortium, the R Submissions WG have conducted several pilots to provide open examples of submitting R-based clinical trial data/analysis packages to the FDA.

Anyone new to R-based submissions should definitely check out the open materials from this team here.

Since these pilots laid the foundation, several major pharma companies have submitted primarily using R to health authorities across the world, so this article includes a mix of learnings from the pilots and real submissions.

Submission Contents

The areas of eSubmission that pharmaverse packages are most useful for would be data, readable code, and documentation.

Each of the sections below talk in more detail of these areas, plus specific mention of validation which is a common topic raised around usage of open source.

Data

The main 2 forms of dataset for CDISC clinical trial submission are SDTM and ADaM. Refer to the separate example articles on this site for how best to go about producing these.

For eSubmission the most common transport file format used to deliver these to health authorities is as xpt files, and for this we use {xportr}. This package works well with {metacore} harmonized specification objects as shown in the ADaM articles on this site.

Here is an example call using a synthetic ADSL ADaM from {pharmaverseadam} that shows how to produce xpt files, as well as how certain {xportr} functions can be used to take attributes directly from specifications so as to ensure specification to dataset consistency. This helps to ensure eSubmission-readiness. Additionally the package includes a number of built-in CDISC conformance checks as detailed on the site.

library(metacore)
library(xportr)
library(pharmaverseadam)
library(dplyr)

# Read in metacore object
metacore <- spec_to_metacore(
  path = "./metadata/safety_specs.xlsx",
  # All datasets are described in the same sheet
  where_sep_sheet = FALSE
) %>%
  select_dataset("ADSL")

dir <- tempdir() # Specify the directory for saving the XPT file

adsl <- pharmaverseadam::adsl %>%
  # Coerce variable type to match specification
  xportr_type(metacore) %>%
  # Assigns variable label from metacore specifications
  xportr_label(metacore) %>%
  # Assigns dataset label from metacore specifications
  xportr_df_label(metacore) %>%
  # Assigns SAS length from a variable level metadata
  xportr_length(metacore) %>%
  # Assigns variable format from metacore specifications
  xportr_format(metadata = metacore) %>%
  # Write xpt v5 transport file
  xportr_write(file.path(dir, "adsl.xpt"), metadata = metacore, domain = "ADSL")
# We can examine that the attributes have been correctly applied, for example
# as follows for the dataset label
attr(adsl, "label")
[1] "Subject-Level Analysis Dataset"

Now here’s an example where the checks would help to identify future possible eSubmission challenges. Note the console messages explaining the issues.

adsl_challenges <- pharmaverseadam::adsl %>%
  # Make a numeric variable as character, which conflicts with specifications
  mutate(AGE = as.character(AGE)) %>%
  # Add a variable that is name >8 characters which is not allowed for xpt v5
  mutate(REGIONCAT = REGION1)

adsl_challenges <- adsl_challenges %>%
  # Coerce variable type to match specification
  # This time we use the verbose argument to add a warning to the console
  xportr_type(metacore, verbose = "warn") %>%
  # Write xpt v5 transport file
  xportr_write(file.path(dir, "adsl.xpt"), metadata = metacore, domain = "ADSL")
── Variable type mismatches found. ──
✔ 1 variable coerced
Warning: Variable type(s) in dataframe don't match metadata: `AGE`
- `AGE` was coerced to <numeric>. (type in data: character, type in metadata: integer)
i Types in metadata considered as character (xportr.character_metadata_types option): 'character', 'char', 'text', 'date', 'posixct', 'posixt', 'datetime', 'time', 'partialdate', 'partialtime', 'partialdatetime', 'incompletedatetime', 'durationdatetime', and 'intervaldatetime'
i Types in metadata considered as numeric (xportr.numeric_metadata_types option): 'integer', 'numeric', 'num', and 'float'
i Types in data considered as character (xportr.character_types option): 'character'
i Types in data considered as numeric (xportr.numeric_types option): 'integer', 'float', 'numeric', 'posixct', 'posixt', 'time', 'date', and 'hms'
Warning: The following validation checks failed:
• Variable `REGIONCAT` must be 8 characters or less.

Furthermore, the package functions also include certain arguments that you can use to help shape to your specific submission needs.

The below shows how some specific FDA requirements can be achieved using the length_source argument from xportr::xportr_length() and the max_size_gb argument from xportr::xportr_write(), as detailed in the code comments. In this case, the dataset splits would be very unlikely to occur in practice for ADSL but would be much more likely on a BDS dataset such as ADLB for a large study.

adsl_example <- pharmaverseadam::adsl %>%
  # Coerce variable type to match specification
  xportr_type(metacore) %>%
  # Assigns variable label from metacore specifications
  xportr_label(metacore) %>%
  # Assigns dataset label from metacore specifications
  xportr_df_label(metacore) %>%
  # Assigns length from the maximum length of any value of the variable as per FDA data minimisation guidance
  xportr_length(metadata = metacore, length_source = "data") %>%
  # Assigns variable format from metacore specifications
  xportr_format(metadata = metacore) %>%
  # Write xpt v5 transport file but split into smaller subsets if greater than 5GB
  # according to the FDA cutoff size
  xportr_write(file.path(dir, "adsl.xpt"), metadata = metacore, domain = "ADSL", max_size_gb = 5)
── Variable labels missing from metadata. ──
✔ 13 labels skipped
── Variable length is shorter than the length specified in the metadata. ──
── Variable lengths missing from metadata. ──
✔ 13 lengths resolved `RFSTDTC`, `RFENDTC`, `RFXSTDTC`, `RFXENDTC`, `RFICDTC`, `RFPENDTC`, `DMDTC`, `DMDY`, `LSTALVDT`, `LDDTHGR1`, `DTH30FL`, `DTHA30FL`, and `DTHB30FL`
Data frame exported to 1 xpt files.

As an alternative to xpt, we also have {datasetjson} which enables the emerging new submission data exchange standard Dataset-JSON detailed here. As this package is still experimental, we don’t include code examples here but over time and as momentum grows we can add extra examples to this article. See R Submissions WG pilot 5 for a pilot submission using Dataset-JSON.

Readable Code

When using internal company-specific closed source tools and codebases, we often face challenges around providing readable code to health authorities as internal single company solutions are by nature not easily accessible and understandable. This is one of the benefits of using open source codebases in the submission setting as it enables review teams to look deeper into the functions used, plus they have the same open solutions available to use themselves as part of their review. As more of the industry embraces open source, we may even see more harmonization of submission packages across the industry as the same packages are re-used, and this should lead to greater familiarity from review teams easing their critical submission dossier review.

The nature of R packages and functions, as well as influence from {tidyverse}, has encouraged most pharmaverse package development teams to embrace modular strategy to their code. So when you use packages like {admiral}, you’ll see that the ADaM templates have been built with readability in mind specifically to aid with eSubmission.

Documentation

When using R for submission there are some important recommendations around documentation you should include as part of the eSubmission package. Most companies include these in the Reviewer’s Guide documents (the PHUSE Advance Hub offers example templates for these), but you might also choose to use a programTOC document.

Here’s the information we recommend including:

  • R version used
  • Descriptions of the main open source R package used, the version of each that you used, and where these are available from (especially if anywhere other than CRAN)
  • Descriptions of any closed source packages used (see details below)
  • Basic installation instructions of the R version used and how to install the package versions that you used (e.g. from CRAN). The R installation instructions should cover installation from Windows.

Here is an example extract of what the documentation of the open source packages used might look like:

Example Reviewer’s Guide Table

For closed source packages, if you plan to submit the code then there are 2 main options - {pkglite} to create txt files, or zip files (as were successfully used in later R Submission WG pilots).

Here is an example taken from the R Submission WG pilot 1 around how they used pkglite::pack() to provide packages as txt files. On the other side pkglite::unpack() could then be used to unpack.

library(pkglite)

# Using pkglite to pack proprietary R package and saved into ectd/r0pkg.txt
path$home %>%
  collate(file_ectd(), file_auto("inst")) %>%
  prune("R/zzz.R") %>%
  pack(output = "ectd/r0pkg.txt")

You can also check the following example Reviewer’s Guides as used for these pilots:

  • Pilot 1, using {pkglite}
  • Pilot 3, using compressed .zip approach

For further Reviewer’s Guide guidance, regardless of the programming language used for regulatory submissions, refer to the completion guidelines found at this PHUSE ADRG Package WG. As we progress on this open source submission journey, it is worth monitoring this PHUSE Open-Source Metadata Documentation WG to see how the guidance continues to evolve in this space.

Validation

Last but not least, through the submission process you might be asked to provide evidence of why you consider the open source code used to be accurate and reliable.

This of course is a huge topic and one where you should first refer to guidance from the R Validation Hub.

Pharmaverse also offers a selection of recommended packages that can help with this via our Developers page under Package Validation. Under CI/CD it also offers thevalidatoR which is a GitHub action that can be used to produce standardised validation reports, such as this example from an earlier version of {admiral}.

The Difference Between logr, logrx, and whirl
Source Code
---
title: "eSubmission"
---

## Introduction

This article shares learnings from R and open source based submission experiences,
and how these can be achieved using pharmaverse packages in conjunction.

When considering such a submission, it is important to discuss this early on
with the regulatory agency during any pre-submission correspondence.

The main pharmaverse packages specifically supporting eSubmission are as follows: 

-   [`{xportr}`](https://atorus-research.github.io/xportr/): delivers the SAS transport file (XPT) and eSub checks.
-   [`{pkglite}`](https://merck.github.io/pkglite/): enables exchange of closed source R packages via text files.
-   [`{datasetjson}`](https://atorus-research.github.io/datasetjson/): experimental package to deliver Dataset-JSON.

Note that a python equivalent of `{pkglite}` also exists, read more at [pharmaverse python](https://pharmaverse.org/e2eclinical/python/).

Of course, most of the pharmaverse packages contribute to eSubmission in some way,
through their focus on enabling clinical reporting and analyses deliverables that
form the backbone of any submission. There are many other examples on this site
dedicated to these, so we won't repeat here. One package we did want to call out
here is [`{aNCA}`](https://pharmaverse.github.io/aNCA/) which performs
Non-Compartmental Analysis, as clinical pharmacology analyses are another
important part of submissions.

## R Submissions Working Group

Within the R Consortium, the R Submissions WG have conducted several pilots to
provide open examples of submitting R-based clinical trial data/analysis packages
to the FDA.

Anyone new to R-based submissions should definitely check out the open materials
from this team [here](https://rconsortium.github.io/submissions-wg/).

Since these pilots laid the foundation, several major pharma companies have
submitted primarily using R to health authorities across the world, so this
article includes a mix of learnings from the pilots and real submissions.

## Submission Contents

The areas of eSubmission that pharmaverse packages are most useful for would be
data, readable code, and documentation.

Each of the sections below talk in more detail of these areas, plus specific
mention of validation which is a common topic raised around usage of open
source.

### Data

The main 2 forms of dataset for CDISC clinical trial submission are SDTM and
ADaM. Refer to the separate example articles on this site for how best to go
about producing these.

For eSubmission the most common transport file format used to deliver these to
health authorities is as xpt files, and for this we use `{xportr}`. This package
works well with `{metacore}` harmonized specification objects as shown in the
ADaM articles on this site.

Here is an example call using a synthetic `ADSL` ADaM from `{pharmaverseadam}`
that shows how to produce xpt files, as well as how certain `{xportr}`
functions can be used to take attributes directly from specifications so as
to ensure specification to dataset consistency. This helps to ensure
eSubmission-readiness. Additionally the package includes a number of built-in
CDISC conformance checks as detailed on the
[site](https://atorus-research.github.io/xportr/).

```{r clean, message=FALSE, warning=FALSE, results='hold'}
library(metacore)
library(xportr)
library(pharmaverseadam)
library(dplyr)

# Read in metacore object
metacore <- spec_to_metacore(
  path = "./metadata/safety_specs.xlsx",
  # All datasets are described in the same sheet
  where_sep_sheet = FALSE
) %>%
  select_dataset("ADSL")

dir <- tempdir() # Specify the directory for saving the XPT file

adsl <- pharmaverseadam::adsl %>%
  # Coerce variable type to match specification
  xportr_type(metacore) %>%
  # Assigns variable label from metacore specifications
  xportr_label(metacore) %>%
  # Assigns dataset label from metacore specifications
  xportr_df_label(metacore) %>%
  # Assigns SAS length from a variable level metadata
  xportr_length(metacore) %>%
  # Assigns variable format from metacore specifications
  xportr_format(metadata = metacore) %>%
  # Write xpt v5 transport file
  xportr_write(file.path(dir, "adsl.xpt"), metadata = metacore, domain = "ADSL")
```

```{r label}
# We can examine that the attributes have been correctly applied, for example
# as follows for the dataset label
attr(adsl, "label")
```

Now here's an example where the checks would help to identify future possible
eSubmission challenges. Note the console messages explaining the issues.

```{r challenges}
adsl_challenges <- pharmaverseadam::adsl %>%
  # Make a numeric variable as character, which conflicts with specifications
  mutate(AGE = as.character(AGE)) %>%
  # Add a variable that is name >8 characters which is not allowed for xpt v5
  mutate(REGIONCAT = REGION1)

adsl_challenges <- adsl_challenges %>%
  # Coerce variable type to match specification
  # This time we use the verbose argument to add a warning to the console
  xportr_type(metacore, verbose = "warn") %>%
  # Write xpt v5 transport file
  xportr_write(file.path(dir, "adsl.xpt"), metadata = metacore, domain = "ADSL")
```

Furthermore, the package functions also include certain arguments that you can
use to help shape to your specific submission needs.

The below shows how some specific FDA requirements can be achieved using
the `length_source` argument from `xportr::xportr_length()` and the
`max_size_gb` argument from `xportr::xportr_write()`, as detailed in the code comments.
In this case, the dataset splits would be very unlikely to occur in practice for
`ADSL` but would be much more likely on a BDS dataset such as `ADLB` for a
large study.

```{r examples}
adsl_example <- pharmaverseadam::adsl %>%
  # Coerce variable type to match specification
  xportr_type(metacore) %>%
  # Assigns variable label from metacore specifications
  xportr_label(metacore) %>%
  # Assigns dataset label from metacore specifications
  xportr_df_label(metacore) %>%
  # Assigns length from the maximum length of any value of the variable as per FDA data minimisation guidance
  xportr_length(metadata = metacore, length_source = "data") %>%
  # Assigns variable format from metacore specifications
  xportr_format(metadata = metacore) %>%
  # Write xpt v5 transport file but split into smaller subsets if greater than 5GB
  # according to the FDA cutoff size
  xportr_write(file.path(dir, "adsl.xpt"), metadata = metacore, domain = "ADSL", max_size_gb = 5)
```

As an alternative to xpt, we also have `{datasetjson}` which enables the
emerging new submission data exchange standard Dataset-JSON detailed
[here](https://www.cdisc.org/standards/data-exchange/dataset-json). As this
package is still experimental, we don't include code examples here but over time
and as momentum grows we can add extra examples to this article.
See R Submissions WG [pilot 5](https://rconsortium.github.io/submissions-pilot5-website/)
for a pilot submission using Dataset-JSON.

### Readable Code

When using internal company-specific closed source tools and codebases, we
often face challenges around providing readable code to health authorities as
internal single company solutions are by nature not easily accessible and understandable.
This is one of the benefits of using open source codebases in the submission
setting as it enables review teams to look deeper into the functions used, plus
they have the same open solutions available to use themselves as part of their
review. As more of the industry embraces open source, we may even see more
harmonization of submission packages across the industry as the same packages
are re-used, and this should lead to greater familiarity from review teams
easing their critical submission dossier review.

The nature of R packages and functions, as well as influence from `{tidyverse}`,
has encouraged most pharmaverse package development teams to embrace modular
strategy to their code. So when you use packages like `{admiral}`, you'll see
that the ADaM templates have been built with readability in mind specifically to
aid with eSubmission.

### Documentation

When using R for submission there are some important recommendations around
documentation you should include as part of the eSubmission package. Most
companies include these in the Reviewer's Guide documents (_the PHUSE Advance Hub
offers example templates for these_), but you might also choose to use a
programTOC document.

Here's the information we recommend including:

- R version used
- Descriptions of the main open source R package used, the version of each that
you used, and where these are available from (especially if anywhere other than CRAN)
- Descriptions of any closed source packages used (see details below)
- Basic installation instructions of the R version used and how to install the
package versions that you used (e.g. from CRAN). The R installation instructions
should cover installation from Windows.

Here is an example extract of what the documentation of the open source packages
used might look like:

![Example Reviewer's Guide Table](ADRG.png){fig-align="center"}

For closed source packages, if you plan to submit the code then there are 2 main
options - `{pkglite}` to create txt files, or zip files (as were successfully
used in later R Submission WG pilots).

Here is an example taken from the R Submission WG pilot 1 around how they used
`pkglite::pack()` to provide packages as txt files. On the other side
`pkglite::unpack()` could then be used to unpack.

```{r pack, eval=FALSE}
library(pkglite)

# Using pkglite to pack proprietary R package and saved into ectd/r0pkg.txt
path$home %>%
  collate(file_ectd(), file_auto("inst")) %>%
  prune("R/zzz.R") %>%
  pack(output = "ectd/r0pkg.txt")
```

You can also check the following example Reviewer's Guides as used for these pilots:

- [Pilot 1](https://github.com/RConsortium/submissions-pilot1-to-fda/blob/main/m5/datasets/rconsortiumpilot1/analysis/adam/datasets/adrg.pdf),
using `{pkglite}`
- [Pilot 3](https://github.com/RConsortium/submissions-pilot3-adam-to-fda/blob/main/m5/datasets/rconsortiumpilot3/analysis/adam/datasets/adrg.pdf),
using compressed .zip approach

For further Reviewer's Guide guidance, regardless of the programming language used for
regulatory submissions, refer to the completion guidelines found at this
[PHUSE ADRG Package WG](https://advance.hub.phuse.global/wiki/spaces/WEL/pages/26804660/Analysis+Data+Reviewer+s+Guide+ADRG+Package).
As we progress on this open source submission journey, it is worth monitoring this
[PHUSE Open-Source Metadata Documentation WG](https://phuse-org.github.io/OSDocuMeta/)
to see how the guidance continues to evolve in this space.

### Validation

Last but not least, through the submission process you might be asked to
provide evidence of why you consider the open source code used to be accurate
and reliable.

This of course is a huge topic and one where you should first refer to
guidance from the [R Validation Hub](https://www.pharmar.org/).

Pharmaverse also offers a selection of recommended packages that can help with
this via our [Developers page](https://pharmaverse.org/e2eclinical/developers/)
under Package Validation. Under CI/CD it also offers `thevalidatoR` which is a GitHub
action that can be used to produce standardised validation reports, such as this
[example](https://github.com/insightsengineering/thevalidatoR/blob/main/readme_files/report-0.1-admiral.pdf)
from an earlier version of `{admiral}`.
 
Cookie Preferences