Introduction

As admiral is intended to be developed with contributions from the user community, this article is meant for developers who want to either expand admiral's functionality or build on top of admiral. To keep the framework robust across the whole community, we have defined a programming strategy that should be followed in such cases. Such contributions could include, for example, company-specific derivations of ADaM datasets.

Functional Workflow

  • Overall programming will follow a functional approach.
  • We mandate the use of tidyverse (e.g. dplyr) over similar functionality existing in base R.
  • Each ADaM dataset is built with a set of functions and not with free flow code.
  • Each ADaM dataset has a specific programming workflow.
  • Each function has a specific purpose that supports the ADaM Dataset programming workflow. It could be an admiral function or a company specific function.
  • Admiral functions can be re-used for company specific functions.
  • Each function belongs to one category defined in keywords/family.
  • Each function that is used to derive one or multiple variable(s) is required to be unit tested.
  • Functions have a standard naming convention.
  • Double coding is not used as a QC method unless absolutely necessary.
  • ADaMs are created with readable, submission-ready code.

Functions in R

Function Design

Firstly, it is important to explain how we decide on the need for new derivation functions.

If a derivation rule or algorithm is common and highly similar across different variables/parameters (e.g. study day or duration) then we would provide a generic function that can be used to satisfy all the times this may be needed across different ADaMs. Similarly, if we feel that a certain derivation could be useful beyond a single purpose we also would provide a generic function (e.g. instead of a last known alive date function, we have an extreme date function where a user could find the last date from a selection, or for example the first).

Otherwise, if we feel that a derivation rule is a unique need or sufficiently complex to justify it, then we opt for a dedicated function for that specific variable/parameter (e.g. treatment-emergent flag for AEs).

If certain variables are closely connected (e.g. an imputed date and the corresponding imputation flag) then a single function would provide both variables.

If something needed for ADaM could be achieved simply via an existing tidyverse function, then we do not wrap this into an admiral function, as that would add an unnecessary extra layer for users.

The following principles are key when designing a new function:

  • Modularity - All code follows a modular approach, i.e. the steps must be clearly separated and have a dedicated purpose. This applies to scripts creating a dataset where each module should create a single variable or parameter. But also to complex derivations with several steps. Commenting on these steps is key for readability.

  • Avoid Copy and Paste - If the same or very similar code is used multiple times, it should be put into a separate function. This improves readability and maintainability and makes unit testing easier. This should not be done for every simple programming step where tidyverse can be used, but rather for computational functions or data checks. However, avoid nesting too many functions.

  • Checks - Whenever a function fails, a meaningful error message must be provided with a clear reference to the input which caused the failure. Users should not have to dig into detailed code if they only want to apply a function. A meaningful error message supports usability.

  • Flexibility - Functions should be as flexible as possible as long as it does not reduce the usability. For example:

    • The source variables or newly created variables and conditions for selecting observations should not be hard-coded.

    • It is useful if a parameter triggers optional steps, e.g. if the filter parameter is specified, the input dataset is restricted, otherwise this step is skipped.

    • However, parameters should not trigger completely different algorithms. For example BNRIND could be derived based on BASE or based on ANRIND. It should not be implemented within one function as the algorithms are completely different. If BASE is used, the values are categorized while if ANRIND is used, the values are merged from the baseline observation.
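
As a minimal sketch of a parameter that triggers an optional step, the function below restricts the input dataset only if `filter` is specified. The function name `restrict_demo_dataset()` and the example data are assumptions made for illustration and are not part of admiral.

```r
library(dplyr, warn.conflicts = FALSE)
library(rlang)
library(tibble)

# Hypothetical function (not part of admiral): restricts the input dataset
# if `filter` is specified, otherwise the step is skipped.
restrict_demo_dataset <- function(dataset, filter = NULL) {
  filter <- enquo(filter)

  if (!quo_is_null(filter)) {
    # optional step: keep only observations fulfilling the condition
    dataset <- dplyr::filter(dataset, !!filter)
  }
  dataset
}

advs <- tribble(
  ~USUBJID, ~PARAMCD, ~AVAL,
  "01",     "TEMP",    36.8,
  "01",     "PULSE",   72,
  "02",     "TEMP",    37.4
)

# no filter specified: all observations are kept
restrict_demo_dataset(advs)

# filter specified: only temperature observations are kept
restrict_demo_dataset(advs, filter = PARAMCD == "TEMP")
```

The condition is passed as an unquoted expression and is not hard-coded in the function, which is in line with the first bullet of the list above.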

Input, Output, and Side-effects

  • The behavior of the function is only determined by its input, not by any global object,
    i.e. all input like datasets, variable names, options, … must be provided to the function by parameters.
  • It is expected that the input datasets are not grouped. If any are grouped, the function must issue an error.
  • If a function requires grouping, the function must provide the by_vars parameter.
  • The output dataset must be ungrouped.
  • The functions should not sort (arrange) the output dataset at the end.
  • If the function needs to create temporary variables in an input dataset, these variables must start with temp_ and must be removed from the output dataset.
  • If the input dataset includes variables starting with temp_, an error must be issued.
  • If developers find the need to use or create environment objects to achieve flexibility, use the admiral_environment environment object created in admiral_environment.R. All objects which are stored in this environment must be documented in admiral_environment.R. An equivalent environment object and .R file exist for admiraldev as well. For more details on how environments work, see the relevant sections on environments in the R Packages and Advanced R textbooks.
  • In general, the function must not have any side-effects like creating or modifying global objects, printing, writing files, …
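
The rules above can be sketched as follows. This is an illustration under stated assumptions, not an admiral function: the name `derive_var_demo_chg()` is hypothetical, the checks use plain `stop()` rather than admiral's assertion functions, and `AVAL` and `CHG` are fixed only to keep the sketch short.

```r
library(dplyr, warn.conflicts = FALSE)
library(tibble)

# Hypothetical function (not part of admiral): adds the change from the
# first value within each by group.
derive_var_demo_chg <- function(dataset, by_vars) {
  # grouped input datasets are rejected
  if (is_grouped_df(dataset)) {
    stop("`dataset` must not be grouped. Please call `dplyr::ungroup()` first.")
  }
  # input variables starting with temp_ are rejected
  temp_vars <- grep("^temp_", names(dataset), value = TRUE)
  if (length(temp_vars) > 0) {
    stop(
      "`dataset` contains temporary variables: ",
      paste(temp_vars, collapse = ", ")
    )
  }

  dataset %>%
    # grouping is controlled by the by_vars parameter
    group_by(!!!by_vars) %>%
    mutate(temp_first = first(AVAL)) %>%
    # the output dataset is ungrouped
    ungroup() %>%
    mutate(CHG = AVAL - temp_first) %>%
    # the temporary variable is removed from the output dataset
    select(-temp_first)
}

advs <- tribble(
  ~USUBJID, ~AVAL,
  "01",      10,
  "01",      12,
  "02",       7
)

derive_var_demo_chg(advs, by_vars = vars(USUBJID))
```

The function neither sorts its output nor reads or modifies any global object, so its result depends only on the arguments passed to it.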

Admiral Options

  • An exception is made for admiral options, see get_admiral_option() and set_admiral_options(). These provide pre-defined defaults for commonly used function arguments while allowing users to define their own defaults. For example, subject_keys is currently pre-defined as vars(STUDYID, USUBJID) but can be modified using set_admiral_options(subject_keys = vars(...)) at the top of a script. The reasoning is to relieve the user of having to change such commonly used arguments repeatedly across the many admiral function calls in a script.
  • If this additional flexibility needs to be added for another commonly used function argument e.g. future_input to be set as vars(...) it can be added as an admiral option. In the function formals define future_input = get_admiral_option("future_input") then proceed to modify the body and roxygen documentation of set_admiral_options().
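
The general mechanism, options stored in a package environment and looked up in the function formals, can be sketched in base R. All names below (`demo_environment`, `get_demo_option()`, `set_demo_options()`, `count_demo_subjects()`) are hypothetical, and the sketch is not admiral's implementation; in particular, it stores the keys as a character vector instead of `vars()` to stay dependency-free.

```r
# Hypothetical environment holding the options (not admiral's implementation)
demo_environment <- new.env(parent = emptyenv())
demo_environment$subject_keys <- c("STUDYID", "USUBJID")

# returns the current value of an option
get_demo_option <- function(option) {
  if (!exists(option, envir = demo_environment, inherits = FALSE)) {
    stop("Unknown option: `", option, "`")
  }
  get(option, envir = demo_environment, inherits = FALSE)
}

# modifies the user-defined defaults
set_demo_options <- function(subject_keys) {
  if (!missing(subject_keys)) {
    stopifnot(is.character(subject_keys))
    assign("subject_keys", subject_keys, envir = demo_environment)
  }
  invisible(NULL)
}

# the default is looked up each time the function is called
count_demo_subjects <- function(dataset,
                                subject_keys = get_demo_option("subject_keys")) {
  nrow(unique(dataset[subject_keys]))
}

adsl <- data.frame(
  STUDYID = c("S1", "S1", "S1"),
  USUBJID = c("01", "01", "02"),
  SUBJID  = c("A", "B", "C")
)

count_demo_subjects(adsl)

# change the default once at the top of a script
set_demo_options(subject_keys = "SUBJID")
count_demo_subjects(adsl)
```

Because the default is evaluated at call time, a single call to the setter changes the behavior of every function that uses the option in its formals.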

Function Names

  • Function names should start with a verb and use snake case, e.g. derive_var_base().

Function name prefix | Description
assert_ / warn_ / is_ | Functions that check other functions’ inputs
derive_ | Functions that take a dataset as input and return a new dataset with additional rows and/or columns
derive_var_ (e.g. derive_var_trtdurd) | Functions which add a single variable
derive_vars_ (e.g. derive_vars_dt) | Functions which add multiple variables
derive_param_ (e.g. derive_param_os) | Functions which add a single parameter
compute_ / calculate_ / … | Functions that take vectors as input and return a vector
create_ / consolidate_ | Functions that create datasets without keeping the original observations
get_ | Usually utility functions that return very specific objects that get passed through other functions
filter_ | Functions that filter observations based on conditions associated with common clinical trial syntax

Function Name Suffix | Description
_derivation (suffix) | High order functions that call a user specified derivation
_date / _time / _dt / _dtc / _dtm | Functions associated with dates, times, datetimes, and their character equivalents.
_source | Functions that create source datasets that usually will be passed through other derive_ functions.

Other Common Function Name Terms | Description
_merged_ / _joined_ / _extreme_ | Functions that follow the generic function user-guide.

Please note that the appropriate var/vars prefix should be used for all cases in which the function creates any variable(s), regardless of the presence of a new_var parameter in the function call.
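
To illustrate the compute_ prefix, here is a minimal base R sketch of a function that takes vectors as input and returns a vector. The name `compute_demo_dy()` is hypothetical; the rule it implements (add one to nonnegative differences so that the reference date is day 1) follows the relative day derivation described in the function header example of this article.

```r
# Hypothetical helper (not part of admiral): takes date vectors as input and
# returns an integer vector of relative days.
compute_demo_dy <- function(reference_date, date) {
  stopifnot(inherits(reference_date, "Date"), inherits(date, "Date"))

  # number of days from the reference date to the date
  diff_days <- as.integer(date - reference_date)

  # nonnegative differences are shifted by one, i.e. there is no day 0
  ifelse(diff_days >= 0L, diff_days + 1L, diff_days)
}

compute_demo_dy(
  reference_date = as.Date("2014-01-17"),
  date = as.Date(c("2014-01-15", "2014-01-17", "2014-01-20"))
)
# returns -2, 1, 4
```

A corresponding derive_var_ function would take a dataset, call such a compute_ function inside `mutate()`, and return the dataset with the new variable added.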

Function Parameters

The default value of optional parameters should be NULL.

There is a recommended parameter order that all contributors are asked to adhere to (in order to keep consistency across functions):

  1. dataset (and any additional datasets denoted by dataset_*)
  2. by_vars
  3. order
  4. new_var (and any related new_var_* parameters)
  5. filter (and any additional filters denoted by filter_*)
  6. all additional arguments:
    • Make sure to always mention start_date before end_date (or related).

Names of variables inside a dataset should be passed as symbols rather than strings, i.e. AVAL rather than "AVAL". If a parameter accepts one or more variables as input, the variables should be wrapped inside vars().

For example:

  • new_var = TEMPBL
  • by_vars = vars(PARAMCD, AVISIT)
  • filter = PARAMCD == "TEMP"
  • order = vars(AVISIT, desc(AESEV))

Parameters must not accept expressions for assigning the value of the new variable. Instead, separate parameters need to be provided for defining the value. For example, if a function derives a variable which may be imputed, the following is not acceptable.

    ...
    new_var = vars(mydtm = convert_dtc_to_dtm(impute_dtc(cmstdtc,
                                                         date_imputation = "last",
                                                         time_imputation = "last"))),
    ...

Separate parameters for the imputation must be provided, e.g.:

    ...
    new_var = mydtm,
    source_var = cmstdtc,
    date_imputation = "last",
    time_imputation = "last",
    ...

Each function parameter needs to be tested with an assert_ type of function.

Each expression needs to be tested for the following (there are many utility functions in admiral available to the contributor):

  • whether it is an expression (or a list of expressions, depending on the function)
  • whether it is a valid expression (i.e. whether it evaluates without error)

The only exception to this is derive_var_basetype() where we allowed the use of rlang::exprs(). The reason is that in this case the user needs to have the flexibility to provide not just symbols but usually more complicated filtering conditions (that may be based on multiple input parameters).

Common Function Parameters Naming Convention

The first parameter of derive_ functions should be the input dataset and it should be named dataset. If more than one input dataset is required, the other input dataset should start with dataset_, e.g., dataset_ex.

Parameters for specifying items to add should start with new_. If a variable is added, the second part of the parameter name should be var, if a parameter is added, it should be param. For example: new_var, new_var_unit, new_param.

Parameters which expect a boolean or boolean vector must start with a verb, e.g., is_imputed or impute_date.

List of Common Parameters

Parameter Name | Description
dataset | The input dataset. Expects a data.frame or a tibble.
by_vars | Variables to group by.
order | List of expressions for sorting a dataset, e.g., vars(PARAMCD, AVISITN, desc(AVAL)).
new_var | Name of a single variable to be added to the dataset.
new_vars | List of variables to be added to the dataset.
new_var_unit | Name of the unit variable to be added. It should be the unit of the variable specified for the new_var parameter.
filter | Expression to filter a dataset, e.g., PARAMCD == "TEMP".
start_date | The start date of an event/interval. Expects a date object.
end_date | The end date of an event/interval. Expects a date object.
start_dtc | (Partial) start date/datetime in ISO 8601 format.
dtc | (Partial) date/datetime in ISO 8601 format.
date | Date of an event / interval. Expects a date object.
set_values_to | List of variable name-value pairs.
subject_keys | Variables to uniquely identify a subject, defaults to vars(STUDYID, USUBJID). In function formals, use subject_keys = get_admiral_option("subject_keys")

Source Code Formatting

All source code should be formatted according to the tidyverse style guide. The lintr package will be used to check and enforce this.

Input Checking

In line with the fail-fast design principle, function inputs should be checked for validity and, if there’s an invalid input, the function should stop immediately with an error. An exception is the case where a variable to be added by a function already exists in the input dataset: here only a warning should be displayed and the function should continue executing.

Inputs should be checked using custom assertion functions defined in R/assertions.R. These custom assertion functions should either return an error in case of an invalid input or return nothing.

For the most common types of input parameters like a single variable, a list of variables, a dataset, … functions for checking are available (see assertions).

Parameters which expect keywords should handle them in a case-insensitive manner, e.g., both date_imputation = "FIRST" and date_imputation = "first" should be accepted. The assert_character_scalar() function helps with handling parameters in a case-insensitive manner.

A parameter should not be checked in an outer function if the parameter name is the same as in the inner function. This rule is applicable only if both functions are part of admiral.
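
The fail-fast and case-insensitivity rules can be sketched in base R as follows. The function `assert_demo_keyword()` is hypothetical and is not the implementation of assert_character_scalar(); returning the normalized keyword invisibly is a design choice of this sketch so that the caller can continue with the lower-case value.

```r
# Hypothetical assertion (not one of admiral's assertion functions): checks a
# keyword parameter in a case-insensitive manner and fails fast otherwise.
assert_demo_keyword <- function(arg, values, arg_name = "arg") {
  if (!is.character(arg) || length(arg) != 1L || is.na(arg)) {
    stop("`", arg_name, "` must be a character scalar.", call. = FALSE)
  }

  arg_lower <- tolower(arg)
  if (!arg_lower %in% values) {
    # the message refers to the parameter and the value causing the failure
    stop(
      "`", arg_name, "` must be one of ",
      paste0("\"", values, "\"", collapse = ", "),
      " but is \"", arg, "\".",
      call. = FALSE
    )
  }
  invisible(arg_lower)
}

# both spellings are accepted and normalized to lower case
date_imputation <- assert_demo_keyword(
  "FIRST",
  values = c("first", "last"),
  arg_name = "date_imputation"
)
date_imputation
# returns "first"
```

An invalid keyword such as `"middle"` stops the function immediately with an error naming `date_imputation` and the permitted values.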

Function Header (Documentation)

Every function that is exported from the package must have an accompanying header that should be formatted according to the roxygen2 convention.

In addition to the roxygen2 parameters, @author and @keywords are also used.

Author is the owner of the function while the keywords are used to categorize the function. Please see section “Categorization of functions”.

An example is given below:

#' Derive Relative Day Variables
#'
#' Adds relative day variables (`--DY`) to the dataset, e.g., `ASTDY` and
#' `AENDY`.
#'
#' @param dataset Input dataset
#'
#'   The columns specified by the `reference_date` and the `source_vars`
#'   parameter are expected.
#'
#' @param reference_date The start date column, e.g., date of first treatment
#'
#'   A date or date-time object column is expected.
#'
#'   Refer to `derive_var_dt()` to impute and derive a date from a date
#'   character vector to a date object.
#'
#' @param source_vars A list of datetime or date variables created using
#'   `vars()` from which dates are to be extracted. This can either be a list of
#'   date(time) variables or named `--DY` variables and corresponding --DT(M)
#'   variables e.g. `vars(TRTSDTM, ASTDTM, AENDT)` or `vars(TRTSDT, ASTDTM,
#'   AENDT, DEATHDY = DTHDT)`. If the source variable does not end in --DT(M), a
#'   name for the resulting `--DY` variable must be provided.
#'
#' @author Teckla Akinyi
#'
#' @details The relative day is derived as number of days from the reference
#'   date to the end date. If it is nonnegative, one is added. I.e., the
#'   relative day of the reference date is 1. Unless a name is explicitly
#'   specified, the name of the resulting relative day variable is generated
#'   from the source variable name by replacing DT (or DTM as appropriate) with
#'   DY.
#'
#' @return The input dataset with `--DY` corresponding to the `--DTM` or `--DT`
#'   source variable(s) added
#'
#' @keywords der_date_time
#' @family der_date_time
#'
#' @export
#'
#' @examples
#' library(lubridate)
#' library(dplyr, warn.conflicts = FALSE)
#' library(tibble)
#'
#' datain <- tribble(
#'   ~TRTSDTM,             ~ASTDTM,               ~AENDT,
#'  "2014-01-17T23:59:59", "2014-01-18T13:09:09", "2014-01-20"
#' ) %>%
#'  mutate(
#'    TRTSDTM = as_datetime(TRTSDTM),
#'    ASTDTM = as_datetime(ASTDTM),
#'    AENDT = ymd(AENDT)
#'  )
#'
#' derive_vars_dy(
#'   datain,
#'   reference_date = TRTSDTM,
#'   source_vars = vars(TRTSDTM, ASTDTM, AENDT)
#' )

The following fields are mandatory:

  • @param: One entry per function parameter. The following attributes should be described: expected data type (e.g. data.frame, logical, numeric etc.), default value (if any), permitted values (if applicable), optionality (i.e. is this a required parameter). If the expected input is a dataset then the required variables should be clearly stated.
  • @details: A natural-language description of the derivation used inside the function.
  • @author: The person who wrote the function. In case a function is later on updated by another person the name should be appended to the list of authors.
  • @keywords: One applicable tag to the function - identical to family.
  • @family: One applicable tag to the function - identical to keyword.
  • @return: A description of the return value of the function. Any newly added variable(-s) should be mentioned here.
  • @examples: A fully self-contained example of how to use the function. Self-contained means that, if this code is executed in a new R session, it will run without errors. That means any packages need to be loaded with library() and any datasets need to be either created directly inside the example code or loaded using data(). If a dataset is created in the example, this should be done using the function tribble() (specify library(tibble) before calling this function). If other functions are called in the example, please specify library(pkg_name) and then refer to the respective function as fun(), as opposed to the pkg_name::fun() notation preferred in the Unit Test Guidance. Make sure to align columns, as this keeps the code quick to read.

Copying descriptions should be avoided as it makes the documentation hard to maintain. For example if the same parameter with the same description is used by more than one function, the parameter should be described for one function and the other functions should use @inheritParams <function name where the parameter is described>.

Please note that if @inheritParams func_first is used in the header of the func_second() function, those parameter descriptions of func_first() are included in the documentation of func_second() for which

  • the parameter is offered by func_second() and
  • no @param tag for the parameter is included in the header of func_second().

The order of the @param tags should be the same as in the function definition. The @inheritParams tags should be after the @param. This does not affect the order of the parameter description in the rendered documentation but makes it easier to maintain the headers.
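
As a small sketch of these rules, the two toy functions below use the placeholder names func_first() and func_second() from the text; their bodies and parameters are assumptions made purely for illustration. In the header of func_second(), only `x` is inherited because `y` and `factor` have their own @param tags.

```r
#' Compute the Sum of Two Values
#'
#' @param x First value
#'
#'   A numeric vector is expected.
#'
#' @param y Second value
#'
#'   A numeric vector is expected.
#'
#' @return The sum of `x` and `y`
func_first <- function(x, y) {
  x + y
}

#' Compute the Scaled Sum of Two Values
#'
#' @param y Second value
#'
#'   This description is used instead of the one from `func_first()` because
#'   a `@param` tag is present.
#'
#' @param factor Scaling factor
#'
#'   A numeric scalar is expected.
#'
#' @inheritParams func_first
#'
#' @return The sum of `x` and `y` multiplied by `factor`
func_second <- function(x, y, factor) {
  func_first(x, y) * factor
}

func_second(1, 2, factor = 3)
# returns 9
```

The @param tags of func_second() follow the order of the function definition, and the @inheritParams tag is placed after them.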

Variable names, expressions, functions, and any other code must be enclosed with backticks. This will render it as code.

For functions which derive a specific CDISC variable, the title must state the label of the variable without the variable name. The variable should be stated in the description.

Categorization of Functions

The functions are categorized by keywords and families within the roxygen header. Categorization is important as admiral's user-facing function base totals more than 125 functions and is growing! However, to ease the burden for developers, we have decided that the keywords and families should be identical in the roxygen header, which are specified via the @keywords and @family fields. To reiterate, each function must use the same keyword and family. Also, please note that the keywords and families are case-sensitive.

@keywords

The keywords allow for the reference page to be easily organized when using certain pkgdown functions. For example, using the function has_keyword("der_bds_gen") in the _pkgdown.yml file while building the website will collect all the BDS General Derivation functions and display them in alphabetical order on the Reference Page in a section called BDS-Specific.

@family

The families allow for similar functions to be displayed in the See Also section of a function’s documentation. For example, a user looking at derive_vars_dy() function documentation might be interested in other Date/Time functions. Using the @family tag der_date_time will display all the Date/Time functions available in admiral to the user in the See Also section of derive_vars_dy() function documentation. Please take a look at the function documentation for derive_vars_dy() to see the family tag in action.

Below is the list of available keyword/family tags to be used in admiral functions. If you think an additional keyword/family tag should be added, then please add an issue in GitHub for discussion.

Keyword/Family | Description
com_date_time | Date/Time Computation Functions that return a vector
com_bds_findings | BDS-Findings Functions that return a vector
create_aux | Functions for Creating Auxiliary Datasets
datasets | Example datasets used within admiral
der_gen | General Derivation Functions that can be used for any ADaM.
der_date_time | Date/Time Derivation Function
der_bds_gen | Basic Data Structure (BDS) Functions that can be used across different BDS ADaM (adex, advs, adlb, etc)
der_bds_findings | Basic Data Structure (BDS) Functions specific to the BDS-Findings ADaMs
der_prm_bds_findings | BDS-Findings Functions for adding Parameters
der_adsl | Functions that can only be used for creating ADSL.
der_tte | Function used only for creating a Time to Event (TTE) Dataset
der_occds | OCCDS specific derivation of helper Functions
der_prm_tte | TTE Functions for adding Parameters to TTE Dataset
deprecated | Function which will be removed from admiral after next release. See Deprecation Guidance.
metadata | Auxiliary datasets providing definitions as input for derivations, e.g. grading criteria or dose frequencies
utils_ds_chk | Utilities for Dataset Checking
utils_fil | Utilities for Filtering Observations
utils_fmt | Utilities for Formatting Observations
utils_help | Utilities used within Derivation functions
utils_examples | Utilities used for examples and template scripts
source_specifications | Source Specifications
high_order_function | Higher Order Functions
internal | Internal functions only available to admiral developers
assertion* | Asserts a certain type and gives warning, error to user
warning | Provides custom warnings to user
what | A function that …
is | A function that …
get | A function that …

NOTE: It is strongly encouraged that the @keywords and @family tags of each function are identical. This eases the burden of development and maintenance for admiral functions. If you need to use multiple keywords or families, please reach out to the core development team for discussion.

Missing values

Missing values (NAs) need to be explicitly shown.

Regarding character vectors converted from SAS files: SAS treats missing character values as blank. Those are imported into R as empty strings ("") although in nature they are missing values (NA). All empty strings that originate like this need to be converted to proper R missing values NA.
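
A minimal base R sketch of this conversion is shown below. The helper name `convert_demo_blanks()` is hypothetical and only illustrates the rule for a single character vector.

```r
# Hypothetical helper: converts empty strings in a character vector to NA
convert_demo_blanks <- function(x) {
  stopifnot(is.character(x))
  # existing NAs are left untouched, empty strings become proper missing values
  x[!is.na(x) & x == ""] <- NA_character_
  x
}

sex <- c("M", "", NA, "F")
convert_demo_blanks(sex)
# returns "M", NA, NA, "F"
```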

File Structuring

Organizing functions into files is more of an art than a science. Thus, there are no hard rules but just recommendations. First and foremost, there are two extremes that should be avoided: putting each function into its own file and putting all functions into a single file. Apart from that the following recommendations should be taken into consideration when deciding upon file structuring:

  • If a function is very long (together with its documentation), store it in a separate file
  • If some functions are documented together, put them into one file
  • If some functions have some sort of commonality or relevance with one another (like dplyr::bind_rows() and dplyr::bind_cols()), put them into one file
  • Store functions together with their helpers and methods
  • Have no more than 1000 lines in a single file, unless necessary (exceptions are, for example, classes with methods)

It is the responsibility of both the author of a new function and reviewer to ensure that these recommendations are put into practice.

R Package Dependencies

Package dependencies have to be documented in the DESCRIPTION file. If a package is used only in examples and/or unit tests then it should be listed in Suggests, otherwise in Imports.

Functions from other packages have to be explicitly imported by using the @importFrom tag in the R/admiral-package.R file. To import the if_else() and mutate() function from dplyr the following line would have to be included in that file: #' @importFrom dplyr if_else mutate.

Some of these functions become critically important while using admiral and should be included as an export. This applies to functions which are frequently called within {admiral} function calls like dplyr::vars(), dplyr::desc() or the pipe operator magrittr::%>%. To export these functions, the following R code should be included in the R/reexports.R file using the format:

#' @export
pkg_name::fun

Metadata

Functions should only perform the derivation logic and not add any kind of metadata, e.g. labels.

Unit Testing

A function requires a set of unit tests to verify it produces the expected result. See Writing Unit Tests in {admiral} for details.

Deprecation

As admiral is still evolving, functions or parameters may need to be removed or replaced with more efficient options from one release to another. In such cases, the relevant function or parameter must be marked as deprecated. This deprecation is done in three phases over our release cycles.

  • Phase 1: In the release where the identified function or parameter is to be deprecated there will be a warning issued when using the function or parameter using deprecate_warn()

  • Phase 2: In the next release an error will be thrown using deprecate_stop()

  • Phase 3: Finally in the 3rd release thereafter the function will be removed from the package altogether

Information about deprecation timelines must be added to the warning/error message.

Note that the deprecation cycle time for a function or parameter based on our current release schedule is 6 months.

Documentation

If a function or parameter is removed, the documentation must be updated to indicate the function or the parameter is now deprecated and which new function/parameter should be used instead.

The documentation will be updated at:

  • the description level for a function,

  • the keywords will be replaced with deprecated

  • the @family roxygen tag will become deprecated

    #' Title of the function
    #'
    #' @description
    #' `r lifecycle::badge("deprecated")`
    #'
    #' This function is *deprecated*, please use `new_fun()` instead.
    #' .
    #' @family deprecated
    #'
    #' @keywords deprecated
    #' .
  • the @examples section should be removed.

  • the @param level for a parameter.

    @param old_param *Deprecated*, please use `new_param` instead.

Handling of warning and error

When a function or parameter is deprecated, the function must be updated to issue a warning or error using deprecate_warn() and deprecate_stop(), respectively, as described above.

There should be a test case added in tests/testthat/test-deprecation.R that checks whether this warning/error is issued as appropriate when using the deprecated function or parameter.

Function

In the initial release in which a function is deprecated the original function body must be replaced with a call to deprecate_warn() and subsequently all arguments should be passed on to the new function.

fun_xxx <- function(dataset, some_param, other_param) {
  deprecate_warn("x.y.z", "fun_xxx()", "new_fun_xxx()")
  new_fun_xxx(
    dataset = dataset,
    some_param = some_param,
    other_param = other_param
  )
}

In the following release the function body should be changed to just include a call to deprecate_stop().

fun_xxx <- function(dataset, some_param, other_param) {
  deprecate_stop("x.y.z", "fun_xxx()", "new_fun_xxx()")
}

Finally, in the next release the function should be removed from the package.

Parameter

If a parameter is removed and is not replaced, an error must be generated:

### BEGIN DEPRECATION
  if (!missing(old_param)) {
    deprecate_stop("x.y.z", "fun_xxx(old_param = )")
  }
### END DEPRECATION

If the parameter is renamed or replaced, a warning must be issued and the new parameter takes the value of the old parameter until the next release. Note: parameters which are not passed as a vars() argument (e.g. new_var = VAR1 or filter = AVAL > 10) will need to be quoted.

### BEGIN DEPRECATION
  if (!missing(old_param)) {
    deprecate_warn("x.y.z", "fun_xxx(old_param = )", "fun_xxx(new_param = )")
    # old_param is given using vars()
    new_param <- old_param
    # old_param is NOT given using vars()
    new_param <- enquo(old_param)
  }
### END DEPRECATION

Unit Testing

Unit tests for deprecated functions and parameters must be added to tests/testthat/test-deprecation.R to ensure that a warning or error is issued.

When writing the unit test, check that the error or warning has the right class, i.e. “lifecycle_error_deprecated” or “lifecycle_warning_deprecated”, respectively. The unit-test should follow the corresponding format, per the unit test guidance:

# For deprecated functions that issue an error

## Test #: An error is thrown if `derive_var_example()` is called ----
test_that("deprecation Test #: derive_var_example() An error is thrown if
          `derive_var_example()` is called", {
  expect_error(
    derive_var_example(),
    class = "lifecycle_error_deprecated"
  )
})

# For deprecated functions that issue a warning

## Test #: A warning is thrown if `derive_var_example()` is called ----
test_that("deprecation Test #: derive_var_example() A warning is thrown if
          `derive_var_example()` is called", {
  expect_warning(
    derive_var_example(),
    class = "lifecycle_warning_deprecated"
  )
})

Other unit tests of deprecated functions must be removed.

Best Practices and Hints

Please take the following list as recommendation and try to adhere to its rules if possible.

  • Parameters in function calls should be named except for the first parameter (e.g. assert_data_frame(dataset, required_vars = vars(var1, var2), optional = TRUE)).
  • dplyr::if_else() should be used when there are only two conditions. Try to always set the missing parameter whenever appropriate.
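
As a short illustration of the second recommendation, the `missing` argument of dplyr::if_else() defines the result for observations where the condition evaluates to NA. The values and category labels below are made up for the example.

```r
library(dplyr, warn.conflicts = FALSE)

aval <- c(5, 12, NA)

# without `missing` the third element would be NA
if_else(aval > 10, "HIGH", "NORMAL", missing = "MISSING")
# returns "NORMAL", "HIGH", "MISSING"
```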

Readable Code for ADaM

Each function should be considered as readable code by default.

Basic Rules

All R code that produces ADaM datasets should be based on readable code for their 1st line code. Producing readable code should not be part of or the responsibility of any QC activities or 2nd line programming.

ADaMs in R will be highly modularized. This means code needs to be commented across the set of functions that produces the final ADaM dataset.

This guidance is built on the assumption that each ADaM dataset will have one R script that will call a set of functions needed to produce the corresponding ADaM dataset.

Header for the main R-Script

The main R-script would contain all function calls to create the ADaM dataset. In the header, describe the ADaM dataset that will be produced:

  • Name
  • Label
  • Input SDTMs and ADaMs
  • Short description of its purpose if not obvious by the label (novel endpoints mainly)

Header for functions

  • See Function header

Functions

  • Function calls should have a preceding comment which is a short and meaningful description for which purpose the function is called, like:
    • Derive variable X if function name is not descriptive or if it is a customized variable.
    • Ideally use plain English to describe what a function is deriving.
      • # derive analysis study day
      • # derive age group <= 18
    • Impute date with missing days.
    • Check for missing values.
  • A comment can cover multiple function calls that belong to a category or group of variables. Ideally one keeps it in line with the ADaM IG terminology, like Treatment Variables, Timing Variables, Identifier Variables as much as possible
    • # derive all population indicator variables RANDFL, SAFFL …
  • Functions that create user-defined variables that are specific to the molecule or study, or that support a specific endpoint, should be called out specifically, e.g. with a comment such as "the following function calls flag the special Adverse Events" or a comment that highlights a molecule-specific endpoint.
  • A function that covers a whole algorithm should have a preceding comment that indicates the purpose of the algorithm, like
    • # derive secondary endpoint XYZ

Code

The code itself should be described in meaningful, plain English, so that a reviewer can understand how the piece of code works, e.g. 

  • # calculates the sum of scores divided by the non missing numbers of scores to calculate the average score

Code within a function that creates a specific variable should have a starting comment and an ending comment, e.g.

# calculate X
# describe how the code works in meaningful plain English
"<code>"
# end of X

If meaningful, comments can cover multiple variables within a piece of code

# creates X, Y, Z
# describe how the code works in meaningful plain English
"<code>"
# end of X, Y, Z
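
Applied to the average score mentioned above, the comment pattern could look like the following sketch. The variable names and score values are made up for illustration.

```r
scores <- c(2, 4, NA, 3)

# calculate average score
# the sum of the scores is divided by the number of non-missing scores
n_nonmissing <- sum(!is.na(scores))
avg_score <- sum(scores, na.rm = TRUE) / n_nonmissing
# end of average score

avg_score
# returns 3
```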

R and package versions for development

  • The choice of R Version is not set in stone. However, a common development environment is important to establish when working across multiple companies and multiple developers. We currently work in the earliest of the three latest R Versions. This need for a common development environment also carries over for our choice of package versions.
  • GitHub allows us through the Actions/Workflows to test admiral under several versions of R as well as several versions of dependent R packages needed for admiral. Currently we test admiral against the three latest R Versions and the closest snapshots of packages to those R versions. You can view this workflow and others on our admiralci GitHub Repository.
  • This common development environment allows us to easily re-create bugs and provide solutions to each other's issues that developers will encounter.
  • Reviewers of Pull Requests when running code will know that their environment is identical to the initiator of the Pull Request. This ensures faster review times and higher quality Pull Request reviews.
  • We achieve this common development environment by using a lockfile created from the renv package. New developers will encounter a suggested renv::restore() in the console to revert or move forward their R version and package versions.