Main Concept of the Data Cut Process

The main idea of {datacutr} is to provide a standardized approach to applying a data cut to SDTM datasets.
The package applies the following process:

  • Create a meta dataset, DCUT, that lists all patients to be included in the cut together with the cut date to be used as reference (normally the Clinical Cut-off Date that data has been cleaned to).
  • Using DCUT as the reference, remove records from the SDTM data that either a) belong to patients not in DCUT, or b) can be identified as occurring after the data cut date supplied.
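
The two removal rules above can be sketched in a few lines. This is an illustration in plain Python, not the {datacutr} API: the subject IDs, the helper name apply_date_cut, and the in-memory layout are all assumptions made for the example. It also assumes complete ISO 8601 dates, so that string comparison matches date order.

```python
# Hypothetical DCUT: one cut date per included patient.
dcut = {
    "SUBJ-001": "2023-06-30",
    "SUBJ-002": "2023-06-30",
}

# Hypothetical AE records; SUBJ-003 is not part of DCUT.
ae = [
    {"USUBJID": "SUBJ-001", "AETERM": "HEADACHE", "AESTDTC": "2023-05-01"},
    {"USUBJID": "SUBJ-001", "AETERM": "NAUSEA", "AESTDTC": "2023-07-15"},
    {"USUBJID": "SUBJ-002", "AETERM": "RASH", "AESTDTC": "2023-06-30"},
    {"USUBJID": "SUBJ-003", "AETERM": "FATIGUE", "AESTDTC": "2023-04-10"},
]


def apply_date_cut(records, dcut, date_var):
    """Keep records for patients in DCUT that are dated on or before the cut."""
    kept = []
    for rec in records:
        cut_date = dcut.get(rec["USUBJID"])
        if cut_date is None:
            continue  # (a) patient is not part of the reference DCUT
        if rec[date_var] > cut_date:
            continue  # (b) record is after the data cut date
        kept.append(rec)
    return kept


ae_cut = apply_date_cut(ae, dcut, "AESTDTC")
```

Here the NAUSEA record is dropped because it starts after the cut date, and the FATIGUE record is dropped because its patient is not in DCUT. The RASH record, dated exactly on the cut date, is kept.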

Data cut approaches for different SDTM

The package relies on creating lists of SDTM datasets to be processed in specific ways. These include:

  • No cut - The SDTM dataset remains exactly as in the source
  • Patient cut - Only patients identified in the source meta DCUT are kept; no other exclusion of records is conducted
  • Date cut - Only patients identified in the source meta DCUT are kept, and records identified as after the data cut date are removed
  • Special DM cut - As DM contains critical temporal derivations around deaths that would need updating within a data cut, this option allows the user to revert DM.DTHFL and DM.DTHDTC (set them to missing) if the death is identified as after the data cut date
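
As a rough illustration of the special DM cut, the sketch below applies the patient cut to DM and then blanks the death variables when the death date falls after the cut date. It is plain Python rather than the {datacutr} API; the helper name revert_dm_deaths, the sample subjects, and the use of an empty string for "missing" are assumptions for the example, and complete ISO 8601 dates are assumed.

```python
# Hypothetical DCUT and DM data, for illustration only.
dcut = {"SUBJ-001": "2023-06-30", "SUBJ-002": "2023-06-30"}

dm = [
    {"USUBJID": "SUBJ-001", "DTHFL": "Y", "DTHDTC": "2023-06-01"},
    {"USUBJID": "SUBJ-002", "DTHFL": "Y", "DTHDTC": "2023-08-12"},
    {"USUBJID": "SUBJ-003", "DTHFL": "", "DTHDTC": ""},
]


def revert_dm_deaths(dm_records, dcut):
    """Patient cut on DM, then reset death info recorded after the cut date."""
    out = []
    for rec in dm_records:
        cut_date = dcut.get(rec["USUBJID"])
        if cut_date is None:
            continue  # patient is not part of DCUT
        rec = dict(rec)  # copy, so the source data is left untouched
        death_date = rec.get("DTHDTC", "")
        if death_date and death_date > cut_date:
            # Death happened after the cut: DM should not show it yet.
            rec["DTHFL"] = ""
            rec["DTHDTC"] = ""
        out.append(rec)
    return out


dm_cut = revert_dm_deaths(dm, dcut)
```

SUBJ-001 keeps its death information because the death precedes the cut date, SUBJ-002 has both variables reset, and SUBJ-003 is removed by the patient cut.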

Technical approach within {datacutr}

The {datacutr} package offers two different approaches for applying the data cut process:

  • Modular approach - This approach breaks down all the steps of the data cut into individual functions. This is useful if the user wants transparency of the process, and for debugging. It also allows the user to step into the process if more bespoke or study-specific handling is required that is not already defined as part of the {datacutr} process. See Modular Approach for how to implement.
  • Wrapped approach - This approach is for users who want to generate a cut quickly and have no need to step in and alter the approach taken by {datacutr}. See Wrapped Approach for how to implement.

Data Handling Rules

  • Inclusion of Subjects
    Subjects with randomization date on or before the data cutoff date are included in the data cut. If a study is not randomized, then the enrolment date should be used instead. For studies where no study drug is administered, and where no randomization or enrolment is performed (e.g., observational studies), a study-specific definition of enrolment date should be provided.
  • Inclusion of Records for Subjects Included in the Data Cut
    For records involving dates, the record is included in the data cut if the relevant date is on or before the data cutoff date. The user selects the date variable for each domain that the cut should be applied to (i.e., --STDTC or --DTC). If this is a mix (e.g., different sources within FA using either FADTC or FASTDTC), then the user is expected to create a temporary variable to store the correct date per observation. An example is shown in the “Example Wrapped Approach” vignette.
  • Missing or Partial Date/Times
    Our motivation here is to be as inclusive as possible.
    We expect the data cutoff date/time to have at least a complete date. If the time is missing, then we impute to the maximum possible time, i.e., we impute with 23 for missing hours, 59 for missing minutes, and 59 for missing seconds.
    Any SDTMv records where the chosen SDTMv date/time variable is missing (or is missing the year component) are included in the data cut.
    All other partial date/times are imputed to the minimum possible date/time, i.e., we impute with 01 for missing month, 01 for missing day, 00 for missing hours, 00 for missing minutes, and 00 for missing seconds.
    By imputing any missing components of the data cutoff date/time to their maximum possible value and any missing components of the SDTMv date/time variable to their minimum possible value, we ensure that a record is only cut if it is clear that the SDTMv date/time variable is after the data cutoff date/time.
  • Handling of Deaths
    For deaths, the derived DM death information is updated to reflect the state at the time of the data cutoff date. The Death Flag (DM.DTHFL) and associated variables (e.g., DM.DTHDTC) are set to missing if the subject died after the data cutoff date.
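
The imputation rules for missing or partial date/times can be illustrated with a small sketch. It is written in plain Python and is not the package's implementation; the function names (impute_cutoff, impute_record, keep_record) are hypothetical. As a simplifying assumption, only right-truncated ISO 8601 values such as "2023-06" or "2023-06-30T10" are parsed, and any value that does not match that pattern is treated like a missing year and kept.

```python
import re
from datetime import datetime

# Right-truncated ISO 8601: YYYY[-MM[-DD[Thh[:mm[:ss]]]]]
_PARTIAL = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2})(?::(\d{2})(?::(\d{2}))?)?)?)?)?$"
)


def _parts(dtc):
    match = _PARTIAL.match(dtc or "")
    return match.groups() if match else None


def _fill(value, default):
    return int(value) if value is not None else default


def impute_cutoff(dtc):
    """Impute missing time parts of the cutoff to their maximum (23:59:59)."""
    parts = _parts(dtc)
    if parts is None or parts[2] is None:
        raise ValueError("data cutoff needs at least a complete date")
    y, mo, d, h, mi, s = parts
    return datetime(int(y), int(mo), int(d), _fill(h, 23), _fill(mi, 59), _fill(s, 59))


def impute_record(dtc):
    """Impute missing parts of a record date to their minimum; None if no year."""
    parts = _parts(dtc)
    if parts is None:
        return None  # missing value, or year not available
    y, mo, d, h, mi, s = parts
    return datetime(int(y), _fill(mo, 1), _fill(d, 1), _fill(h, 0), _fill(mi, 0), _fill(s, 0))


def keep_record(record_dtc, cutoff_dtc):
    """A record is cut only if it is clearly after the data cutoff."""
    rec = impute_record(record_dtc)
    if rec is None:
        return True  # missing date or missing year: always included
    return rec <= impute_cutoff(cutoff_dtc)
```

With a cutoff of "2023-06-30", a record dated "2023-06" is imputed to 1 June and kept, while "2023-07" is imputed to 1 July and cut. A record dated "2023-06-30T18:00" is kept because the cutoff itself is imputed to 23:59:59 on that day.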

Validation

All functions are reviewed and tested to ensure that they work as described in the documentation. They have not yet been formally validated.

Starting a Script

{datacutr} provides template R scripts as a starting point. See Modular Approach and Wrapped Approach for more details.