
Main Concept of the Data Cut Process

The main idea of {datacutr} is to provide a standardized approach to applying a data cut to SDTM datasets.
The process applied by the package is as follows:

  • create a meta dataset, DCUT, that references all patients to be included within the cut and the cut date to be used as reference (normally the Clinical Cut-off Date to which the data has been cleaned).
  • using DCUT as reference, remove records from the SDTM data that either a) belong to patients not included in DCUT, or b) can be identified as occurring after the supplied data cut date.
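A minimal sketch of these two steps follows. It is written in Python for illustration only and is not the {datacutr} R API: it assumes each SDTM dataset is a list of dicts keyed by variable name, represents DCUT as a dict mapping USUBJID to the cut date, and uses the hypothetical helpers `build_dcut` and `apply_cut`. Only complete ISO 8601 dates are handled here.

```python
from datetime import date


def build_dcut(subject_ids, cut_date):
    """Hypothetical DCUT: map each included USUBJID to the reference cut date."""
    return {usubjid: cut_date for usubjid in subject_ids}


def apply_cut(records, dcut, date_var=None):
    """Remove records that a) belong to subjects not in DCUT or, when date_var
    is given, b) have a date after the subject's cut date.

    Assumes complete ISO 8601 dates (any time part is ignored);
    records with a missing date are kept.
    """
    kept = []
    for rec in records:
        cut_date = dcut.get(rec["USUBJID"])
        if cut_date is None:
            continue  # a) subject not part of DCUT
        value = rec.get(date_var) if date_var else None
        if value and date.fromisoformat(value[:10]) > cut_date:
            continue  # b) record is after the data cut date
        kept.append(rec)
    return kept


dcut = build_dcut(["01-001", "01-002"], date(2012, 11, 30))
ae = [
    {"USUBJID": "01-001", "AESTDTC": "2012-11-15"},  # kept
    {"USUBJID": "01-001", "AESTDTC": "2012-12-05"},  # after the cut date
    {"USUBJID": "01-003", "AESTDTC": "2012-10-01"},  # subject not in DCUT
]
print(apply_cut(ae, dcut, "AESTDTC"))
```

Calling `apply_cut` without `date_var` performs only step a), which keeps both records for subject 01-001.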

Data cut approaches for different SDTM datasets

The package relies on creating lists of SDTM datasets to be processed in specific ways. These are:

  • No cut - SDTM dataset remains exactly as in the source
  • Patient cut - Only patients identified in the DCUT meta dataset are kept; no other records are excluded
  • Date cut - Only patients identified in the DCUT meta dataset are kept, and records identified as after the data cut date are removed
  • Special DM cut - As DM contains critical temporal derivations around deaths that would require updating within a data cut, this option allows the user to reset DM.DTHFL and DM.DTHDTC to missing if the death is identified as after the data cut date
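Routing datasets by list could look like the sketch below, which covers the first three cut types (the special DM cut is left out). It is a Python illustration under stated assumptions, not the {datacutr} API: `cut_sdtm` is a hypothetical helper, the assignment of TS, SC and AE to the lists is purely illustrative (real lists are study-specific), and dates are assumed to be complete ISO 8601 values.

```python
from datetime import date


def cut_sdtm(sdtm, dcut, no_cut, patient_cut, date_cut):
    """Route each dataset through the cut defined by the list it belongs to.

    sdtm: dict of dataset name -> list of record dicts
    dcut: dict of USUBJID -> cut date (datetime.date)
    date_cut: dict of dataset name -> date variable to cut on
    Records with no date in a date-cut dataset are kept.
    """
    out = {}
    for name, records in sdtm.items():
        if name in no_cut:
            out[name] = list(records)  # kept exactly as in the source
            continue
        in_dcut = [r for r in records if r["USUBJID"] in dcut]
        if name in patient_cut:
            out[name] = in_dcut
        elif name in date_cut:
            var = date_cut[name]
            out[name] = [
                r for r in in_dcut
                if not r.get(var)
                or date.fromisoformat(r[var][:10]) <= dcut[r["USUBJID"]]
            ]
        else:
            raise ValueError(f"{name} is not assigned to a cut type")
    return out


dcut = {"01-001": date(2012, 11, 30)}
sdtm = {
    "TS": [{"TSPARMCD": "TITLE"}],                    # illustrative: no cut
    "SC": [{"USUBJID": "01-001"}, {"USUBJID": "01-002"}],  # patient cut
    "AE": [{"USUBJID": "01-001", "AESTDTC": "2012-11-01"},
           {"USUBJID": "01-001", "AESTDTC": "2013-01-10"}],  # date cut
}
result = cut_sdtm(sdtm, dcut, no_cut={"TS"}, patient_cut={"SC"},
                  date_cut={"AE": "AESTDTC"})
```

Raising an error for an unassigned dataset is a design choice of this sketch: it makes a forgotten dataset visible rather than silently passing it through.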

Technical approach within {datacutr}

The {datacutr} package offers two different approaches for applying the data cut process:

  • Modular approach - This approach breaks down each step of the data cut into an individual function. This is useful if the user wants transparency of the process, and for debugging. It also allows the user to step into the process if bespoke or study-specific handling is required that is not already defined as part of the {datacutr} process. See Modular Approach for how to implement it.
  • Wrapped approach - This approach suits users who want to generate a cut quickly and have no need to step in and alter the approach taken by {datacutr}. See Wrapped Approach for how to implement it.

Data Handling Rules

  • Inclusion of Subjects
    Subjects with a randomization date on or before the data cutoff date are included in the data cut. If a study is not randomized, the enrolment date should be used instead. For studies where no study drug is administered and where no randomization or enrolment is performed (e.g., observational studies), a study-specific definition of enrolment date should be provided.
  • Inclusion of Records for Subjects Included in the Data Cut
    For records involving single dates, the record is included in the data cut if the relevant date is on or before the data cutoff date.
    For records involving interval dates (start and end dates), the start date is used for comparison. The record is included in the data cut if the start date is on or before the data cutoff date.
    For records involving both single (--DTC) and interval (--STDTC) dates (e.g., the Findings About (FA) dataset):
    If only one date is available (either --DTC or --STDTC), the cut is applied to that date. If both dates are available (--DTC and --STDTC), the cut is applied to the interval start date (--STDTC) only. If neither date is available, the record is included.
  • Missing or Partial Dates
    The following rule is used to determine whether to include a record with missing or partial date in the data cut. The motivation is to be as inclusive as possible.
    A record is included in the data cut except in the following cases:
    Only year is present and year is after the year of the data cutoff date. (Example: Data cutoff date = 30NOV2012; concomitant medication start date = 2013)
    Only month and year are present and month-year is after the month-year of the data cutoff date. (Example: Data cutoff date = 01DEC2012; AE start date = 2013-01)
    NOTE: If both start and end dates are collected, the rule is applied to the start date only. If the start date is missing, the record is included in the data cut.
  • Handling of Deaths
    For deaths, the derived DM death information is updated to reflect the state at the time of the data cutoff date. The Death Flag (DM.DTHFL) and associated variables (e.g., DM.DTHDTC) are set to missing if the subject died after the data cutoff date.
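The inclusion-of-subjects rule above amounts to comparing one reference date per subject with the cutoff. The Python sketch below assumes a randomized study uses a per-subject randomization date and a non-randomized study uses an enrolment date; `RANDDT`, `ENRLDT` and `included_subjects` are hypothetical names (not SDTM variables or {datacutr} functions), and excluding subjects with no reference date is an assumption of this sketch. Observational studies would need their own study-specific reference date.

```python
from datetime import date


def included_subjects(subjects, cutoff, randomized=True):
    """Return USUBJIDs whose reference date is on or before the cutoff.

    Hypothetical keys: RANDDT (randomization date) and ENRLDT (enrolment
    date), each a datetime.date or None. Randomized studies use RANDDT,
    otherwise ENRLDT. Subjects without a reference date are excluded.
    """
    key = "RANDDT" if randomized else "ENRLDT"
    return [
        s["USUBJID"]
        for s in subjects
        if s.get(key) is not None and s[key] <= cutoff
    ]


subjects = [
    {"USUBJID": "01-001", "RANDDT": date(2012, 6, 1), "ENRLDT": date(2012, 5, 20)},
    {"USUBJID": "01-002", "RANDDT": date(2012, 12, 3), "ENRLDT": date(2012, 11, 28)},
    {"USUBJID": "01-003", "RANDDT": None, "ENRLDT": date(2012, 11, 30)},
]
cutoff = date(2012, 11, 30)
randomized_ids = included_subjects(subjects, cutoff)
enrolled_ids = included_subjects(subjects, cutoff, randomized=False)
```

The comparison is inclusive (`<=`), matching "on or before the data cutoff date".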
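The record-level rules above (choice between single and interval dates, and handling of missing or partial dates) can be combined into one comparison. This Python sketch assumes dates are ISO 8601 strings as in SDTM, and `cut_reference_value`, `is_after_cutoff` and `keep_record` are hypothetical names. It reads a value with an internal gap, such as `2013---15`, by its leading components only (the year, here), which is an assumption of the sketch rather than a documented rule.

```python
from datetime import date


def cut_reference_value(record, prefix):
    """Return the date string the cut is applied to, or None if no date.

    --STDTC is used whenever present (interval dates, or both available);
    otherwise --DTC. The prefix is the domain code, e.g. "AE" or "FA".
    """
    return record.get(f"{prefix}STDTC") or record.get(f"{prefix}DTC") or None


def is_after_cutoff(dtc, cutoff):
    """True if an ISO 8601 date, possibly partial, is after the cutoff date.

    Only the leading components present are compared: "2013" by year,
    "2013-01" by year-month, complete dates by day. Any time part is ignored.
    """
    parts = []
    for piece in dtc[:10].split("-"):
        if not piece:
            break  # stop at the first missing component, e.g. "2013---15"
        parts.append(int(piece))
    if not parts:
        return False  # nothing usable: treated like a missing date (included)
    if len(parts) == 1:
        return parts[0] > cutoff.year
    if len(parts) == 2:
        return (parts[0], parts[1]) > (cutoff.year, cutoff.month)
    return date(parts[0], parts[1], parts[2]) > cutoff


def keep_record(record, prefix, cutoff):
    """A record is kept unless its reference date is after the cutoff."""
    value = cut_reference_value(record, prefix)
    return value is None or not is_after_cutoff(value, cutoff)


# Examples from the rules above
print(keep_record({"CMSTDTC": "2013"}, "CM", date(2012, 11, 30)))
print(keep_record({"AESTDTC": "2013-01"}, "AE", date(2012, 12, 1)))
```

Because only the start date (or the single date) is ever inspected, a record with a missing start date and a late end date is still kept, as the NOTE above requires.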
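The death handling above can be sketched as follows. This Python illustration assumes DM is a list of dicts, that DTHDTC holds a complete ISO 8601 date, and that missing character values are represented as empty strings; `cut_dm_deaths` is a hypothetical name, not a {datacutr} function.

```python
from datetime import date


def cut_dm_deaths(dm, cutoff):
    """Blank DTHFL and DTHDTC for subjects whose death is after the cutoff.

    Assumes DTHDTC is a complete ISO 8601 date (any time part is ignored).
    Returns new dicts; the input records are not modified.
    """
    out = []
    for rec in dm:
        rec = dict(rec)  # copy so the caller's data is left untouched
        dthdtc = rec.get("DTHDTC")
        if dthdtc and date.fromisoformat(dthdtc[:10]) > cutoff:
            rec["DTHFL"] = ""
            rec["DTHDTC"] = ""
        out.append(rec)
    return out


dm = [
    {"USUBJID": "01-001", "DTHFL": "Y", "DTHDTC": "2012-11-15"},  # before cut
    {"USUBJID": "01-002", "DTHFL": "Y", "DTHDTC": "2012-12-20"},  # after cut
    {"USUBJID": "01-003", "DTHFL": "", "DTHDTC": ""},             # no death
]
dm_cut = cut_dm_deaths(dm, date(2012, 11, 30))
```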

Validation

All functions are reviewed and tested to ensure that they work as described in the documentation. They are not validated yet.

Starting a Script

{datacutr} provides template R scripts as a starting point. See Modular Approach and Wrapped Approach for more details.
See also Template scripts