
Derive the baseline flag variable (--BLFL) or the last observation before exposure flag (--LOBXFL) from the observation date/time (--DTC) and a reference date/time from the DM domain.

Usage

derive_blfl(
  sdtm_in,
  dm_domain,
  tgt_var,
  ref_var,
  baseline_visits = character(),
  baseline_timepoints = character()
)

Arguments

sdtm_in

Input SDTM domain.

dm_domain

DM domain containing the reference variable ref_var.

tgt_var

Name of variable to be derived (--BLFL or --LOBXFL where -- is domain).

ref_var

Name of the reference date/time variable in the Demographics (DM) dataset, against which the other observations in the study are compared. Common choices are "RFSTDTC" (the date/time of the first study treatment) and "RFXSTDTC" (the date/time of the first exposure to the study drug).

baseline_visits

A character vector of the baseline visits (VISIT values) in the study. These visits are the key data-collection points at the start of the study, before any intervention is applied. When --DTC falls on the reference date but the domain or reference time is missing, a record remains a baseline candidate only if its VISIT is in this vector.

baseline_timepoints

A character vector of --TPT values identifying the timepoints within the baseline visits at which key assessments or measurements were taken. When --DTC falls on the reference date but the domain or reference time is missing, a record remains a baseline candidate only if its --TPT is in this vector.
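Both filter arguments only affect same-day records whose domain or reference time is missing (set C in the Details). The following is a minimal base-R sketch of that filter, not the sdtm.oak implementation. `same_day_candidate()` is a hypothetical helper, the VISIT and --TPT values are made up for illustration, and it assumes that an empty filter vector, or an absent --TPT column (passed as `NULL`), imposes no restriction:

```r
# Hypothetical helper (not an sdtm.oak function): given the VISIT and --TPT
# values of same-day records with a missing domain or reference time,
# return TRUE for the records that stay baseline candidates.
# Assumption: an empty filter vector, or an absent --TPT column (NULL),
# imposes no restriction.
same_day_candidate <- function(visit, tpt = NULL,
                               baseline_visits = character(),
                               baseline_timepoints = character()) {
  keep <- rep(TRUE, length(visit))
  if (length(baseline_visits) > 0L) {
    keep <- keep & visit %in% baseline_visits
  }
  if (length(baseline_timepoints) > 0L && !is.null(tpt)) {
    keep <- keep & tpt %in% baseline_timepoints
  }
  keep
}

visit <- c("SCREENING", "DAY 1", "SCREENING")
tpt <- c("PRE-DOSE", "PRE-DOSE", "POST-DOSE")
same_day_candidate(visit, tpt, baseline_visits = "SCREENING")
#> [1]  TRUE FALSE  TRUE
same_day_candidate(visit, tpt, "SCREENING", "PRE-DOSE")
#> [1]  TRUE FALSE FALSE
```

Supplying both vectors narrows the candidates to records that satisfy both conditions, which matches the "and" in the Details.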

Value

The input data frame with the baseline flag variable (--BLFL) or the last observation before exposure flag (--LOBXFL) added.

Details

The derivation is as follows:

  • Remove records where the result (--ORRES) is missing, as well as records whose --ORRES value is "ND" (No Data) or "NOT DONE", both of which indicate that the measurement or observation was not completed.

  • Remove records where the status (--STAT) is "NOT DONE", indicating that the observation or test was not performed.

  • Split the date/time variable (--DTC) and the reference date/time variable (ref_var) into separate date and time components. Seconds are ignored; only hours and minutes are used in the comparisons.

  • Set partial or missing dates to NA.

  • Set partial or missing times to NA.

  • Keep rows where both the domain date and the reference date are non-missing (referred to as X).

  • From X, keep rows where the domain date (--DTC) is earlier than the reference date (referred to as A).

  • From X, keep rows where the domain date (--DTC) equals the reference date, both the domain and reference times are non-missing, and the domain time is earlier than the reference time (referred to as B).

  • From X, keep rows where the domain date (--DTC) equals the reference date but the domain and/or reference time is missing, and:

    • VISIT is in baseline_visits (if provided), and

    • --TPT is in baseline_timepoints (if provided) (referred to as C).

  • Combine the rows from A, B, and C into a single data frame of pre-reference observations, sorted by USUBJID, --STAT, and --ORRES.

  • Group by USUBJID and --TESTCD and keep the rows with the latest (maximum) --DTC. Keep only the oak id variables and --TESTCD, which uniquely identify these records, and remove any duplicate rows. Set the target variable (--BLFL or --LOBXFL) to "Y" for these rows.

  • Join the flag onto the input dataset by the oak id variables.
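The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the sdtm.oak implementation: `split_dtc()` and `flag_pre_reference()` are hypothetical helpers; columns are named without the domain prefix (ID, USUBJID, TESTCD, DTC, ORRES, STAT, VISIT), with ID as a unique row identifier standing in for the oak id variables; the reference date/times are passed as a named vector keyed by USUBJID; and the --TPT filter is omitted for brevity.

```r
# Illustrative sketch of the derivation steps; NOT the sdtm.oak code.
# Assumed columns (domain prefix dropped): ID (unique row id standing in
# for the oak id variables), USUBJID, TESTCD, DTC, ORRES, STAT, VISIT.

# Split ISO 8601 strings into a complete date ("YYYY-MM-DD") and an
# "hh:mm" time; partial or missing parts become NA, seconds are ignored.
split_dtc <- function(dtc) {
  full_date <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", dtc)
  full_time <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}", dtc)
  list(
    date = ifelse(full_date, substr(dtc, 1, 10), NA_character_),
    time = ifelse(full_time, substr(dtc, 12, 16), NA_character_)
  )
}

# `ref_dtc` is a named character vector: one reference date/time per USUBJID.
flag_pre_reference <- function(dom, ref_dtc, baseline_visits = character()) {
  # Drop missing or not-done results and records with a NOT DONE status.
  done <- !is.na(dom$ORRES) & !(dom$ORRES %in% c("ND", "NOT DONE")) &
    !(dom$STAT %in% "NOT DONE")
  d <- dom[done, , drop = FALSE]

  obs <- split_dtc(d$DTC)
  ref <- split_dtc(unname(ref_dtc[as.character(d$USUBJID)]))

  # Complete "YYYY-MM-DD" dates and "hh:mm" times compare correctly as text.
  has_dates <- !is.na(obs$date) & !is.na(ref$date)        # X
  same_day <- has_dates & obs$date == ref$date
  has_times <- !is.na(obs$time) & !is.na(ref$time)
  set_a <- has_dates & obs$date < ref$date                 # A
  set_b <- same_day & has_times & obs$time < ref$time      # B
  set_c <- same_day & !has_times                           # C
  if (length(baseline_visits) > 0L) {
    set_c <- set_c & d$VISIT %in% baseline_visits
  }
  cand <- d[set_a | set_b | set_c, , drop = FALSE]

  # Per USUBJID/TESTCD keep the record(s) with the latest DTC; in this
  # sketch, ties at the latest DTC are all flagged.
  flagged <- integer()
  if (nrow(cand) > 0L) {
    key <- paste(cand$USUBJID, cand$TESTCD, sep = "\t")
    latest <- ave(cand$DTC, key, FUN = max)
    flagged <- cand$ID[cand$DTC == latest]
  }
  dom$FL <- ifelse(dom$ID %in% flagged, "Y", NA_character_)
  dom
}

# Records modelled on subject test_study-375 from the Examples below.
vs <- data.frame(
  ID = 1:5,
  USUBJID = "test_study-375",
  TESTCD = c("DIABP", "DIABP", "PULSE", "SYSBP", "SYSBP"),
  DTC = c("2020-09-01T13:31", "2020-10-01T11:20", "2020-09-28T10:10",
          "2020-09-28T10:10", "2020-09-28T10:05"),
  ORRES = c("90", "90", "ND", "120", "120"),
  STAT = NA_character_,
  VISIT = "SCREENING",
  stringsAsFactors = FALSE
)
ref_dtc <- c("test_study-375" = "2020-09-28T10:10")
which(flag_pre_reference(vs, ref_dtc, "SCREENING")$FL == "Y")
#> [1] 1 5
```

In this sketch the DIABP record from 2020-09-01 is flagged through set A and the SYSBP record at 10:05 through set B. The SYSBP record at exactly 10:10 is not earlier than the reference time, and the "ND" pulse record is removed in the first step.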

Examples

dm <- tibble::tribble(
  ~USUBJID, ~RFSTDTC, ~RFXSTDTC,
  "test_study-375", "2020-09-28T10:10", "2020-09-28T10:10",
  "test_study-376", "2020-09-21T11:00", "2020-09-21T11:00",
  "test_study-377", NA, NA,
  "test_study-378", "2020-01-20T10:00", "2020-01-20T10:00",
  "test_study-379", NA, NA,
)

dm
#> # A tibble: 5 × 3
#>   USUBJID        RFSTDTC          RFXSTDTC        
#>   <chr>          <chr>            <chr>           
#> 1 test_study-375 2020-09-28T10:10 2020-09-28T10:10
#> 2 test_study-376 2020-09-21T11:00 2020-09-21T11:00
#> 3 test_study-377 NA               NA              
#> 4 test_study-378 2020-01-20T10:00 2020-01-20T10:00
#> 5 test_study-379 NA               NA              

sdtm_in <-
  tibble::tribble(
    ~DOMAIN, ~oak_id, ~raw_source, ~patient_number, ~USUBJID, ~VSDTC, ~VSTESTCD, ~VSORRES, ~VSSTAT, ~VISIT,
    "VS", 1L, "VTLS1", 375L, "test_study-375", "2020-09-01T13:31", "DIABP", "90", NA, "SCREENING",
    "VS", 2L, "VTLS1", 375L, "test_study-375", "2020-10-01T11:20", "DIABP", "90", NA, "SCREENING",
    "VS", 1L, "VTLS1", 375L, "test_study-375", "2020-09-28T10:10", "PULSE", "ND", NA, "SCREENING",
    "VS", 2L, "VTLS1", 375L, "test_study-375", "2020-10-01T13:31", "PULSE", "85", NA, "SCREENING",
    "VS", 1L, "VTLS2", 375L, "test_study-375", "2020-09-28T10:10", "SYSBP", "120", NA, "SCREENING",
    "VS", 2L, "VTLS2", 375L, "test_study-375", "2020-09-28T10:05", "SYSBP", "120", NA, "SCREENING",
    "VS", 1L, "VTLS1", 376L, "test_study-376", "2020-09-20", "DIABP", "75", NA, "SCREENING",
    "VS", 1L, "VTLS1", 376L, "test_study-376", "2020-09-20", "PULSE", NA, "NOT DONE", "SCREENING",
    "VS", 2L, "VTLS1", 376L, "test_study-376", "2020-09-20", "PULSE", "110", NA, "SCREENING",
    "VS", 2L, "VTLS1", 378L, "test_study-378", "2020-01-20T10:00", "PULSE", "110", NA, "SCREENING",
    "VS", 3L, "VTLS1", 378L, "test_study-378", "2020-01-21T11:00", "PULSE", "105", NA, "SCREENING"
  )

sdtm_in
#> # A tibble: 11 × 10
#>    DOMAIN oak_id raw_source patient_number USUBJID VSDTC VSTESTCD VSORRES VSSTAT
#>    <chr>   <int> <chr>               <int> <chr>   <chr> <chr>    <chr>   <chr> 
#>  1 VS          1 VTLS1                 375 test_s… 2020… DIABP    90      NA    
#>  2 VS          2 VTLS1                 375 test_s… 2020… DIABP    90      NA    
#>  3 VS          1 VTLS1                 375 test_s… 2020… PULSE    ND      NA    
#>  4 VS          2 VTLS1                 375 test_s… 2020… PULSE    85      NA    
#>  5 VS          1 VTLS2                 375 test_s… 2020… SYSBP    120     NA    
#>  6 VS          2 VTLS2                 375 test_s… 2020… SYSBP    120     NA    
#>  7 VS          1 VTLS1                 376 test_s… 2020… DIABP    75      NA    
#>  8 VS          1 VTLS1                 376 test_s… 2020… PULSE    NA      NOT D…
#>  9 VS          2 VTLS1                 376 test_s… 2020… PULSE    110     NA    
#> 10 VS          2 VTLS1                 378 test_s… 2020… PULSE    110     NA    
#> 11 VS          3 VTLS1                 378 test_s… 2020… PULSE    105     NA    
#> # ℹ 1 more variable: VISIT <chr>

# Example 1:
observed_output <- derive_blfl(
  sdtm_in = sdtm_in,
  dm_domain = dm,
  tgt_var = "VSLOBXFL",
  ref_var = "RFXSTDTC",
  baseline_visits = c("SCREENING")
)
observed_output
#> # A tibble: 11 × 11
#>    DOMAIN oak_id raw_source patient_number USUBJID VSDTC VSTESTCD VSORRES VSSTAT
#>    <chr>   <int> <chr>               <int> <chr>   <chr> <chr>    <chr>   <chr> 
#>  1 VS          1 VTLS1                 375 test_s… 2020… DIABP    90      NA    
#>  2 VS          2 VTLS1                 375 test_s… 2020… DIABP    90      NA    
#>  3 VS          1 VTLS1                 375 test_s… 2020… PULSE    ND      NA    
#>  4 VS          2 VTLS1                 375 test_s… 2020… PULSE    85      NA    
#>  5 VS          1 VTLS2                 375 test_s… 2020… SYSBP    120     NA    
#>  6 VS          2 VTLS2                 375 test_s… 2020… SYSBP    120     NA    
#>  7 VS          1 VTLS1                 376 test_s… 2020… DIABP    75      NA    
#>  8 VS          1 VTLS1                 376 test_s… 2020… PULSE    NA      NOT D…
#>  9 VS          2 VTLS1                 376 test_s… 2020… PULSE    110     NA    
#> 10 VS          2 VTLS1                 378 test_s… 2020… PULSE    110     NA    
#> 11 VS          3 VTLS1                 378 test_s… 2020… PULSE    105     NA    
#> # ℹ 2 more variables: VISIT <chr>, VSLOBXFL <chr>

# Example 2:
observed_output2 <- derive_blfl(
  sdtm_in = sdtm_in,
  dm_domain = dm,
  tgt_var = "VSLOBXFL",
  ref_var = "RFXSTDTC",
  baseline_timepoints = c("PRE-DOSE")
)
observed_output2
#> # A tibble: 11 × 11
#>    DOMAIN oak_id raw_source patient_number USUBJID VSDTC VSTESTCD VSORRES VSSTAT
#>    <chr>   <int> <chr>               <int> <chr>   <chr> <chr>    <chr>   <chr> 
#>  1 VS          1 VTLS1                 375 test_s… 2020… DIABP    90      NA    
#>  2 VS          2 VTLS1                 375 test_s… 2020… DIABP    90      NA    
#>  3 VS          1 VTLS1                 375 test_s… 2020… PULSE    ND      NA    
#>  4 VS          2 VTLS1                 375 test_s… 2020… PULSE    85      NA    
#>  5 VS          1 VTLS2                 375 test_s… 2020… SYSBP    120     NA    
#>  6 VS          2 VTLS2                 375 test_s… 2020… SYSBP    120     NA    
#>  7 VS          1 VTLS1                 376 test_s… 2020… DIABP    75      NA    
#>  8 VS          1 VTLS1                 376 test_s… 2020… PULSE    NA      NOT D…
#>  9 VS          2 VTLS1                 376 test_s… 2020… PULSE    110     NA    
#> 10 VS          2 VTLS1                 378 test_s… 2020… PULSE    110     NA    
#> 11 VS          3 VTLS1                 378 test_s… 2020… PULSE    105     NA    
#> # ℹ 2 more variables: VISIT <chr>, VSLOBXFL <chr>

# Example 3: Output is the same as Example 2
observed_output3 <- derive_blfl(
  sdtm_in = sdtm_in,
  dm_domain = dm,
  tgt_var = "VSLOBXFL",
  ref_var = "RFXSTDTC",
  baseline_visits = c("SCREENING"),
  baseline_timepoints = c("PRE-DOSE")
)
observed_output3
#> # A tibble: 11 × 11
#>    DOMAIN oak_id raw_source patient_number USUBJID VSDTC VSTESTCD VSORRES VSSTAT
#>    <chr>   <int> <chr>               <int> <chr>   <chr> <chr>    <chr>   <chr> 
#>  1 VS          1 VTLS1                 375 test_s… 2020… DIABP    90      NA    
#>  2 VS          2 VTLS1                 375 test_s… 2020… DIABP    90      NA    
#>  3 VS          1 VTLS1                 375 test_s… 2020… PULSE    ND      NA    
#>  4 VS          2 VTLS1                 375 test_s… 2020… PULSE    85      NA    
#>  5 VS          1 VTLS2                 375 test_s… 2020… SYSBP    120     NA    
#>  6 VS          2 VTLS2                 375 test_s… 2020… SYSBP    120     NA    
#>  7 VS          1 VTLS1                 376 test_s… 2020… DIABP    75      NA    
#>  8 VS          1 VTLS1                 376 test_s… 2020… PULSE    NA      NOT D…
#>  9 VS          2 VTLS1                 376 test_s… 2020… PULSE    110     NA    
#> 10 VS          2 VTLS1                 378 test_s… 2020… PULSE    110     NA    
#> 11 VS          3 VTLS1                 378 test_s… 2020… PULSE    105     NA    
#> # ℹ 2 more variables: VISIT <chr>, VSLOBXFL <chr>