Derive Baseline Flag or Last Observation Before Exposure Flag
Source: R/derive_blfl.R
Derive the baseline flag variable (--BLFL) or the last observation before exposure flag (--LOBXFL) from the observation date/time (--DTC) and a DM domain reference date/time.
Arguments
- sdtm_in
Input SDTM domain.
- dm_domain
DM domain containing the reference variable ref_var.
- tgt_var
Name of the variable to be derived (--BLFL or --LOBXFL, where -- is the domain).
- ref_var
Name of the date/time variable in the Demographics (DM) dataset that serves as the point of comparison for the observations in sdtm_in. Common choices include "RFSTDTC" (the subject reference start date/time) and "RFXSTDTC" (the date/time of the first study treatment).
- baseline_visits
A character vector of VISIT values that identify the baseline visits of the study, that is, the visits at which data are collected before any intervention is applied. A record from one of these visits can receive the flag when its --DTC date equals the reference date (see Details).
- baseline_timepoints
A character vector of --TPT values that identify the timepoints within the baseline visits at which key assessments or measurements were taken. A record at one of these timepoints can receive the flag when its --DTC date equals the reference date (see Details).
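The -- placeholder used throughout the argument descriptions stands for the two-letter domain code. As a minimal sketch, the helper below expands the placeholder into concrete column names. The helper resolve_domain_vars() is hypothetical and not part of the documented function. How derive_blfl() itself determines the domain is not stated here.

```r
# Hypothetical helper (an assumption for illustration, not a package function):
# expand the "--" placeholder into concrete column names for a domain code.
resolve_domain_vars <- function(domain) {
  stopifnot(is.character(domain), length(domain) == 1L, nchar(domain) == 2L)
  suffixes <- c("DTC", "ORRES", "STAT", "TESTCD", "TPT", "BLFL", "LOBXFL")
  # Names are the placeholder forms, values are the resolved column names.
  stats::setNames(paste0(domain, suffixes), paste0("--", suffixes))
}

vs_vars <- resolve_domain_vars("VS")
vs_vars[["--DTC"]]     # resolved date/time column for the VS domain
vs_vars[["--LOBXFL"]]  # a valid tgt_var for the VS domain
```

For the VS domain used in the examples, --DTC therefore corresponds to VSDTC and --LOBXFL to VSLOBXFL.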
Value
Modified input data frame with the baseline flag variable --BLFL or the last observation before exposure flag --LOBXFL added.
Details
The derivation is as follows:
1. Remove records where the result (--ORRES) is missing. Also exclude records with results labeled as "ND" (No Data) or "NOT DONE" in the --ORRES column, which indicate that the measurement or observation was not completed.
2. Remove records where the status (--STAT) indicates the observation or test was not performed, marked as "NOT DONE".
3. Divide the date and time column (--DTC) and the reference date/time variable (ref_var) into separate date and time components. Ignore any seconds recorded in the time component, focusing only on hours and minutes for further calculations.
4. Set partial or missing dates to NA.
5. Set partial or missing times to NA.
6. Filter on rows that have domain and reference dates not equal to NA. (Referred to as X.)
7. Filter X on rows with domain date (--DTC) prior to (less than) the reference date. (Referred to as A.)
8. Filter X on rows with domain date (--DTC) equal to the reference date, domain and reference times both not equal to NA, and domain time prior to (less than) the reference time. (Referred to as B.)
9. Filter X on rows with domain date (--DTC) equal to the reference date, domain and/or reference time equal to NA, and:
   - VISIT is in the baseline visits list (if it exists) and
   - --TPT is in the baseline timepoints list (if it exists).
   (Referred to as C.)
10. Combine the rows from A, B, and C to get a data frame of pre-reference date observations. Sort the rows by USUBJID, --STAT, and --ORRES.
11. Group by USUBJID and --TESTCD and filter on the rows that have the maximum value of --DTC. Keep only the oak id variables and --TESTCD (because these are the unique values). Remove any duplicate rows. Assign the baseline flag variable (--BLFL) or the last observation before exposure flag variable (--LOBXFL) to these rows.
12. Join the baseline flag onto the input dataset based on the oak id variables.
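The steps above can be sketched in base R on toy data. This is an illustrative approximation under stated assumptions, not the function's actual implementation. The data values, the helper split_dtc(), and all intermediate names are hypothetical. The toy data have no --TPT column, the intermediate sort by USUBJID, --STAT, and --ORRES is omitted, and ties on the latest --DTC are resolved by keeping a single row.

```r
# Toy vital-signs records for one subject; all values are made up.
obs <- data.frame(
  oak_id         = 1:5,
  raw_source     = "VTLS1",
  patient_number = 1L,
  USUBJID        = "S1",
  VSTESTCD       = c("PULSE", "PULSE", "PULSE", "SYSBP", "SYSBP"),
  VSDTC          = c("2020-09-01T13:31", "2020-09-28T10:05:59",
                     "2020-09-28T11:00", "2020-09-28", "2020-09-27"),
  VSORRES        = c("80", "82", "85", "120", NA),
  VSSTAT         = c(NA, NA, NA, NA, "NOT DONE"),
  VISIT          = c("SCREENING", "DAY 1", "DAY 1", "SCREENING", "SCREENING"),
  stringsAsFactors = FALSE
)
ref_dtc <- "2020-09-28T10:10"   # reference date/time (e.g. RFXSTDTC) for S1
baseline_visits <- "SCREENING"

# Split ISO 8601 strings into a date and an HH:MM time (seconds dropped);
# anything that is not a complete date or time becomes NA.
split_dtc <- function(x) {
  date_ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", x)
  time_ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}", x)
  list(
    date = ifelse(date_ok, substr(x, 1, 10), NA_character_),
    time = ifelse(time_ok, substr(x, 12, 16), NA_character_)
  )
}

# Steps 1-2: drop records without a usable result or marked as not done.
usable <- !is.na(obs$VSORRES) &
  !(obs$VSORRES %in% c("ND", "NOT DONE")) &
  !(obs$VSSTAT %in% "NOT DONE")
cand <- obs[usable, ]

# Steps 3-5: separate date and time parts for domain and reference values.
dom <- split_dtc(cand$VSDTC)
ref <- split_dtc(ref_dtc)

# Steps 6-9: zero-padded ISO 8601 dates and HH:MM times compare as strings.
x_rows     <- !is.na(dom$date) & !is.na(ref$date)                 # X
same_day   <- x_rows & dom$date == ref$date
both_times <- !is.na(dom$time) & !is.na(ref$time)
in_a <- x_rows & dom$date < ref$date                              # A
in_b <- same_day & both_times & dom$time < ref$time               # B
in_c <- same_day & !both_times & cand$VISIT %in% baseline_visits  # C
pre_ref <- cand[in_a | in_b | in_c, ]

# Step 11: keep the latest pre-reference record per subject and test.
pre_ref <- pre_ref[order(pre_ref$USUBJID, pre_ref$VSTESTCD, pre_ref$VSDTC,
                         method = "radix"), ]
key  <- paste(pre_ref$USUBJID, pre_ref$VSTESTCD)
last <- pre_ref[!duplicated(key, fromLast = TRUE), ]
last$VSLOBXFL <- "Y"

# Step 12: join the flag back onto the full input by the oak id variables.
by_vars <- c("oak_id", "raw_source", "patient_number", "VSTESTCD")
out <- merge(obs, last[, c(by_vars, "VSLOBXFL")], by = by_vars, all.x = TRUE)
out <- out[order(out$oak_id), ]
out[, c("oak_id", "VSTESTCD", "VSDTC", "VSLOBXFL")]
```

In this toy data the records with oak_id 2 and 4 receive "Y". The first is the latest PULSE record before the reference time on the reference day (rule B). The second is a date-only SYSBP record from a baseline visit on the reference day (rule C). All other records keep NA.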
Examples
dm <- tibble::tribble(
  ~USUBJID, ~RFSTDTC, ~RFXSTDTC,
  "test_study-375", "2020-09-28T10:10", "2020-09-28T10:10",
  "test_study-376", "2020-09-21T11:00", "2020-09-21T11:00",
  "test_study-377", NA, NA,
  "test_study-378", "2020-01-20T10:00", "2020-01-20T10:00",
  "test_study-379", NA, NA,
)
dm
#> # A tibble: 5 × 3
#> USUBJID RFSTDTC RFXSTDTC
#> <chr> <chr> <chr>
#> 1 test_study-375 2020-09-28T10:10 2020-09-28T10:10
#> 2 test_study-376 2020-09-21T11:00 2020-09-21T11:00
#> 3 test_study-377 NA NA
#> 4 test_study-378 2020-01-20T10:00 2020-01-20T10:00
#> 5 test_study-379 NA NA
sdtm_in <-
  tibble::tribble(
    ~DOMAIN, ~oak_id, ~raw_source, ~patient_number, ~USUBJID, ~VSDTC, ~VSTESTCD, ~VSORRES, ~VSSTAT, ~VISIT,
    "VS", 1L, "VTLS1", 375L, "test_study-375", "2020-09-01T13:31", "DIABP", "90", NA, "SCREENING",
    "VS", 2L, "VTLS1", 375L, "test_study-375", "2020-10-01T11:20", "DIABP", "90", NA, "SCREENING",
    "VS", 1L, "VTLS1", 375L, "test_study-375", "2020-09-28T10:10", "PULSE", "ND", NA, "SCREENING",
    "VS", 2L, "VTLS1", 375L, "test_study-375", "2020-10-01T13:31", "PULSE", "85", NA, "SCREENING",
    "VS", 1L, "VTLS2", 375L, "test_study-375", "2020-09-28T10:10", "SYSBP", "120", NA, "SCREENING",
    "VS", 2L, "VTLS2", 375L, "test_study-375", "2020-09-28T10:05", "SYSBP", "120", NA, "SCREENING",
    "VS", 1L, "VTLS1", 376L, "test_study-376", "2020-09-20", "DIABP", "75", NA, "SCREENING",
    "VS", 1L, "VTLS1", 376L, "test_study-376", "2020-09-20", "PULSE", NA, "NOT DONE", "SCREENING",
    "VS", 2L, "VTLS1", 376L, "test_study-376", "2020-09-20", "PULSE", "110", NA, "SCREENING",
    "VS", 2L, "VTLS1", 378L, "test_study-378", "2020-01-20T10:00", "PULSE", "110", NA, "SCREENING",
    "VS", 3L, "VTLS1", 378L, "test_study-378", "2020-01-21T11:00", "PULSE", "105", NA, "SCREENING"
  )
sdtm_in
#> # A tibble: 11 × 10
#> DOMAIN oak_id raw_source patient_number USUBJID VSDTC VSTESTCD VSORRES VSSTAT
#> <chr> <int> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 VS 1 VTLS1 375 test_s… 2020… DIABP 90 NA
#> 2 VS 2 VTLS1 375 test_s… 2020… DIABP 90 NA
#> 3 VS 1 VTLS1 375 test_s… 2020… PULSE ND NA
#> 4 VS 2 VTLS1 375 test_s… 2020… PULSE 85 NA
#> 5 VS 1 VTLS2 375 test_s… 2020… SYSBP 120 NA
#> 6 VS 2 VTLS2 375 test_s… 2020… SYSBP 120 NA
#> 7 VS 1 VTLS1 376 test_s… 2020… DIABP 75 NA
#> 8 VS 1 VTLS1 376 test_s… 2020… PULSE NA NOT D…
#> 9 VS 2 VTLS1 376 test_s… 2020… PULSE 110 NA
#> 10 VS 2 VTLS1 378 test_s… 2020… PULSE 110 NA
#> 11 VS 3 VTLS1 378 test_s… 2020… PULSE 105 NA
#> # ℹ 1 more variable: VISIT <chr>
# Example 1:
observed_output <- derive_blfl(
  sdtm_in = sdtm_in,
  dm_domain = dm,
  tgt_var = "VSLOBXFL",
  ref_var = "RFXSTDTC",
  baseline_visits = c("SCREENING")
)
observed_output
#> # A tibble: 11 × 11
#> DOMAIN oak_id raw_source patient_number USUBJID VSDTC VSTESTCD VSORRES VSSTAT
#> <chr> <int> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 VS 1 VTLS1 375 test_s… 2020… DIABP 90 NA
#> 2 VS 2 VTLS1 375 test_s… 2020… DIABP 90 NA
#> 3 VS 1 VTLS1 375 test_s… 2020… PULSE ND NA
#> 4 VS 2 VTLS1 375 test_s… 2020… PULSE 85 NA
#> 5 VS 1 VTLS2 375 test_s… 2020… SYSBP 120 NA
#> 6 VS 2 VTLS2 375 test_s… 2020… SYSBP 120 NA
#> 7 VS 1 VTLS1 376 test_s… 2020… DIABP 75 NA
#> 8 VS 1 VTLS1 376 test_s… 2020… PULSE NA NOT D…
#> 9 VS 2 VTLS1 376 test_s… 2020… PULSE 110 NA
#> 10 VS 2 VTLS1 378 test_s… 2020… PULSE 110 NA
#> 11 VS 3 VTLS1 378 test_s… 2020… PULSE 105 NA
#> # ℹ 2 more variables: VISIT <chr>, VSLOBXFL <chr>
# Example 2:
observed_output2 <- derive_blfl(
  sdtm_in = sdtm_in,
  dm_domain = dm,
  tgt_var = "VSLOBXFL",
  ref_var = "RFXSTDTC",
  baseline_timepoints = c("PRE-DOSE")
)
observed_output2
#> # A tibble: 11 × 11
#> DOMAIN oak_id raw_source patient_number USUBJID VSDTC VSTESTCD VSORRES VSSTAT
#> <chr> <int> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 VS 1 VTLS1 375 test_s… 2020… DIABP 90 NA
#> 2 VS 2 VTLS1 375 test_s… 2020… DIABP 90 NA
#> 3 VS 1 VTLS1 375 test_s… 2020… PULSE ND NA
#> 4 VS 2 VTLS1 375 test_s… 2020… PULSE 85 NA
#> 5 VS 1 VTLS2 375 test_s… 2020… SYSBP 120 NA
#> 6 VS 2 VTLS2 375 test_s… 2020… SYSBP 120 NA
#> 7 VS 1 VTLS1 376 test_s… 2020… DIABP 75 NA
#> 8 VS 1 VTLS1 376 test_s… 2020… PULSE NA NOT D…
#> 9 VS 2 VTLS1 376 test_s… 2020… PULSE 110 NA
#> 10 VS 2 VTLS1 378 test_s… 2020… PULSE 110 NA
#> 11 VS 3 VTLS1 378 test_s… 2020… PULSE 105 NA
#> # ℹ 2 more variables: VISIT <chr>, VSLOBXFL <chr>
# Example 3: Output is the same as Example 2
observed_output3 <- derive_blfl(
  sdtm_in = sdtm_in,
  dm_domain = dm,
  tgt_var = "VSLOBXFL",
  ref_var = "RFXSTDTC",
  baseline_visits = c("SCREENING"),
  baseline_timepoints = c("PRE-DOSE")
)
observed_output3
#> # A tibble: 11 × 11
#> DOMAIN oak_id raw_source patient_number USUBJID VSDTC VSTESTCD VSORRES VSSTAT
#> <chr> <int> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 VS 1 VTLS1 375 test_s… 2020… DIABP 90 NA
#> 2 VS 2 VTLS1 375 test_s… 2020… DIABP 90 NA
#> 3 VS 1 VTLS1 375 test_s… 2020… PULSE ND NA
#> 4 VS 2 VTLS1 375 test_s… 2020… PULSE 85 NA
#> 5 VS 1 VTLS2 375 test_s… 2020… SYSBP 120 NA
#> 6 VS 2 VTLS2 375 test_s… 2020… SYSBP 120 NA
#> 7 VS 1 VTLS1 376 test_s… 2020… DIABP 75 NA
#> 8 VS 1 VTLS1 376 test_s… 2020… PULSE NA NOT D…
#> 9 VS 2 VTLS1 376 test_s… 2020… PULSE 110 NA
#> 10 VS 2 VTLS1 378 test_s… 2020… PULSE 110 NA
#> 11 VS 3 VTLS1 378 test_s… 2020… PULSE 105 NA
#> # ℹ 2 more variables: VISIT <chr>, VSLOBXFL <chr>