Introduction
Conditioned data frames, or cnd_df, are a tool in the sdtm.oak package
for applying transformations to only a subset of the rows of a data
frame. This article explains how to create and use conditioned data
frames, particularly in the context of SDTM domain derivations.
Creating Conditioned Data Frames
A conditioned data frame is a regular data frame extended with a
logical vector cnd that marks rows for subsequent conditional
transformations. The condition_add() function creates conditioned data
frames.
Simple Example
Consider a simple data frame df:
library(sdtm.oak)
(df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L]))
## # A tibble: 3 × 2
##       x y
##   <int> <chr>
## 1     1 a
## 2     2 b
## 3     3 c
We can create a conditioned data frame in which only the rows with
x > 1 are marked:
(cnd_df <- condition_add(dat = df, x > 1L))
## # A tibble: 3 × 2
## # Cond. tbl: 2/1/0
##         x y
##     <int> <chr>
## 1 F     1 a
## 2 T     2 b
## 3 T     3 c
Here, only the second and third rows are marked as TRUE (printed as T);
the first row is marked as FALSE (printed as F).
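The marking can be pictured as an ordinary logical vector with one entry per row. The following is a minimal base R sketch of that idea; it does not use sdtm.oak and makes no claim about how the package stores the condition internally. Reading the summary line Cond. tbl: 2/1/0 as the counts of TRUE, FALSE, and NA rows is an assumption, although it is consistent with the printed output.

```r
# Conceptual sketch in base R; it does not use or reproduce sdtm.oak internals.
df <- data.frame(x = 1L:3L, y = letters[1L:3L])

# Evaluate the condition once per row to obtain a logical mask.
mask <- df$x > 1L
print(mask)

# Tally the mask as TRUE / FALSE / NA, the assumed meaning of "Cond. tbl".
counts <- c(
  true = sum(mask, na.rm = TRUE),
  false = sum(!mask, na.rm = TRUE),
  na = sum(is.na(mask))
)
print(counts)
```

The third count matters because a comparison involving a missing value yields NA rather than TRUE or FALSE.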
Usage in SDTM Domain Derivations
Conditioned data frames are most useful in combination with functions
such as assign_no_ct(), assign_ct(), hardcode_no_ct(), and
hardcode_ct(). When given a conditioned data frame, these functions
perform the derivation only for the records marked as TRUE.
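To make "derive only for the marked records" concrete, here is a small base R sketch of a masked assignment. The helper derive_where() is hypothetical and exists only for this illustration; it is not part of sdtm.oak, and treating NA in the mask as "not marked" is an assumption of the sketch rather than documented package behaviour.

```r
# Hypothetical helper for illustration only; not part of sdtm.oak.
# Keeps values where mask is TRUE and sets every other position to NA.
derive_where <- function(values, mask) {
  stopifnot(is.logical(mask), length(values) == length(mask))
  out <- values
  # `%in% TRUE` treats NA in the mask as "not marked" (assumption of this sketch).
  out[!(mask %in% TRUE)] <- NA
  out
}

result <- derive_where(c(10L, 20L, 30L), c(FALSE, TRUE, NA))
print(result)
```

Only the second element survives here; the other two positions become NA while the vector keeps its integer type.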
Example with Concomitant Medications (CM) Domain
Consider a simplified data set of concomitant medications, where we
want to derive a new variable CMGRPID (Concomitant Medication Group ID)
only for records whose medication treatment (CMTRT) is "BENADRYL".
Here is a simplified raw Concomitant Medications data set (cm_raw):
cm_raw <- tibble::tibble(
  oak_id = seq_len(14L),
  raw_source = "ConMed",
  patient_number = c(375L, 375L, 376L, 377L, 377L, 377L, 377L, 378L, 378L, 378L, 378L, 379L, 379L, 379L),
  MDNUM = c(1L, 2L, 1L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 3L, 1L, 2L, 3L),
  MDRAW = c(
    "BABY ASPIRIN", "CORTISPORIN", "ASPIRIN",
    "DIPHENHYDRAMINE HCL", "PARCETEMOL", "VOMIKIND",
    "ZENFLOX OZ", "AMITRYPTYLINE", "BENADRYL",
    "DIPHENHYDRAMINE HYDROCHLORIDE", "TETRACYCLINE",
    "BENADRYL", "SOMINEX", "ZQUILL"
  )
)
cm_raw
## # A tibble: 14 × 5
##    oak_id raw_source patient_number MDNUM MDRAW
##     <int> <chr>               <int> <int> <chr>
##  1      1 ConMed                375     1 BABY ASPIRIN
##  2      2 ConMed                375     2 CORTISPORIN
##  3      3 ConMed                376     1 ASPIRIN
##  4      4 ConMed                377     1 DIPHENHYDRAMINE HCL
##  5      5 ConMed                377     2 PARCETEMOL
##  6      6 ConMed                377     3 VOMIKIND
##  7      7 ConMed                377     5 ZENFLOX OZ
##  8      8 ConMed                378     4 AMITRYPTYLINE
##  9      9 ConMed                378     1 BENADRYL
## 10     10 ConMed                378     2 DIPHENHYDRAMINE HYDROCHLORIDE
## 11     11 ConMed                378     3 TETRACYCLINE
## 12     12 ConMed                379     1 BENADRYL
## 13     13 ConMed                379     2 SOMINEX
## 14     14 ConMed                379     3 ZQUILL
To derive the CMTRT variable, we use the assign_no_ct() function to map
the raw variable MDRAW to CMTRT:
tgt_dat <- assign_no_ct(
  tgt_var = "CMTRT",
  raw_dat = cm_raw,
  raw_var = "MDRAW"
)
tgt_dat
## # A tibble: 14 × 4
##    oak_id raw_source patient_number CMTRT
##     <int> <chr>               <int> <chr>
##  1      1 ConMed                375 BABY ASPIRIN
##  2      2 ConMed                375 CORTISPORIN
##  3      3 ConMed                376 ASPIRIN
##  4      4 ConMed                377 DIPHENHYDRAMINE HCL
##  5      5 ConMed                377 PARCETEMOL
##  6      6 ConMed                377 VOMIKIND
##  7      7 ConMed                377 ZENFLOX OZ
##  8      8 ConMed                378 AMITRYPTYLINE
##  9      9 ConMed                378 BENADRYL
## 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE
## 11     11 ConMed                378 TETRACYCLINE
## 12     12 ConMed                379 BENADRYL
## 13     13 ConMed                379 SOMINEX
## 14     14 ConMed                379 ZQUILL
Next, we create a conditioned data frame from the target data set
(tgt_dat) in which only the rows with CMTRT equal to "BENADRYL" are
marked:
(cnd_tgt_dat <- condition_add(tgt_dat, CMTRT == "BENADRYL"))
## # A tibble: 14 × 4
## # Cond. tbl: 2/12/0
##      oak_id raw_source patient_number CMTRT
##       <int> <chr>               <int> <chr>
##  1 F      1 ConMed                375 BABY ASPIRIN
##  2 F      2 ConMed                375 CORTISPORIN
##  3 F      3 ConMed                376 ASPIRIN
##  4 F      4 ConMed                377 DIPHENHYDRAMINE HCL
##  5 F      5 ConMed                377 PARCETEMOL
##  6 F      6 ConMed                377 VOMIKIND
##  7 F      7 ConMed                377 ZENFLOX OZ
##  8 F      8 ConMed                378 AMITRYPTYLINE
##  9 T      9 ConMed                378 BENADRYL
## 10 F     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE
## 11 F     11 ConMed                378 TETRACYCLINE
## 12 T     12 ConMed                379 BENADRYL
## 13 F     13 ConMed                379 SOMINEX
## 14 F     14 ConMed                379 ZQUILL
Finally, we derive the CMGRPID variable conditionally. Passing the
conditioned target data set as tgt_dat to assign_no_ct() maps MDNUM to
CMGRPID, the group ID for the medication, only for the marked rows; all
other rows are left as NA:
derived_tgt_dat <- assign_no_ct(
  tgt_dat = cnd_tgt_dat,
  tgt_var = "CMGRPID",
  raw_dat = cm_raw,
  raw_var = "MDNUM"
)
derived_tgt_dat
## # A tibble: 14 × 5
##    oak_id raw_source patient_number CMTRT                         CMGRPID
##     <int> <chr>               <int> <chr>                           <int>
##  1      1 ConMed                375 BABY ASPIRIN                       NA
##  2      2 ConMed                375 CORTISPORIN                        NA
##  3      3 ConMed                376 ASPIRIN                            NA
##  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
##  5      5 ConMed                377 PARCETEMOL                         NA
##  6      6 ConMed                377 VOMIKIND                           NA
##  7      7 ConMed                377 ZENFLOX OZ                         NA
##  8      8 ConMed                378 AMITRYPTYLINE                      NA
##  9      9 ConMed                378 BENADRYL                            1
## 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
## 11     11 ConMed                378 TETRACYCLINE                       NA
## 12     12 ConMed                379 BENADRYL                            1
## 13     13 ConMed                379 SOMINEX                            NA
## 14     14 ConMed                379 ZQUILL                             NA
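As a sanity check on the result above, the same values can be reproduced in base R without sdtm.oak. This is only a sketch: the vectors mdnum and cmtrt are copied from cm_raw, the variable names are chosen for this illustration, and the rows are assumed to be in the same order in both vectors, so no matching on oak_id, raw_source, or patient_number is performed.

```r
# Base R cross-check of the CMGRPID derivation; sdtm.oak is not used here.
# Values are copied from cm_raw; rows are assumed to be in the same order.
mdnum <- c(1L, 2L, 1L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 3L, 1L, 2L, 3L)
cmtrt <- c(
  "BABY ASPIRIN", "CORTISPORIN", "ASPIRIN",
  "DIPHENHYDRAMINE HCL", "PARCETEMOL", "VOMIKIND",
  "ZENFLOX OZ", "AMITRYPTYLINE", "BENADRYL",
  "DIPHENHYDRAMINE HYDROCHLORIDE", "TETRACYCLINE",
  "BENADRYL", "SOMINEX", "ZQUILL"
)

# Take MDNUM where the treatment is exactly "BENADRYL", NA everywhere else.
cmgrpid <- ifelse(cmtrt == "BENADRYL", mdnum, NA_integer_)

# Rows that received a value, and the values themselves.
marked_rows <- which(!is.na(cmgrpid))
print(marked_rows)
print(cmgrpid[marked_rows])
```

Because the condition is an exact string comparison, only rows 9 and 12 receive a value; other spellings such as "DIPHENHYDRAMINE HCL" are not marked.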
Conditioned data frames in the sdtm.oak package provide a flexible way to perform conditional transformations on data sets. By marking specific rows for transformation, users can derive SDTM variables for only the relevant records; as in the CM example, unmarked records receive NA in the newly derived variable.