  • hardcode_no_ct() maps a hardcoded value to a target SDTM variable that has no terminology restrictions.

  • hardcode_ct() maps a hardcoded value to a target SDTM variable with controlled terminology recoding.

Usage

hardcode_no_ct(
  tgt_dat = NULL,
  tgt_val,
  raw_dat,
  raw_var,
  tgt_var,
  id_vars = oak_id_vars()
)

hardcode_ct(
  tgt_dat = NULL,
  tgt_val,
  raw_dat,
  raw_var,
  tgt_var,
  ct_spec,
  ct_clst,
  id_vars = oak_id_vars()
)

Arguments

tgt_dat

Target dataset: a data frame to be merged against raw_dat by the variables indicated in id_vars. This parameter is optional; see section Value for how the output changes depending on its value.

tgt_val

The target SDTM value to be hardcoded into the variable indicated in tgt_var.

raw_dat

The raw dataset (dataframe); must include the variables passed in id_vars and raw_var.

raw_var

The raw variable: a single string indicating the name of the raw variable in raw_dat.

tgt_var

The target SDTM variable: a single string indicating the name of the variable to be derived.

id_vars

Key variables to be used in the join between the raw dataset (raw_dat) and the target dataset (tgt_dat).

ct_spec

Study controlled terminology specification: a dataframe with a minimal set of columns; see ct_spec_vars() for details. This parameter is optional: if left as NULL, no controlled terminology recoding is applied.

ct_clst

A codelist code indicating which subset of the controlled terminology to apply in the derivation. This parameter is optional: if left as NULL, all possible recodings in ct_spec are attempted.
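
To make the roles of ct_spec and ct_clst concrete, here is a minimal base-R sketch of a terminology lookup. It is not sdtm.oak's recoding logic: sketch_recode() is a hypothetical helper, and it assumes matching on the collected_value column only, whereas the package may consult further columns. Values without a match are returned unchanged, which is consistent with the hardcode_ct() example on this page.

```r
# Hypothetical helper, NOT sdtm.oak's implementation: a simplified lookup
# that assumes matching on `collected_value` only.
sketch_recode <- function(x, ct_spec = NULL, ct_clst = NULL) {
  # No specification supplied: no recoding is applied.
  if (is.null(ct_spec)) {
    return(x)
  }
  # Restrict the terminology to one codelist when a codelist code is given.
  if (!is.null(ct_clst)) {
    ct_spec <- ct_spec[ct_spec$codelist_code == ct_clst, , drop = FALSE]
  }
  # Look up each value; values without a match are returned unchanged.
  idx <- match(x, ct_spec$collected_value)
  ifelse(is.na(idx), x, ct_spec$term_value[idx])
}

# Three rows copied from the example terminology printed on this page.
ct <- data.frame(
  codelist_code = c("C71113", "C71113", "C66726"),
  term_value = c("QD", "BID", "CAPSULE"),
  collected_value = c("QD (Every Day)", "BID (Twice a Day)", "Capsule"),
  stringsAsFactors = FALSE
)

# "Capsule" belongs to codelist C66726, so it is not recoded under C71113.
recoded <- sketch_recode(c("BID (Twice a Day)", "Capsule", NA), ct, "C71113")
```

With ct_clst left as NULL, the same call would also recode "Capsule", because every row of the specification is then a candidate.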

Value

The returned data set depends on the value of tgt_dat:

  • If no target dataset is supplied (tgt_dat is left at its default, NULL), the returned data set is raw_dat restricted to the variables indicated in id_vars, plus one new column: the derived variable named in tgt_var.

  • If the target dataset is provided, then it is merged with the raw data set raw_dat by the variables indicated in id_vars, with a new column: the derived variable, as indicated in tgt_var.
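
The two cases above can be sketched in base R. This is an illustration under stated assumptions, not the package's code: sketch_hardcode() is a hypothetical helper, the merge is assumed to be a left join from tgt_dat (as the row counts in the examples suggest), and the sketch does not handle a tgt_dat that already contains tgt_var.

```r
# Hypothetical helper, NOT the sdtm.oak implementation: a base-R sketch of
# the two return shapes.
sketch_hardcode <- function(raw_dat, raw_var, tgt_var, tgt_val,
                            id_vars, tgt_dat = NULL) {
  # Keep only the key variables from the raw dataset.
  out <- raw_dat[, id_vars, drop = FALSE]
  # Hardcode tgt_val where the raw variable is non-missing; NA otherwise.
  out[[tgt_var]] <- ifelse(is.na(raw_dat[[raw_var]]), NA_character_, tgt_val)
  if (is.null(tgt_dat)) {
    return(out)
  }
  # Assumed left join: keep every row of tgt_dat, add the derived column.
  merge(tgt_dat, out, by = id_vars, all.x = TRUE, sort = FALSE)
}

raw <- data.frame(
  oak_id = 1:3,
  MDRAW = c("BABY ASPIRIN", NA, "CORTISPORIN"),
  stringsAsFactors = FALSE
)
tgt <- data.frame(
  oak_id = 1:4,
  CMTRT = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Case 1: no target dataset, so the result is id_vars plus the new column.
res_raw_only <- sketch_hardcode(raw, "MDRAW", "CMCAT", "GENERAL", "oak_id")

# Case 2: target dataset supplied, so its columns are kept and CMCAT is added.
res_joined <- sketch_hardcode(raw, "MDRAW", "CMCAT", "GENERAL", "oak_id",
                              tgt_dat = tgt)
```

In the second case, the target row with no counterpart in the raw data (oak_id 4) receives NA in the derived column.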

Examples

md1 <-
  tibble::tribble(
    ~oak_id, ~raw_source, ~patient_number, ~MDRAW,
    1L,      "MD1",       101L,            "BABY ASPIRIN",
    2L,      "MD1",       102L,            "CORTISPORIN",
    3L,      "MD1",       103L,            NA_character_,
    4L,      "MD1",       104L,            "DIPHENHYDRAMINE HCL"
  )

# Derive a new variable `CMCAT` by overwriting `MDRAW` with the
# hardcoded value "GENERAL CONCOMITANT MEDICATIONS".
hardcode_no_ct(
  tgt_val = "GENERAL CONCOMITANT MEDICATIONS",
  raw_dat = md1,
  raw_var = "MDRAW",
  tgt_var = "CMCAT"
)
#> # A tibble: 4 × 4
#>   oak_id raw_source patient_number CMCAT                          
#>    <int> <chr>               <int> <chr>                          
#> 1      1 MD1                   101 GENERAL CONCOMITANT MEDICATIONS
#> 2      2 MD1                   102 GENERAL CONCOMITANT MEDICATIONS
#> 3      3 MD1                   103 NA                             
#> 4      4 MD1                   104 GENERAL CONCOMITANT MEDICATIONS

cm_inter <-
  tibble::tribble(
    ~oak_id, ~raw_source, ~patient_number, ~CMTRT,                ~CMINDC,
    1L,      "MD1",       101L,            "BABY ASPIRIN",        NA,
    2L,      "MD1",       102L,            "CORTISPORIN",         "NAUSEA",
    3L,      "MD1",       103L,            "ASPIRIN",             "ANEMIA",
    4L,      "MD1",       104L,            "DIPHENHYDRAMINE HCL", "NAUSEA",
    5L,      "MD1",       105L,            "PARACETAMOL",         "PYREXIA"
  )

# Derive a new variable `CMCAT` by overwriting `MDRAW` with the
# hardcoded value "GENERAL CONCOMITANT MEDICATIONS", after first joining
# the raw data to the target dataset `cm_inter`.
hardcode_no_ct(
  tgt_dat = cm_inter,
  tgt_val = "GENERAL CONCOMITANT MEDICATIONS",
  raw_dat = md1,
  raw_var = "MDRAW",
  tgt_var = "CMCAT"
)
#> # A tibble: 5 × 6
#>   oak_id raw_source patient_number CMTRT               CMINDC  CMCAT            
#>    <int> <chr>               <int> <chr>               <chr>   <chr>            
#> 1      1 MD1                   101 BABY ASPIRIN        NA      GENERAL CONCOMIT…
#> 2      2 MD1                   102 CORTISPORIN         NAUSEA  GENERAL CONCOMIT…
#> 3      3 MD1                   103 ASPIRIN             ANEMIA  NA               
#> 4      4 MD1                   104 DIPHENHYDRAMINE HCL NAUSEA  GENERAL CONCOMIT…
#> 5      5 MD1                   105 PARACETAMOL         PYREXIA NA               

# Controlled terminology specification
(ct_spec <- read_ct_spec_example("ct-01-cm"))
#> # A tibble: 33 × 8
#>    codelist_code term_code CodedData term_value collected_value    
#>    <chr>         <chr>     <chr>     <chr>      <chr>              
#>  1 C71113        C25473    QD        QD         QD (Every Day)     
#>  2 C71113        C64496    BID       BID        BID (Twice a Day)  
#>  3 C71113        C64499    PRN       PRN        PRN (As Needed)    
#>  4 C71113        C64516    Q2H       Q2H        Q2H (Every 2 Hours)
#>  5 C71113        C64530    QID       QID        QID (4 Times a Day)
#>  6 C66726        C25158    CAPSULE   CAPSULE    Capsule            
#>  7 C66726        C25394    PILL      PILL       Pill               
#>  8 C66726        C29167    LOTION    LOTION     Lotion             
#>  9 C66726        C42887    AEROSOL   AEROSOL    Aerosol            
#> 10 C66726        C42944    INHALANT  INHALANT   Inhalant           
#> # ℹ 23 more rows
#> # ℹ 3 more variables: term_preferred_term <chr>, term_synonyms <chr>,
#> #   raw_codelist <chr>

# Hardcoding of `CMCAT` with the value `"GENERAL CONCOMITANT MEDICATIONS"`
# involving terminology recoding. `NA` values in `MDRAW` are preserved in
# `CMCAT`.
hardcode_ct(
  tgt_dat = cm_inter,
  tgt_var = "CMCAT",
  raw_dat = md1,
  raw_var = "MDRAW",
  tgt_val = "GENERAL CONCOMITANT MEDICATIONS",
  ct_spec = ct_spec,
  ct_clst = "C66729"
)
#>  These terms could not be mapped per the controlled terminology: "GENERAL CONCOMITANT MEDICATIONS".
#> # A tibble: 5 × 6
#>   oak_id raw_source patient_number CMTRT               CMINDC  CMCAT            
#>    <int> <chr>               <int> <chr>               <chr>   <chr>            
#> 1      1 MD1                   101 BABY ASPIRIN        NA      GENERAL CONCOMIT…
#> 2      2 MD1                   102 CORTISPORIN         NAUSEA  GENERAL CONCOMIT…
#> 3      3 MD1                   103 ASPIRIN             ANEMIA  NA               
#> 4      4 MD1                   104 DIPHENHYDRAMINE HCL NAUSEA  GENERAL CONCOMIT…
#> 5      5 MD1                   105 PARACETAMOL         PYREXIA NA               


# Variables are derived in sequence from multiple input sources.
# For each target variable, only missing (`NA`) values are filled
# during each step; previously assigned (non-missing) values are retained.

cm_raw <-
  tibble::tibble(
    oak_id = 1:4,
    raw_source = "cm_raw",
    patient_number = 370 + oak_id,
    PATNUM = patient_number,
    IT.CMTRT = c("BABY ASPIRIN", "CORTISPORIN", NA, NA),
    IT.CMTRTOTH = c("Other Treatment - ", NA, "Other Treatment - Baby Aspirin", NA)
  )

cm_raw
#> # A tibble: 4 × 6
#>   oak_id raw_source patient_number PATNUM IT.CMTRT     IT.CMTRTOTH              
#>    <int> <chr>               <dbl>  <dbl> <chr>        <chr>                    
#> 1      1 cm_raw                371    371 BABY ASPIRIN "Other Treatment - "     
#> 2      2 cm_raw                372    372 CORTISPORIN   NA                      
#> 3      3 cm_raw                373    373 NA           "Other Treatment - Baby …
#> 4      4 cm_raw                374    374 NA            NA                      

# `CMCAT` is hardcoded first wherever `IT.CMTRT` is non-missing, and only
# then, for rows where `CMCAT` is still `NA`, wherever `IT.CMTRTOTH` is
# non-missing.
hardcode_no_ct(
  tgt_val = "General Concomitant Medications",
  raw_dat = cm_raw,
  raw_var = "IT.CMTRT",
  tgt_var = "CMCAT"
) |>
  hardcode_no_ct(
    tgt_val = "Other General Concomitant Medications",
    raw_dat = cm_raw,
    raw_var = "IT.CMTRTOTH",
    tgt_var = "CMCAT"
  )
#> # A tibble: 4 × 4
#>   oak_id raw_source patient_number CMCAT                                
#>    <int> <chr>               <dbl> <chr>                                
#> 1      1 cm_raw                371 General Concomitant Medications      
#> 2      2 cm_raw                372 General Concomitant Medications      
#> 3      3 cm_raw                373 Other General Concomitant Medications
#> 4      4 cm_raw                374 NA                                   

# Note that the order of the two hardcoding steps is reversed in this
# example, which changes the result for the first record.
hardcode_no_ct(
  tgt_val = "Other General Concomitant Medications",
  raw_dat = cm_raw,
  raw_var = "IT.CMTRTOTH",
  tgt_var = "CMCAT"
) |>
  hardcode_no_ct(
    tgt_val = "General Concomitant Medications",
    raw_dat = cm_raw,
    raw_var = "IT.CMTRT",
    tgt_var = "CMCAT"
  )
#> # A tibble: 4 × 4
#>   oak_id raw_source patient_number CMCAT                                
#>    <int> <chr>               <dbl> <chr>                                
#> 1      1 cm_raw                371 Other General Concomitant Medications
#> 2      2 cm_raw                372 General Concomitant Medications      
#> 3      3 cm_raw                373 Other General Concomitant Medications
#> 4      4 cm_raw                374 NA
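
The order dependence in the last two examples follows from the "fill only what is still missing" rule. A minimal base-R sketch of that rule is given here; fill_missing() is a hypothetical helper written for illustration, not a function of the package, and the shortened labels "General" and "Other" stand in for the full hardcoded values.

```r
# Hypothetical helper: assign tgt_val only where the target is still NA and
# the raw variable is non-missing; otherwise keep the current value.
fill_missing <- function(current, raw, tgt_val) {
  ifelse(is.na(current) & !is.na(raw), tgt_val, current)
}

# Raw columns copied from `cm_raw` in the example above.
it_cmtrt <- c("BABY ASPIRIN", "CORTISPORIN", NA, NA)
it_cmtrtoth <- c("Other Treatment - ", NA, "Other Treatment - Baby Aspirin", NA)
start <- rep(NA_character_, 4)

# `IT.CMTRT` first, then `IT.CMTRTOTH`.
general_first <- fill_missing(
  fill_missing(start, it_cmtrt, "General"),
  it_cmtrtoth, "Other"
)

# Reversed order: `IT.CMTRTOTH` first, then `IT.CMTRT`.
other_first <- fill_missing(
  fill_missing(start, it_cmtrtoth, "Other"),
  it_cmtrt, "General"
)
```

Only the first record differs between the two orders, because it is the only one with non-missing values in both raw variables.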