Introduction
This article describes creating an ADVFQ ADaM with Visual Functioning Questionnaire data for ophthalmology endpoints. It is to be used in conjunction with the article on creating a BDS dataset from SDTM. As such, derivations and processes that are not specific to ADVFQ are mostly absent, and the user is invited to consult the aforementioned article for guidance.
The full, open-source VFQ questionnaire can be accessed here.
Note: All examples assume CDISC SDTM and/or ADaM format as input unless otherwise specified. Also, some of the example datasets in this vignette contain more records than are displayed by default, but the number of displayed records can be expanded using the selectors at the bottom.
Dataset Contents
admiralophtha suggests to populate ADVFQ solely with VFQ-related records. Any other questionnaire data should be placed in separate datasets (e.g. ADQS).
The records in ADVFQ can be categorized in four groups:
- The original/raw records coming from the VFQ itself. These include both the Base items in the standard VFQ-25 and the Optional items which can be added to form the VFQ-39. Importantly, not all of the items are measured on the same scale.
- The transformed records, which are in a one-to-one correspondence with the original records and serve to recode the latter on the same 0 - 100 scale.
- The composite scores by category, which are means of all the transformed records from within a category, e.g. “Near activities score”. These composite scores can be calculated including or excluding the Optional items.
- The overall composite scores, which are means of the composite scores, again including or excluding the Optional items.
Programming Workflow
- Reading In Data and Setting Up Lookup Tables
- Initial Set Up and Mapping the Raw Records
- Deriving the Transformed Parameters
- Deriving the Composite Parameters
- Further Derivations of Standard BDS Variables
- Example Script
Reading In Data and Setting Up Lookup Tables
To start, all datasets needed for the creation of the VFQ analysis
dataset should be read into the environment. For the purpose of
demonstration we shall use the pharmaversesdtm
qs_ophtha and the admiral ADSL test
datasets. Note that the former only contains VFQ records.
Next, it will prove useful to set up lookup tables ahead of time for
the PARAM/PARAMCD and PARCATy
variables for the original, transformed and composite parameters. This
is so that when the latter two are derived later in the script, we can
simply assign PARAMCD (or one of the PARCATys)
and then perform a merge with the lookup tables to associate to them the
remaining variables as well. Please note that the mappings
described through these lookup tables may vary from company to company -
this is just what admiralophtha suggests.
For convenience, we suggest setting up three separate lookup tables.
You can peruse how the tables are structured below - for the full set up
code, please visit the ad_advfq.R
template. There is a special importance for the contents of
PARCAT4 since this determines the categories which are then
averaged within the composite
parameters. You can differentiate the Base Items from the Optional
ones using PARCAT5, and the original, transformed and
composite parameters using PARCAT2.
Initial Set Up and Mapping the Raw Records
We can now start setting up ADVFQ by taking qs and
merging on some ADSL variables which are needed for BDS derivations
later down the line. Note that there is no need to merge on the whole
ADSL dataset immediately as in so doing we would be needlessly
increasing the size of the working dataset; the rest of the variables
can be merged on at the end.
adsl_vars <- exprs(TRTSDT, TRTEDT, TRT01A, TRT01P)
advfq <- qs %>%
derive_vars_merged(
dataset_add = adsl,
new_vars = adsl_vars,
by_vars = get_admiral_option("subject_keys")
)Now we can derive the analysis date (ADT) and analysis
relative day (ADY) variables. These derivations are
study-specific and so the ones below are just examples - the user is
again invited to consult the vignette on creating
a BDS dataset from SDTM for details on this topic.
advfq <- advfq %>%
derive_vars_dt(
new_vars_prefix = "A",
dtc = QSDTC
) %>%
derive_vars_dy(
reference_date = TRTSDT,
source_vars = exprs(ADT)
)Next, we can assign PARAMCD for the original parameters
by merging with the param_lookup_original table we set up
earlier. Using derive_vars_merged_lookup() allows us to get
console-level feedback that all the QSTESTCDs have been
mapped. We only add PARAMCD for now, and will derive
PARCATy and PARAM later - again to avoid
carrying forward too many variables.
We can also derive AVAL, AVALC and
BASETYPE for the original records with a simple
mutate() statement, and then AVISIT and
AVISITN with study-specific logic.
advfq <- advfq %>%
## Add PARAMCD for original parameters only - PARCATy and PARAM will be added later
derive_vars_merged_lookup(
dataset_add = param_lookup_original,
new_vars = exprs(PARAMCD),
by_vars = exprs(QSTESTCD)
) %>%
mutate(
AVAL = QSSTRESN,
AVALC = QSORRES,
BASETYPE = "LAST PERIOD 01"
) %>%
mutate(
AVISIT = case_when(
!is.na(VISIT) ~ str_to_title(VISIT),
TRUE ~ NA_character_
),
AVISITN = case_when(
AVISIT == "Baseline" ~ 1,
AVISIT == "Week 12" ~ 12,
AVISIT == "Week 24" ~ 24,
TRUE ~ NA
),
)
#> All `QSTESTCD` are mapped.Here’s what the data for the “Near Activities” and “Distance Activities” questions looks like for one patient’s first two visits:
Deriving the Transformed Parameters
Now we are ready to derive the transformed parameters. Excluding 15C
(see later down this section), all of these are derived by re-scaling a
single original record. As such, the prototypical code which can be used
to derive them is a call to derive_extreme_records() which
is structured as follows:
derive_extreme_records(
dataset = advfq,
dataset_add = advfq,
filter_add = QSTESTCD == "VFQ1xx" & !is.na(AVAL),
set_values_to = exprs(
AVAL = transform_range(
source = AVAL,
source_range = c(1, 5), # or some other range, e.g. c(1, 6), depending on the parameter
target_range = c(0, 100),
flip_direction = TRUE # or FALSE, depending on the parameter
),
PARAMCD = "QRxx"
)
)The function transform_range() is a helper function from
admiral which performs the re-scaling. You will need to
specify the source_range (i.e. the range of possible values
for the original record) and whether or not to
flip_direction (i.e. whether higher values of the original
record correspond to better or worse outcomes). Note that both the
source_range and flip_direction will
change across parameters - for instance, the source range is
c(1, 5) for PARAMCD == "VFQ101" but
c(1, 6) for PARAMCD == "VFQ102". As such it is
useful to group all parameters (excluding 15C, for which a different
approach will be needed) based on which transformation they require; in
so doing the transformations for each group can be done together.
range1to5_flip_params <- c(
"VFQ101", "VFQ103", "VFQ104", "VFQ105", "VFQ106", "VFQ107", "VFQ108",
"VFQ109", "VFQ110", "VFQ111", "VFQ112", "VFQ113", "VFQ114", "VFQ116",
"VFQ116A", "VFQ1A03", "VFQ1A04", "VFQ1A05", "VFQ1A06", "VFQ1A07",
"VFQ1A08", "VFQ1A09"
)
range1to6_flip_params <- c("VFQ102")
range1to5_noflip_params <- c(
"VFQ117", "VFQ118", "VFQ119", "VFQ120", "VFQ121", "VFQ122", "VFQ123",
"VFQ124", "VFQ125", "VFQ1A11A", "VFQ1A11B", "VFQ1A12", "VFQ1A13"
)
range0to10_noflip_params <- c("VFQ1A01", "VFQ1A02")For question 15C we have to be a little more careful as the
derivation depends on whether or not question 15C was asked at that
visit. If it was, then we can derive the transformed record as normal.
However, if it wasn’t, then we have to check the response to question
15B - if the response to 15B indicates that the patient has never driven
or given up driving (i.e. QSSTRESN == 1), then we set the
transformed record to 0; otherwise, we do not derive a transformed
record for that visit. To do this, we can set up a temporary flag
variable using derive_var_merged_exist_flag() to identify
visits where question 15C was not asked, and then use that flag in the
derivation of the transformed record for 15C.
advfq <- advfq %>%
derive_var_merged_exist_flag(
dataset_add = advfq,
by_vars = exprs(!!!adsl_vars, ADT, ADY),
new_var = TEMP_VFQ115C_FL,
condition = QSTESTCD == "VFQ115C",
true_value = "Y",
false_value = "N",
missing_value = "M"
)We can then derive the transformed records, including the special
handling for 15C, in a single call to call_derivation(),
since each call to derive_extreme_records() contains the
same assignment of dataset, dataset_add and
keep_source_vars, then only differs in the
filter_add and set_values_to arguments. Each
list passed to variable_params performs the transformation
for one of the groups of parameters set up above, including dynamical
generation of the PARAMCD, though 15C is done separately.
Note that we set keep_source_vars = adsl_vars to ensure
SDTM records do not get populated for derived parameters.
advfq <- advfq %>%
call_derivation(
derivation = derive_extreme_records,
dataset = .,
dataset_add = .,
keep_source_vars = c(
get_admiral_option("subject_keys"),
exprs(PARAMCD, AVISIT, AVISITN, ADT, ADY),
adsl_vars
),
variable_params = list(
params(
filter_add = QSTESTCD %in% range1to5_flip_params & !is.na(AVAL),
set_values_to = exprs(
AVAL = transform_range(
source = AVAL,
source_range = c(1, 5),
target_range = c(0, 100),
flip_direction = TRUE
),
PARAMCD = str_replace(QSTESTCD, "VFQ1", "QR")
)
),
params(
filter_add = QSTESTCD %in% range1to6_flip_params & !is.na(AVAL),
set_values_to = exprs(
AVAL = transform_range(
source = AVAL,
source_range = c(1, 6),
target_range = c(0, 100),
flip_direction = TRUE
),
PARAMCD = str_replace(QSTESTCD, "VFQ1", "QR")
)
),
params(
filter_add = QSTESTCD %in% range1to5_noflip_params & !is.na(AVAL),
set_values_to = exprs(
AVAL = transform_range(
source = AVAL,
source_range = c(1, 5),
target_range = c(0, 100),
flip_direction = FALSE
),
PARAMCD = str_replace(QSTESTCD, "VFQ1", "QR")
)
),
params(
filter_add = QSTESTCD %in% range0to10_noflip_params & !is.na(AVAL),
set_values_to = exprs(
AVAL = transform_range(
source = AVAL,
source_range = c(0, 10),
target_range = c(0, 100),
flip_direction = FALSE
),
PARAMCD = str_replace(QSTESTCD, "VFQ1", "QR")
)
),
# For QR15C, do it in two parts
# first in the case where QSTESTCD == "VFQ115C" is present at that visit
params(
filter_add = QSTESTCD == "VFQ115C" & !is.na(AVAL),
set_values_to = exprs(
AVAL = transform_range(
source = AVAL,
source_range = c(1, 5),
target_range = c(0, 100),
flip_direction = TRUE
),
PARAMCD = "QR15C"
)
),
# second in the case where QSTESTCD == "VFQ115C" is not present at that visit
params(
filter_add = TEMP_VFQ115C_FL == "N" & QSTESTCD == "VFQ115B" & AVAL == 1,
set_values_to = exprs(
AVAL = 0,
PARAMCD = "QR15C"
)
)
)
) %>%
select(-TEMP_VFQ115C_FL)We can then add the PARAM and PARCATy
variables for both the original and transformed records by merging with
the relevant lookup tables.
advfq <- advfq %>%
derive_vars_merged_lookup(
dataset_add = rbind(
param_lookup_original %>% select(-QSTESTCD),
param_lookup_transformed
),
by_vars = exprs(PARAMCD)
)
#> All `PARAMCD` are mapped.Here’s what some of the transformed parameters and original records look like alongside each other for one patient’s first two visits:
Deriving the Composite Parameters
Composite Scores by Category
With the now-derived transformed parameters (identifiable through
PARCAT2 == "Transformed - Original Items"), we have every
question measured on the same scale and can derive the composite
parameters. These are essentially just means across categories of
questions, with the category being determined by the contents of
PARCAT4. The VFQ guidelines dictate that for each category,
two different composite parameters should be derived: one including just
the Base items of the questionnaire
(PARCAT5 == "Base Item") and the other one also including
any Optional items. As such, below we derive the composite parameters in
two stages, first constructing advfq_qsb for the Base
item-only composite means, and then advfq_qso for the means
including all items.
Note that the dataset argument is not passed to
derive_summary_records(), meaning the output datasets
contain only the new composite parameters. Also, this time we use
PARCAT4 as a by-variable, meaning that the new composite
scores are assigned the same value of PARCAT4
(e.g. “Distance Activities”) as the records that were used to constitute
them. Then, we can then use derive_vars_merged_lookup() to
add on the PARCATy (excluding PARCAT4) and
PARAM/PARAMCD variables for these new records.
At the end, we append the new records to the existing advfq
using bind_rows().
advfq_qsb <- derive_summary_records(
dataset_add = advfq,
filter_add = PARCAT2 == "Transformed - Original Items" & PARCAT5 == "Base Item" & !is.na(AVAL),
by_vars = c(
get_admiral_option("subject_keys"),
exprs(!!!adsl_vars, AVISIT, AVISITN, ADT, ADY, PARCAT4)
),
set_values_to = exprs(AVAL = mean(AVAL))
) %>%
derive_vars_merged_lookup(
dataset_add = filter(param_lookup_composite, str_starts(PARAMCD, "QSB")),
new_vars = exprs(PARAMCD, PARAM, PARCAT1, PARCAT2, PARCAT3, PARCAT5),
by_vars = exprs(PARCAT4)
)
#> All `PARCAT4` are mapped.
advfq_qso <- derive_summary_records(
dataset_add = advfq,
filter_add = PARCAT2 == "Transformed - Original Items" & !is.na(AVAL),
by_vars = c(
get_admiral_option("subject_keys"),
exprs(!!!adsl_vars, AVISIT, AVISITN, ADT, ADY, PARCAT4)
),
set_values_to = exprs(AVAL = mean(AVAL))
) %>%
derive_vars_merged_lookup(
dataset_add = filter(param_lookup_composite, str_starts(PARAMCD, "QSO")),
new_vars = exprs(PARAMCD, PARAM, PARCAT1, PARCAT2, PARCAT3, PARCAT5),
by_vars = exprs(PARCAT4)
)
#> All `PARCAT4` are mapped.
advfq <- bind_rows(advfq, advfq_qsb, advfq_qso)Here’s what the Base item composite scores by category look like for the “Near Activities” and “Distance Activities” categories for one patient’s first two visits:
Overall Composite Scores
Finally, for the overall composite scores, we take all the newly-derived composite parameters, excluding “General Health”, and create:
- An averaged record (
PARAMCD == "QBCSCORE") using the composite scores that were calculated with Base items only; - An averaged record (
PARAMCD == "QOCSCORE") using the composite scores that were calculated with Base and Optional items.
As was done for the transformed
parameters, we do this in a single call to
call_derivation(), since the only difference between the
two derivations is the filter_add and
set_values_to arguments.
advfq <- advfq %>%
call_derivation(
derivation = derive_summary_records,
dataset_add = advfq,
by_vars = c(
get_admiral_option("subject_keys"),
exprs(!!!adsl_vars, AVISIT, AVISITN, ADT, ADY)
),
variable_params = list(
params(
# Use Base Items only
filter_add = PARCAT5 == "VFQ-25" & str_sub(PARAMCD, 1, 3) == "QSB" &
PARCAT4 != "General Health" & !is.na(AVAL),
set_values_to = exprs(
AVAL = mean(AVAL),
PARAMCD = "QBCSCORE"
)
),
params(
# Use optional items items only
filter_add = PARCAT5 == "VFQ-39" & str_sub(PARAMCD, 1, 3) == "QSO" &
PARCAT4 != "General Health" & !is.na(AVAL),
set_values_to = exprs(
AVAL = mean(AVAL),
PARAMCD = "QOCSCORE"
)
)
)
)We then assign PARAM/PARAMCD and
PARCATy for these new overall composite records by merging
with the relevant lookup table, and append them to
advfq.
advfq <- advfq %>%
filter(str_detect(PARAMCD, "SCORE")) %>%
select(-PARAM, -starts_with("PARCAT")) %>%
derive_vars_merged_lookup(
dataset_add = param_lookup_composite,
new_vars = exprs(PARAM, PARCAT1, PARCAT2, PARCAT3, PARCAT4, PARCAT5),
by_vars = exprs(PARAMCD)
) %>%
rbind(advfq %>% filter(!str_detect(PARAMCD, "SCORE")))
#> All `PARAMCD` are mapped.Here’s what the overall composite records using the Base and Optional items for one patient’s baseline visit:
Further Derivations of Standard BDS Variables
The user is invited to consult the article on creating a BDS dataset from SDTM to learn how to add standard BDS variables to ADVFQ.
Example Script
| ADaM | Sample Code |
|---|---|
| ADVFQ | ad_advfq.R |
