Reading SAS datasets into R. The data is not always as it seems!
ADaM
Technical
Author
Ben Straub
Published
July 10, 2023
Reading SAS-based datasets (.sas7bdat or .xpt) into R usually means calling the R package haven. A typical call uses read_sas() or read_xpt() to bring in the source data for constructing your ADaMs or SDTMs.
Unfortunately, haven leaves the character blanks (missing character data) found in a typical SAS-based dataset as empty strings ("") rather than converting them to NA. These blanks often cause problems when you subset data with is.na() inside dplyr::filter(), because is.na("") returns FALSE, so the blank records are not selected. Check out Bayer’s SAS2R catalog: handling-of-missing-values for more discussion on missing values and NAs.
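A minimal base-R illustration of the problem, using a made-up vector `rficdtc` that holds one real date, one blank and one true NA. `dplyr::filter(df, is.na(RFICDTC))` keeps only rows where the same logical condition is TRUE, so it would leave out the blank record in the same way.

```r
# Made-up values: a complete date, a SAS-style character blank, and a true NA
rficdtc <- c("2000-01-01", "", NA)

# is.na() flags only the real NA; the blank counts as a present value
is.na(rficdtc)  # FALSE FALSE TRUE

# Subsetting on is.na() therefore misses the blank record
missing_idx <- which(is.na(rficdtc))
missing_idx  # 3

# A blank-aware condition catches both kinds of missingness
missing_or_blank <- which(is.na(rficdtc) | rficdtc == "")
missing_or_blank  # 2 3
```

The `is.na(x) | x == ""` condition works as a local workaround, but it has to be repeated in every filter, which is why converting blanks once, right after reading the data, tends to be less error-prone.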
In the admiral package, we have built a simple function, convert_blanks_to_na(), to quickly remedy this problem. You can pass an entire data frame to this function, and it will convert every character blank to NA_character_.
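To show the idea behind the conversion, here is a rough base-R sketch. `blanks_to_na_sketch()` is a hypothetical helper written only for this illustration: it is not admiral's implementation, and it treats only exact empty strings ("") in character columns as blanks. The example data `dm` and its "label" attribute are also made up, mimicking the variable labels haven attaches to columns.

```r
# Hypothetical helper: a base-R approximation of the idea behind
# admiral::convert_blanks_to_na(), NOT admiral's actual code.
blanks_to_na_sketch <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col)) {
      # which() skips existing NAs; subassignment keeps attributes such as "label"
      col[which(col == "")] <- NA_character_
      df[[nm]] <- col
    }
    # Non-character columns (numeric, dates, ...) are left unchanged
  }
  df
}

# Made-up example data: character columns plus a numeric column
dm <- data.frame(USUBJID = c("01", "02"), AGE = c(34, 51),
                 stringsAsFactors = FALSE)
dm$RACE <- structure(c("ASIAN", ""), label = "Race")

dm_na <- blanks_to_na_sketch(dm)
is.na(dm_na$RACE)            # FALSE TRUE
attr(dm_na$RACE, "label")    # "Race"
```

Looping over columns and replacing each one whole keeps per-column attributes intact, which matters when variable labels read in by haven are needed later in the derivation.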
Loading Packages and Making Dummy Data
```r
library(admiral)
library(tibble)
library(dplyr)

df <- tribble(
  ~USUBJID, ~RFICDTC,
  "01", "2000-01-01",
  "02", "2001-01-01",
  "03", "", # Here we have a character blank
  "04", "2001-01--",
  "05", "2001---01",
  "05", "", # Here we have a character blank
)

df
```
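For readers without admiral installed, the same rows can be rebuilt as a base-R data frame and the single column `RFICDTC` converted by hand. This is a one-column stand-in for calling `convert_blanks_to_na(df)`, not the admiral function itself; the name `missing_consent` is made up for the example. Note that the partial dates "2001-01--" and "2001---01" are not blanks, so this conversion leaves them untouched.

```r
# Same rows as the tribble above, built with base R so no packages are needed
df <- data.frame(
  USUBJID = c("01", "02", "03", "04", "05", "05"),
  RFICDTC = c("2000-01-01", "2001-01-01", "", "2001-01--", "2001---01", ""),
  stringsAsFactors = FALSE
)

# Before conversion, is.na() sees no missing consent dates
n_missing_before <- sum(is.na(df$RFICDTC))
n_missing_before  # 0

# Targeted fix for one column: turn empty strings into NA_character_
df$RFICDTC[df$RFICDTC == ""] <- NA_character_

# Now the blank records are picked up by an is.na() subset
missing_consent <- df[is.na(df$RFICDTC), "USUBJID"]
missing_consent  # "03" "05"

# Partial dates are not blanks and stay as they were
df$RFICDTC[4:5]  # "2001-01--" "2001---01"
```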
A single call to this function can make your derivations much easier when working in R with SAS-based datasets. In admiral, we call this function at the start of every ADaM template for common ADaM datasets. You can use the function use_ad_template() to get the full R script for any of the ADaMs listed below.
```r
list_all_templates()
```

```
Existing ADaM templates in package 'admiral':
• ADAE
• ADCM
• ADEG
• ADEX
• ADLB
• ADLBHY
• ADMH
• ADPC
• ADPP
• ADPPK
• ADSL
• ADVS
```