Blanks and NAs

Reading SAS datasets into R. The data is not always as it seems!
Categories: ADaM, Technical
Author: Ben Straub
Published: July 10, 2023

To read SAS-based datasets (.sas7bdat or .xpt) into R, users typically turn to the R package haven. A typical call uses read_sas() or read_xpt() to bring in the source data used to construct ADaMs or SDTMs.
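As a minimal sketch of such a call, the snippet below writes a tiny transport file to a temporary location and reads it back with read_xpt(). The file name dm.xpt and the two-row dataset are assumptions made up so the example is self-contained; in practice you would point read_xpt() or read_sas() at your real source file.

```r
library(haven)

# Hypothetical two-row source dataset; the second RFICDTC is a character blank
source_data <- data.frame(
  USUBJID = c("01", "02"),
  RFICDTC = c("2000-01-01", ""),
  stringsAsFactors = FALSE
)

# Write a transport file to a temporary path, only to have something to read
xpt_path <- file.path(tempdir(), "dm.xpt")
write_xpt(source_data, xpt_path)

# Typical call to bring the source data into R
dm <- read_xpt(xpt_path)

# Inspect how the blank arrived: is it flagged as NA, or is it an empty string?
is.na(dm$RFICDTC)
dm$RFICDTC == ""
```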

Unfortunately, haven leaves the character blanks (missing data) found in a typical SAS-based dataset as blanks, that is, as empty strings rather than NA. These blanks typically prove problematic when functions like is.na() are combined with dplyr::filter() to subset data, because is.na() does not flag an empty string as missing. Check out Bayer’s SAS2R catalog: handling-of-missing-values for more discussion on missing values and NAs.
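The problem can be seen with base R alone. The vector below is an invented example, not data from a real study: it holds one date, one character blank, and one true NA.

```r
# Illustrative values: a date, a character blank, and a genuine NA
rficdtc <- c("2000-01-01", "", NA_character_)

# is.na() only flags the genuine NA; the blank slips through
na_flags <- is.na(rficdtc)
na_flags
# [1] FALSE FALSE  TRUE

# Catching both requires an explicit check for the empty string
missing_flags <- is.na(rficdtc) | rficdtc == ""
missing_flags
# [1] FALSE  TRUE  TRUE
```

Any filter built on is.na() alone would therefore silently keep the blank record.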

In the admiral package, we have built a simple function called convert_blanks_to_na() to quickly remedy this problem. You can supply an entire data frame to this function, and it will convert any character blanks to NA_character_.
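To make the idea concrete, here is a rough base R sketch of this kind of conversion. The helper name blank_to_na_sketch() is hypothetical, and this is not admiral's implementation; it only replaces exact empty strings in character columns and leaves every other column untouched.

```r
# Hypothetical helper, NOT the admiral implementation:
# replaces exact empty strings in character columns with NA_character_
blank_to_na_sketch <- function(data) {
  is_chr <- vapply(data, is.character, logical(1))
  data[is_chr] <- lapply(data[is_chr], function(col) {
    col[!is.na(col) & col == ""] <- NA_character_
    col
  })
  data
}

# Invented example data with one character blank and one numeric column
example <- data.frame(
  USUBJID = c("01", "02"),
  RFICDTC = c("2000-01-01", ""),
  AGE = c(34, 51),
  stringsAsFactors = FALSE
)

converted <- blank_to_na_sketch(example)
is.na(converted$RFICDTC)
# [1] FALSE  TRUE
```

In real derivation code, prefer convert_blanks_to_na() itself over a hand-rolled helper like this one.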

Loading Packages and Making Dummy Data

library(admiral)
library(tibble)
library(dplyr)

df <- tribble(
  ~USUBJID, ~RFICDTC,
  "01", "2000-01-01",
  "02", "2001-01-01",
  "03", "", # Here we have a character blank
  "04", "2001-01--",
  "05", "2001---01",
  "05", "", # Here we have a character blank
)

df
# A tibble: 6 × 2
  USUBJID RFICDTC     
  <chr>   <chr>       
1 01      "2000-01-01"
2 02      "2001-01-01"
3 03      ""          
4 04      "2001-01--" 
5 05      "2001---01" 
6 05      ""          

A simple conversion

df_na <- convert_blanks_to_na(df)

df_na
# A tibble: 6 × 2
  USUBJID RFICDTC   
  <chr>   <chr>     
1 01      2000-01-01
2 02      2001-01-01
3 03      <NA>      
4 04      2001-01-- 
5 05      2001---01 
6 05      <NA>      

df_na %>% filter(is.na(RFICDTC))
# A tibble: 2 × 2
  USUBJID RFICDTC
  <chr>   <chr>  
1 03      <NA>   
2 05      <NA>   

That’s it!

A simple call to this function can make your derivation work in R much easier when you are working with SAS-based datasets. In admiral, we make use of this function at the start of all ADaM templates for common ADaM datasets. You can use the function use_ad_template() to get the full R script for any of the ADaMs below.

list_all_templates()
Existing ADaM templates in package 'admiral':
• ADAE
• ADCM
• ADEG
• ADEX
• ADLB
• ADLBHY
• ADMH
• ADPC
• ADPP
• ADPPK
• ADSL
• ADVS
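As a short sketch of fetching one of these templates, the call below saves the ADSL script to a temporary file and then searches it for convert_blanks_to_na(). The save location and file name are assumptions chosen for illustration, and the exact template contents depend on your installed admiral version.

```r
library(admiral)

# Assumed save location; use any path in your project instead
template_path <- file.path(tempdir(), "ad_adsl.R")

# Save the ADSL template script without opening it in an editor
use_ad_template(
  adam_name = "adsl",
  save_path = template_path,
  overwrite = TRUE,
  open = FALSE
)

# Look for the blank-conversion calls in the template
template_lines <- readLines(template_path)
grep("convert_blanks_to_na", template_lines, value = TRUE)
```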

Last updated

2024-11-15 19:40:48.541296

Citation

BibTeX citation:
@online{straub2023,
  author = {Straub, Ben},
  title = {Blanks and {NAs}},
  date = {2023-07-10},
  url = {https://pharmaverse.github.io/blog/posts/2023-07-10_blanks_and_nas/blanks_and_nas.html},
  langid = {en}
}
For attribution, please cite this work as:
Straub, Ben. 2023. “Blanks and NAs.” July 10, 2023. https://pharmaverse.github.io/blog/posts/2023-07-10_blanks_and_nas/blanks_and_nas.html.