10  tidyr exercises

Author

Zelos Zhu

10.1 Setup

library(tidyverse)
library(dplyr)
library(tidyr)

# load data
ex <- readRDS("data/ex.rds") 
dm <- readRDS("data/dm.rds") 
ds <- readRDS("data/ds.rds") 
suppds <- readRDS("data/suppds.rds") 

10.2 Pivoting with tidyr

Load the ex data-frame from admiral_ex and select the following variables:

  • USUBJID

  • EXTRT

  • VISIT

  • EXSTDTC

Show solution
ex %>% 
  select(USUBJID, EXTRT, VISIT, EXSTDTC)
# A tibble: 591 × 4
   USUBJID     EXTRT      VISIT    EXSTDTC   
   <chr>       <chr>      <chr>    <chr>     
 1 01-701-1015 PLACEBO    BASELINE 2014-01-02
 2 01-701-1015 PLACEBO    WEEK 2   2014-01-17
 3 01-701-1015 PLACEBO    WEEK 24  2014-06-19
 4 01-701-1023 PLACEBO    BASELINE 2012-08-05
 5 01-701-1023 PLACEBO    WEEK 2   2012-08-28
 6 01-701-1028 XANOMELINE BASELINE 2013-07-19
 7 01-701-1028 XANOMELINE WEEK 2   2013-08-02
 8 01-701-1028 XANOMELINE WEEK 24  2014-01-07
 9 01-701-1033 XANOMELINE BASELINE 2014-03-18
10 01-701-1034 XANOMELINE BASELINE 2014-07-01
# … with 581 more rows

Using pivot_wider() create a table that would shaped this way

USUBJID EXTRT BASELINE WEEK 2 WEEK 24
Show solution
ex %>% 
  select(USUBJID, EXTRT, VISIT, EXSTDTC) %>% 
  pivot_wider(names_from = "VISIT", values_from = "EXSTDTC")
# A tibble: 254 × 5
   USUBJID     EXTRT      BASELINE   `WEEK 2`   `WEEK 24` 
   <chr>       <chr>      <chr>      <chr>      <chr>     
 1 01-701-1015 PLACEBO    2014-01-02 2014-01-17 2014-06-19
 2 01-701-1023 PLACEBO    2012-08-05 2012-08-28 <NA>      
 3 01-701-1028 XANOMELINE 2013-07-19 2013-08-02 2014-01-07
 4 01-701-1033 XANOMELINE 2014-03-18 <NA>       <NA>      
 5 01-701-1034 XANOMELINE 2014-07-01 2014-07-16 2014-12-18
 6 01-701-1047 PLACEBO    2013-02-12 2013-02-26 <NA>      
 7 01-701-1097 XANOMELINE 2014-01-01 2014-01-16 2014-06-19
 8 01-701-1111 XANOMELINE 2012-09-07 <NA>       <NA>      
 9 01-701-1115 XANOMELINE 2012-11-30 2012-12-14 <NA>      
10 01-701-1118 PLACEBO    2014-03-12 2014-03-27 2014-08-28
# … with 244 more rows

Load the dm data-frame from admiral_dm and select the following variables:

  • USUBJID

  • RACE

  • SEX

Show solution
dm %>% 
  select(USUBJID, RACE, SEX)
# A tibble: 306 × 3
   USUBJID     RACE  SEX  
   <chr>       <chr> <chr>
 1 01-701-1015 WHITE F    
 2 01-701-1023 WHITE M    
 3 01-701-1028 WHITE M    
 4 01-701-1033 WHITE M    
 5 01-701-1034 WHITE F    
 6 01-701-1047 WHITE F    
 7 01-701-1057 WHITE F    
 8 01-701-1097 WHITE M    
 9 01-701-1111 WHITE F    
10 01-701-1115 WHITE M    
# … with 296 more rows

Using pivot_longer() create a table that would shaped this way

USUBJID VAR VAL
1001 RACE WHITE
1001 SEX M
Show solution
dm %>% 
  select(USUBJID, RACE, SEX) %>% 
  pivot_longer(cols = c(RACE, SEX),
               names_to = "VAR",
               values_to = "VAL")
# A tibble: 612 × 3
   USUBJID     VAR   VAL  
   <chr>       <chr> <chr>
 1 01-701-1015 RACE  WHITE
 2 01-701-1015 SEX   F    
 3 01-701-1023 RACE  WHITE
 4 01-701-1023 SEX   M    
 5 01-701-1028 RACE  WHITE
 6 01-701-1028 SEX   M    
 7 01-701-1033 RACE  WHITE
 8 01-701-1033 SEX   M    
 9 01-701-1034 RACE  WHITE
10 01-701-1034 SEX   F    
# … with 602 more rows

10.3 Joining using dplyr

Load the ds data-frame from admiral_ds and suppds data-frame from admiral_suppds. Prior to joining the two datasets together, we may need to do some cleaning of the data on suppds.

  • Filter IDVAR for "DSSEQ"

  • Mutate IDVARVAL from type character to type numeric.

  • Select USUBJID IDVARVAL QNAM QLABEL QVAL

Show solution
suppds <- suppds %>% 
  filter(IDVAR == "DSSEQ") %>% 
  mutate(IDVARVAL = as.numeric(IDVARVAL)) %>% 
  select(USUBJID, IDVARVAL, QNAM, QLABEL, QVAL)

suppds
# A tibble: 3 × 5
  USUBJID     IDVARVAL QNAM    QLABEL                          QVAL 
  <chr>          <dbl> <chr>   <chr>                           <chr>
1 01-703-1175        2 ENTCRIT PROTOCOL ENTRY CRITERIA NOT MET 16   
2 01-705-1382        2 ENTCRIT PROTOCOL ENTRY CRITERIA NOT MET 25   
3 01-708-1372        3 ENTCRIT PROTOCOL ENTRY CRITERIA NOT MET 16   

Join the two tables together using USUBJID and DSSEQ as the key joining variables.

Show solution
ds %>% 
  left_join(suppds, by = c("USUBJID" = "USUBJID", "DSSEQ" = "IDVARVAL"))
# A tibble: 850 × 16
   STUDYID  DOMAIN USUBJID DSSEQ DSSPID DSTERM DSDECOD DSCAT VISIT…¹ VISIT DSDTC
   <chr>    <chr>  <chr>   <dbl> <chr>  <chr>  <chr>   <chr>   <dbl> <chr> <chr>
 1 CDISCPI… DS     01-701…     1 <NA>   RANDO… RANDOM… PROT…       3 BASE… 2014…
 2 CDISCPI… DS     01-701…     2 <NA>   PROTO… COMPLE… DISP…      13 WEEK… 2014…
 3 CDISCPI… DS     01-701…     3 <NA>   FINAL… FINAL … OTHE…      13 WEEK… 2014…
 4 CDISCPI… DS     01-701…     1 <NA>   RANDO… RANDOM… PROT…       3 BASE… 2012…
 5 CDISCPI… DS     01-701…     2 24     ADVER… ADVERS… DISP…       5 WEEK… 2012…
 6 CDISCPI… DS     01-701…     3 <NA>   FINAL… FINAL … OTHE…       5 WEEK… 2012…
 7 CDISCPI… DS     01-701…     4 <NA>   FINAL… FINAL … OTHE…     201 RETR… 2013…
 8 CDISCPI… DS     01-701…     1 <NA>   RANDO… RANDOM… PROT…       3 BASE… 2013…
 9 CDISCPI… DS     01-701…     2 <NA>   PROTO… COMPLE… DISP…      13 WEEK… 2014…
10 CDISCPI… DS     01-701…     3 <NA>   FINAL… FINAL … OTHE…      13 WEEK… 2014…
# … with 840 more rows, 5 more variables: DSSTDTC <chr>, DSSTDY <dbl>,
#   QNAM <chr>, QLABEL <chr>, QVAL <chr>, and abbreviated variable name
#   ¹​VISITNUM