library(tidyverse)
library(dplyr)
library(tidyr)
# load data
<- readRDS("data/ex.rds")
ex <- readRDS("data/dm.rds")
dm <- readRDS("data/ds.rds")
ds <- readRDS("data/suppds.rds") suppds
10 tidyr exercises
10.1 Setup
10.2 Pivoting with tidyr
Load the ex
data-frame from admiral_ex
and select the following variables:
USUBJID
EXTRT
VISIT
EXSTDTC
Show solution
%>%
ex select(USUBJID, EXTRT, VISIT, EXSTDTC)
# A tibble: 591 × 4
USUBJID EXTRT VISIT EXSTDTC
<chr> <chr> <chr> <chr>
1 01-701-1015 PLACEBO BASELINE 2014-01-02
2 01-701-1015 PLACEBO WEEK 2 2014-01-17
3 01-701-1015 PLACEBO WEEK 24 2014-06-19
4 01-701-1023 PLACEBO BASELINE 2012-08-05
5 01-701-1023 PLACEBO WEEK 2 2012-08-28
6 01-701-1028 XANOMELINE BASELINE 2013-07-19
7 01-701-1028 XANOMELINE WEEK 2 2013-08-02
8 01-701-1028 XANOMELINE WEEK 24 2014-01-07
9 01-701-1033 XANOMELINE BASELINE 2014-03-18
10 01-701-1034 XANOMELINE BASELINE 2014-07-01
# … with 581 more rows
Using pivot_wider()
create a table that would shaped this way
USUBJID | EXTRT | BASELINE | WEEK 2 | WEEK 24 |
---|---|---|---|---|
… | … | … | … | … |
Show solution
%>%
ex select(USUBJID, EXTRT, VISIT, EXSTDTC) %>%
pivot_wider(names_from = "VISIT", values_from = "EXSTDTC")
# A tibble: 254 × 5
USUBJID EXTRT BASELINE `WEEK 2` `WEEK 24`
<chr> <chr> <chr> <chr> <chr>
1 01-701-1015 PLACEBO 2014-01-02 2014-01-17 2014-06-19
2 01-701-1023 PLACEBO 2012-08-05 2012-08-28 <NA>
3 01-701-1028 XANOMELINE 2013-07-19 2013-08-02 2014-01-07
4 01-701-1033 XANOMELINE 2014-03-18 <NA> <NA>
5 01-701-1034 XANOMELINE 2014-07-01 2014-07-16 2014-12-18
6 01-701-1047 PLACEBO 2013-02-12 2013-02-26 <NA>
7 01-701-1097 XANOMELINE 2014-01-01 2014-01-16 2014-06-19
8 01-701-1111 XANOMELINE 2012-09-07 <NA> <NA>
9 01-701-1115 XANOMELINE 2012-11-30 2012-12-14 <NA>
10 01-701-1118 PLACEBO 2014-03-12 2014-03-27 2014-08-28
# … with 244 more rows
Load the dm
data-frame from admiral_dm
and select the following variables:
USUBJID
RACE
SEX
Show solution
%>%
dm select(USUBJID, RACE, SEX)
# A tibble: 306 × 3
USUBJID RACE SEX
<chr> <chr> <chr>
1 01-701-1015 WHITE F
2 01-701-1023 WHITE M
3 01-701-1028 WHITE M
4 01-701-1033 WHITE M
5 01-701-1034 WHITE F
6 01-701-1047 WHITE F
7 01-701-1057 WHITE F
8 01-701-1097 WHITE M
9 01-701-1111 WHITE F
10 01-701-1115 WHITE M
# … with 296 more rows
Using pivot_longer()
create a table that would shaped this way
USUBJID | VAR | VAL |
---|---|---|
1001 | RACE | WHITE |
1001 | SEX | M |
Show solution
%>%
dm select(USUBJID, RACE, SEX) %>%
pivot_longer(cols = c(RACE, SEX),
names_to = "VAR",
values_to = "VAL")
# A tibble: 612 × 3
USUBJID VAR VAL
<chr> <chr> <chr>
1 01-701-1015 RACE WHITE
2 01-701-1015 SEX F
3 01-701-1023 RACE WHITE
4 01-701-1023 SEX M
5 01-701-1028 RACE WHITE
6 01-701-1028 SEX M
7 01-701-1033 RACE WHITE
8 01-701-1033 SEX M
9 01-701-1034 RACE WHITE
10 01-701-1034 SEX F
# … with 602 more rows
10.3 Joining using dplyr
Load the ds
data-frame from admiral_ds
and suppds
data-frame from admiral_suppds
. Prior to joining the two datasets together, we may need to do some cleaning of the data on suppds
.
Filter
IDVAR
for"DSSEQ"
Mutate
IDVARVAL
from type character to type numeric.Select
USUBJID IDVARVAL QNAM QLABEL QVAL
Show solution
<- suppds %>%
suppds filter(IDVAR == "DSSEQ") %>%
mutate(IDVARVAL = as.numeric(IDVARVAL)) %>%
select(USUBJID, IDVARVAL, QNAM, QLABEL, QVAL)
suppds
# A tibble: 3 × 5
USUBJID IDVARVAL QNAM QLABEL QVAL
<chr> <dbl> <chr> <chr> <chr>
1 01-703-1175 2 ENTCRIT PROTOCOL ENTRY CRITERIA NOT MET 16
2 01-705-1382 2 ENTCRIT PROTOCOL ENTRY CRITERIA NOT MET 25
3 01-708-1372 3 ENTCRIT PROTOCOL ENTRY CRITERIA NOT MET 16
Join the two tables together using USUBJID
and DSSEQ
as the key joining variables.
Show solution
%>%
ds left_join(suppds, by = c("USUBJID" = "USUBJID", "DSSEQ" = "IDVARVAL"))
# A tibble: 850 × 16
STUDYID DOMAIN USUBJID DSSEQ DSSPID DSTERM DSDECOD DSCAT VISIT…¹ VISIT DSDTC
<chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
1 CDISCPI… DS 01-701… 1 <NA> RANDO… RANDOM… PROT… 3 BASE… 2014…
2 CDISCPI… DS 01-701… 2 <NA> PROTO… COMPLE… DISP… 13 WEEK… 2014…
3 CDISCPI… DS 01-701… 3 <NA> FINAL… FINAL … OTHE… 13 WEEK… 2014…
4 CDISCPI… DS 01-701… 1 <NA> RANDO… RANDOM… PROT… 3 BASE… 2012…
5 CDISCPI… DS 01-701… 2 24 ADVER… ADVERS… DISP… 5 WEEK… 2012…
6 CDISCPI… DS 01-701… 3 <NA> FINAL… FINAL … OTHE… 5 WEEK… 2012…
7 CDISCPI… DS 01-701… 4 <NA> FINAL… FINAL … OTHE… 201 RETR… 2013…
8 CDISCPI… DS 01-701… 1 <NA> RANDO… RANDOM… PROT… 3 BASE… 2013…
9 CDISCPI… DS 01-701… 2 <NA> PROTO… COMPLE… DISP… 13 WEEK… 2014…
10 CDISCPI… DS 01-701… 3 <NA> FINAL… FINAL … OTHE… 13 WEEK… 2014…
# … with 840 more rows, 5 more variables: DSSTDTC <chr>, DSSTDY <dbl>,
# QNAM <chr>, QLABEL <chr>, QVAL <chr>, and abbreviated variable name
# ¹VISITNUM