4  select, filter & arrange exercises

Author

Stefan Thoma

4.1 Setup

library("tidyverse")


# load data
adsl <- read_csv("data/adsl.csv")

4.2 Data wrangling with dplyr

Load the adsl data-frame and select the following variables:

  • USUBJID

  • ARM

  • SEX

  • AGE

  • AGEU

  • AGEGR1

  • COUNTRY

  • EOSSTT

Show solution
# we use starts_with("AGE") because we want to include every AGE variable that is in the original data-frame, instead, we could also list all AGE variables separately.
adsl %>% 
  select(USUBJID, ARM, SEX, starts_with("AGE"), COUNTRY, EOSSTT)
# A tibble: 306 × 8
   USUBJID     ARM                  SEX     AGE AGEU  AGEGR1 COUNTRY EOSSTT     
   <chr>       <chr>                <chr> <dbl> <chr> <chr>  <chr>   <chr>      
 1 01-701-1015 Placebo              F        63 YEARS 18-64  USA     COMPLETED  
 2 01-701-1023 Placebo              M        64 YEARS 18-64  USA     DISCONTINU…
 3 01-701-1028 Xanomeline High Dose M        71 YEARS >=65   USA     COMPLETED  
 4 01-701-1033 Xanomeline Low Dose  M        74 YEARS >=65   USA     DISCONTINU…
 5 01-701-1034 Xanomeline High Dose F        77 YEARS >=65   USA     COMPLETED  
 6 01-701-1047 Placebo              F        85 YEARS >=65   USA     DISCONTINU…
 7 01-701-1057 Screen Failure       F        59 YEARS 18-64  USA     <NA>       
 8 01-701-1097 Xanomeline Low Dose  M        68 YEARS >=65   USA     COMPLETED  
 9 01-701-1111 Xanomeline Low Dose  F        81 YEARS >=65   USA     DISCONTINU…
10 01-701-1115 Xanomeline Low Dose  M        84 YEARS >=65   USA     DISCONTINU…
# … with 296 more rows

On the selected variables, include only patients in the placebo arm who are 66, 77, 88, or 99 years old.

Show solution
# There are different ways to solve this. The best way to filter the AGE is to create a vector c(66, 77, 88, 99) and then use the %in% operator. The vector can be created in different ways, e.g.: 
age_vec <- c(66, 77, 88, 99)
# or 
age_vec  <- 6:9 * 11
# we can then use either the age_vec or the code that created the age_vec directly as a statement in the filter function:


adsl %>% 
  select(USUBJID, SEX, ARM, EOSSTT, starts_with("AGE")) %>% 
  filter(ARM == "Placebo",
         AGE %in% c(66, 77, 88, 99))
# A tibble: 5 × 7
  USUBJID     SEX   ARM     EOSSTT         AGE AGEU  AGEGR1
  <chr>       <chr> <chr>   <chr>        <dbl> <chr> <chr> 
1 01-705-1059 F     Placebo DISCONTINUED    66 YEARS >=65  
2 01-708-1171 F     Placebo COMPLETED       77 YEARS >=65  
3 01-710-1368 F     Placebo COMPLETED       88 YEARS >=65  
4 01-714-1035 F     Placebo COMPLETED       88 YEARS >=65  
5 01-718-1139 M     Placebo COMPLETED       77 YEARS >=65  

Further include the variable TRTSDTM (datetime of first exposure to treatment) and sort the previous data-frame according to this variable from most recent to least recent first exposure.

Show solution
adsl %>% 
  select(USUBJID, SEX, ARM, EOSSTT, starts_with("AGE"), TRTSDTM) %>% 
  filter(ARM == "Placebo",
         AGE %in% c(66, 77, 88, 99)) %>% 
  arrange(desc(TRTSDTM))
# A tibble: 5 × 8
  USUBJID     SEX   ARM     EOSSTT         AGE AGEU  AGEGR1 TRTSDTM            
  <chr>       <chr> <chr>   <chr>        <dbl> <chr> <chr>  <dttm>             
1 01-714-1035 F     Placebo COMPLETED       88 YEARS >=65   2014-04-17 00:00:00
2 01-710-1368 F     Placebo COMPLETED       88 YEARS >=65   2013-10-23 00:00:00
3 01-705-1059 F     Placebo DISCONTINUED    66 YEARS >=65   2013-08-05 00:00:00
4 01-718-1139 M     Placebo COMPLETED       77 YEARS >=65   2013-05-19 00:00:00
5 01-708-1171 F     Placebo COMPLETED       77 YEARS >=65   2012-12-06 00:00:00