library("tidyverse")
# load data
<- read_csv("data/adsl.csv") adsl
4 select, filter & arrange exercises
4.1 Setup
4.2 Data wrangling with dplyr
Load the adsl
data-frame and select the following variables:
USUBJID
ARM
SEX
AGE
AGEU
AGEGR1
COUNTRY
EOSSTT
Show solution
# we use starts_with("AGE") because we want to include every AGE variable that is in the original data-frame, instead, we could also list all AGE variables separately.
%>%
adsl select(USUBJID, ARM, SEX, starts_with("AGE"), COUNTRY, EOSSTT)
# A tibble: 306 × 8
USUBJID ARM SEX AGE AGEU AGEGR1 COUNTRY EOSSTT
<chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
1 01-701-1015 Placebo F 63 YEARS 18-64 USA COMPLETED
2 01-701-1023 Placebo M 64 YEARS 18-64 USA DISCONTINU…
3 01-701-1028 Xanomeline High Dose M 71 YEARS >=65 USA COMPLETED
4 01-701-1033 Xanomeline Low Dose M 74 YEARS >=65 USA DISCONTINU…
5 01-701-1034 Xanomeline High Dose F 77 YEARS >=65 USA COMPLETED
6 01-701-1047 Placebo F 85 YEARS >=65 USA DISCONTINU…
7 01-701-1057 Screen Failure F 59 YEARS 18-64 USA <NA>
8 01-701-1097 Xanomeline Low Dose M 68 YEARS >=65 USA COMPLETED
9 01-701-1111 Xanomeline Low Dose F 81 YEARS >=65 USA DISCONTINU…
10 01-701-1115 Xanomeline Low Dose M 84 YEARS >=65 USA DISCONTINU…
# … with 296 more rows
On the selected variables, include only patients in the placebo arm who are 66, 77, 88, or 99 years old.
Show solution
# There are different ways to solve this. The best way to filter the AGE is to create a vector c(66, 77, 88, 99) and then use the %in% operator. The vector can be created in different ways, e.g.:
<- c(66, 77, 88, 99)
age_vec # or
<- 6:9 * 11
age_vec # we can then use either the age_vec or the code that created the age_vec directly as a statement in the filter function:
%>%
adsl select(USUBJID, SEX, ARM, EOSSTT, starts_with("AGE")) %>%
filter(ARM == "Placebo",
%in% c(66, 77, 88, 99)) AGE
# A tibble: 5 × 7
USUBJID SEX ARM EOSSTT AGE AGEU AGEGR1
<chr> <chr> <chr> <chr> <dbl> <chr> <chr>
1 01-705-1059 F Placebo DISCONTINUED 66 YEARS >=65
2 01-708-1171 F Placebo COMPLETED 77 YEARS >=65
3 01-710-1368 F Placebo COMPLETED 88 YEARS >=65
4 01-714-1035 F Placebo COMPLETED 88 YEARS >=65
5 01-718-1139 M Placebo COMPLETED 77 YEARS >=65
Further include the variable TRTSDTM (datetime of first exposure to treatment) and sort the previous data-frame according to this variable from most recent to least recent first exposure.
Show solution
%>%
adsl select(USUBJID, SEX, ARM, EOSSTT, starts_with("AGE"), TRTSDTM) %>%
filter(ARM == "Placebo",
%in% c(66, 77, 88, 99)) %>%
AGE arrange(desc(TRTSDTM))
# A tibble: 5 × 8
USUBJID SEX ARM EOSSTT AGE AGEU AGEGR1 TRTSDTM
<chr> <chr> <chr> <chr> <dbl> <chr> <chr> <dttm>
1 01-714-1035 F Placebo COMPLETED 88 YEARS >=65 2014-04-17 00:00:00
2 01-710-1368 F Placebo COMPLETED 88 YEARS >=65 2013-10-23 00:00:00
3 01-705-1059 F Placebo DISCONTINUED 66 YEARS >=65 2013-08-05 00:00:00
4 01-718-1139 M Placebo COMPLETED 77 YEARS >=65 2013-05-19 00:00:00
5 01-708-1171 F Placebo COMPLETED 77 YEARS >=65 2012-12-06 00:00:00