3 select, filter & arrange

Author

Stefan Thoma

The tidyverse is a collection of R packages designed for data science. It includes packages such as ggplot2 for data visualization, dplyr for data manipulation, and tidyr for reshaping data. The tidyverse is built around the idea of “tidy data,” which is a standardized way of organizing and structuring data for analysis. The packages in the tidyverse are designed to work together seamlessly, making it a popular choice for data scientists and analysts who use R.

3.1 Setup

library(tidyverse)

Read data

adsl <- read_csv("data/adsl.csv")

Rows: 306 Columns: 50
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (23): STUDYID, USUBJID, DTHFL, AGEU, SEX, RACE, ETHNIC, ARMCD, ARM, ACT...
dbl   (7): SUBJID, SITEID, AGE, DMDY, TRTDURD, DTHADY, LDDTHELD
lgl   (3): RFICDTC, REGION1, DTHA30FL
dttm  (3): RFPENDTC, TRTSDTM, TRTEDTM
date (14): RFSTDTC, RFENDTC, RFXSTDTC, RFXENDTC, DTHDTC, DMDTC, TRTSDT, TRTE...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

We can have a look at the data using many different commands / functions, e.g. the head() function which gives us the first six observations:

head(adsl)

# A tibble: 6 × 50
  STUDYID     USUBJID SUBJID RFSTDTC    RFENDTC    RFXSTDTC   RFXENDTC   RFICDTC
  <chr>       <chr>    <dbl> <date>     <date>     <date>     <date>     <lgl>  
1 CDISCPILOT… 01-701…   1015 2014-01-02 2014-07-02 2014-01-02 2014-07-02 NA     
2 CDISCPILOT… 01-701…   1023 2012-08-05 2012-09-02 2012-08-05 2012-09-01 NA     
3 CDISCPILOT… 01-701…   1028 2013-07-19 2014-01-14 2013-07-19 2014-01-14 NA     
4 CDISCPILOT… 01-701…   1033 2014-03-18 2014-04-14 2014-03-18 2014-03-31 NA     
5 CDISCPILOT… 01-701…   1034 2014-07-01 2014-12-30 2014-07-01 2014-12-30 NA     
6 CDISCPILOT… 01-701…   1047 2013-02-12 2013-03-29 2013-02-12 2013-03-09 NA     
# … with 42 more variables: RFPENDTC <dttm>, DTHDTC <date>, DTHFL <chr>,
#   SITEID <dbl>, AGE <dbl>, AGEU <chr>, SEX <chr>, RACE <chr>, ETHNIC <chr>,
#   ARMCD <chr>, ARM <chr>, ACTARMCD <chr>, ACTARM <chr>, COUNTRY <chr>,
#   DMDTC <date>, DMDY <dbl>, TRT01P <chr>, TRT01A <chr>, TRTSDTM <dttm>,
#   TRTSTMF <chr>, TRTEDTM <dttm>, TRTETMF <chr>, TRTSDT <date>, TRTEDT <date>,
#   TRTDURD <dbl>, SCRFDT <date>, EOSDT <date>, EOSSTT <chr>, FRVDT <date>,
#   RANDDT <date>, DTHDT <date>, DTHADY <dbl>, LDDTHELD <dbl>, …

3.2 `dplyr`

dplyr is a package in the tidyverse that provides a set of functions for efficiently manipulating and cleaning data. It is built around the idea of “verbs” that correspond to common data manipulation tasks, such as select() for selecting specific columns from a data frame, filter() for filtering rows based on certain conditions, arrange() for sorting data-frames and group_by() and summarize() for grouping and summarizing data by one or more variables.

dplyr is not strictly needed for any of that, everything can be done in base R. However, dplyr provides a framework to write readable code and a pipeline to work efficiently.

There are various functions within dplyr for datawrangling which follow a consistent structure. The first input of the most used dplyr functions is the data-frame. Then follow arguments specifying the behaviour of the function. Compared to the base r syntax we do not have to write column / variable names in quotation marks; dplyr syntax lets us refer to columns within a data-frame without the need to always reference the data-frame of origin.

3.2.1 select

The select function lets us select all variables mentioned in the arguments (and drops all other variables). Alternatively, we can selectively drop variables if we place a minus (-) in front of the variable name.
We can first have a look at all variable names of the data-frame:

names(adsl)

 [1] "STUDYID"  "USUBJID"  "SUBJID"   "RFSTDTC"  "RFENDTC"  "RFXSTDTC"
 [7] "RFXENDTC" "RFICDTC"  "RFPENDTC" "DTHDTC"   "DTHFL"    "SITEID"  
[13] "AGE"      "AGEU"     "SEX"      "RACE"     "ETHNIC"   "ARMCD"   
[19] "ARM"      "ACTARMCD" "ACTARM"   "COUNTRY"  "DMDTC"    "DMDY"    
[25] "TRT01P"   "TRT01A"   "TRTSDTM"  "TRTSTMF"  "TRTEDTM"  "TRTETMF" 
[31] "TRTSDT"   "TRTEDT"   "TRTDURD"  "SCRFDT"   "EOSDT"    "EOSSTT"  
[37] "FRVDT"    "RANDDT"   "DTHDT"    "DTHADY"   "LDDTHELD" "LSTALVDT"
[43] "AGEGR1"   "SAFFL"    "RACEGR1"  "REGION1"  "LDDTHGR1" "DTH30FL" 
[49] "DTHA30FL" "DTHB30FL"

And then select the desired variables:

# dplyr::select
select(adsl, 
       STUDYID, 
       USUBJID, 
       ARM, 
       AGE, 
       SEX, 
       RACE)

# A tibble: 306 × 6
   STUDYID      USUBJID     ARM                    AGE SEX   RACE 
   <chr>        <chr>       <chr>                <dbl> <chr> <chr>
 1 CDISCPILOT01 01-701-1015 Placebo                 63 F     WHITE
 2 CDISCPILOT01 01-701-1023 Placebo                 64 M     WHITE
 3 CDISCPILOT01 01-701-1028 Xanomeline High Dose    71 M     WHITE
 4 CDISCPILOT01 01-701-1033 Xanomeline Low Dose     74 M     WHITE
 5 CDISCPILOT01 01-701-1034 Xanomeline High Dose    77 F     WHITE
 6 CDISCPILOT01 01-701-1047 Placebo                 85 F     WHITE
 7 CDISCPILOT01 01-701-1057 Screen Failure          59 F     WHITE
 8 CDISCPILOT01 01-701-1097 Xanomeline Low Dose     68 M     WHITE
 9 CDISCPILOT01 01-701-1111 Xanomeline Low Dose     81 F     WHITE
10 CDISCPILOT01 01-701-1115 Xanomeline Low Dose     84 M     WHITE
# … with 296 more rows

We end up with a new data-frame including only the selected variables. Note here that we do not save the resulting data-frame at the moment.

There are also some helper functions to use within the select function of dplyr. starts_with() ends_with() num_range(). They allow us to select multiple columns sharing a naming structure. num_range() let’s us select consecutively numbered columns, e.g.: num_range("example", 1:4) would select the columns named: example1, example2, example3, example4.

We can try out starts_with():

select(adsl,
       USUBJID, 
       starts_with("trt"))

# A tibble: 306 × 10
   USUBJID TRT01P TRT01A TRTSDTM             TRTSTMF TRTEDTM             TRTETMF
   <chr>   <chr>  <chr>  <dttm>              <chr>   <dttm>              <chr>  
 1 01-701… Place… Place… 2014-01-02 00:00:00 H       2014-07-02 23:59:59 H      
 2 01-701… Place… Place… 2012-08-05 00:00:00 H       2012-09-01 23:59:59 H      
 3 01-701… Xanom… Xanom… 2013-07-19 00:00:00 H       2014-01-14 23:59:59 H      
 4 01-701… Xanom… Xanom… 2014-03-18 00:00:00 H       2014-03-31 23:59:59 H      
 5 01-701… Xanom… Xanom… 2014-07-01 00:00:00 H       2014-12-30 23:59:59 H      
 6 01-701… Place… Place… 2013-02-12 00:00:00 H       2013-03-09 23:59:59 H      
 7 01-701… Scree… Scree… NA                  <NA>    NA                  <NA>   
 8 01-701… Xanom… Xanom… 2014-01-01 00:00:00 H       2014-07-09 23:59:59 H      
 9 01-701… Xanom… Xanom… 2012-09-07 00:00:00 H       2012-09-16 23:59:59 H      
10 01-701… Xanom… Xanom… 2012-11-30 00:00:00 H       2013-01-23 23:59:59 H      
# … with 296 more rows, and 3 more variables: TRTSDT <date>, TRTEDT <date>,
#   TRTDURD <dbl>

And ends_with():

# in this df, all variables that contain dates end with "DT". 
# We can select them:
select(adsl,
       USUBJID, 
       ends_with("DT"))

# A tibble: 306 × 9
   USUBJID     TRTSDT     TRTEDT     SCRFDT     EOSDT      FRVDT      RANDDT    
   <chr>       <date>     <date>     <date>     <date>     <date>     <date>    
 1 01-701-1015 2014-01-02 2014-07-02 NA         2014-07-02 NA         2014-01-02
 2 01-701-1023 2012-08-05 2012-09-01 NA         2012-09-02 2013-02-18 2012-08-05
 3 01-701-1028 2013-07-19 2014-01-14 NA         2014-01-14 NA         2013-07-19
 4 01-701-1033 2014-03-18 2014-03-31 NA         2014-04-14 2014-09-15 2014-03-18
 5 01-701-1034 2014-07-01 2014-12-30 NA         2014-12-30 NA         2014-07-01
 6 01-701-1047 2013-02-12 2013-03-09 NA         2013-03-29 2013-07-28 2013-02-12
 7 01-701-1057 NA         NA         2013-12-20 NA         NA         NA        
 8 01-701-1097 2014-01-01 2014-07-09 NA         2014-07-09 NA         2014-01-01
 9 01-701-1111 2012-09-07 2012-09-16 NA         2012-09-17 2013-02-22 2012-09-07
10 01-701-1115 2012-11-30 2013-01-23 NA         2013-01-23 2013-05-20 2012-11-30
# … with 296 more rows, and 2 more variables: DTHDT <date>, LSTALVDT <date>

If we want a data-frame that does not include any dates, we can make use of the minus sign in combination with the ends_with() function:

select(adsl,
       -ends_with("DT"))

# A tibble: 306 × 42
   STUDYID    USUBJID SUBJID RFSTDTC    RFENDTC    RFXSTDTC   RFXENDTC   RFICDTC
   <chr>      <chr>    <dbl> <date>     <date>     <date>     <date>     <lgl>  
 1 CDISCPILO… 01-701…   1015 2014-01-02 2014-07-02 2014-01-02 2014-07-02 NA     
 2 CDISCPILO… 01-701…   1023 2012-08-05 2012-09-02 2012-08-05 2012-09-01 NA     
 3 CDISCPILO… 01-701…   1028 2013-07-19 2014-01-14 2013-07-19 2014-01-14 NA     
 4 CDISCPILO… 01-701…   1033 2014-03-18 2014-04-14 2014-03-18 2014-03-31 NA     
 5 CDISCPILO… 01-701…   1034 2014-07-01 2014-12-30 2014-07-01 2014-12-30 NA     
 6 CDISCPILO… 01-701…   1047 2013-02-12 2013-03-29 2013-02-12 2013-03-09 NA     
 7 CDISCPILO… 01-701…   1057 NA         NA         NA         NA         NA     
 8 CDISCPILO… 01-701…   1097 2014-01-01 2014-07-09 2014-01-01 2014-07-09 NA     
 9 CDISCPILO… 01-701…   1111 2012-09-07 2012-09-17 2012-09-07 2012-09-16 NA     
10 CDISCPILO… 01-701…   1115 2012-11-30 2013-01-23 2012-11-30 2013-01-23 NA     
# … with 296 more rows, and 34 more variables: RFPENDTC <dttm>, DTHDTC <date>,
#   DTHFL <chr>, SITEID <dbl>, AGE <dbl>, AGEU <chr>, SEX <chr>, RACE <chr>,
#   ETHNIC <chr>, ARMCD <chr>, ARM <chr>, ACTARMCD <chr>, ACTARM <chr>,
#   COUNTRY <chr>, DMDTC <date>, DMDY <dbl>, TRT01P <chr>, TRT01A <chr>,
#   TRTSDTM <dttm>, TRTSTMF <chr>, TRTEDTM <dttm>, TRTETMF <chr>,
#   TRTDURD <dbl>, EOSSTT <chr>, DTHADY <dbl>, LDDTHELD <dbl>, AGEGR1 <chr>,
#   SAFFL <chr>, RACEGR1 <chr>, REGION1 <lgl>, LDDTHGR1 <chr>, DTH30FL <chr>, …

Change order of variables

We can use the select() function to reorder the variables in the data-frame. This does not affect the order of rows.

select(adsl,
       ARM, 
       USUBJID)

# A tibble: 306 × 2
   ARM                  USUBJID    
   <chr>                <chr>      
 1 Placebo              01-701-1015
 2 Placebo              01-701-1023
 3 Xanomeline High Dose 01-701-1028
 4 Xanomeline Low Dose  01-701-1033
 5 Xanomeline High Dose 01-701-1034
 6 Placebo              01-701-1047
 7 Screen Failure       01-701-1057
 8 Xanomeline Low Dose  01-701-1097
 9 Xanomeline Low Dose  01-701-1111
10 Xanomeline Low Dose  01-701-1115
# … with 296 more rows

3.2.2 filter

The filter function allows us to look at a subset of observations. As input, the function requires a logical vector and (of course) a data-frame. This time, we first save the reduced (selected) data-frame and use that as the first argument to filter.

selected_data <- select(adsl, 
                        STUDYID, 
                        USUBJID, 
                        ARM, 
                        AGE, 
                        SEX, 
                        RACE)

The logical vector is generally created within the function call and can use any of the following logic operators:

<                less than
<=               less than or equal to
>                greater than
>=               greater than or equal to
==               equal
!=               not equal
!x               not x (negation)
x | y            x OR y
x & y            x AND y
x %in% y         logical vector of length x with TRUE if element of x is in y

Within filter, we can chain logical vectors by separating them with a comma (,). Lets have a look at women that are 70 and older:

filter(selected_data,
       AGE >= 70,
       SEX == "F")

# A tibble: 141 × 6
   STUDYID      USUBJID     ARM                    AGE SEX   RACE               
   <chr>        <chr>       <chr>                <dbl> <chr> <chr>              
 1 CDISCPILOT01 01-701-1034 Xanomeline High Dose    77 F     WHITE              
 2 CDISCPILOT01 01-701-1047 Placebo                 85 F     WHITE              
 3 CDISCPILOT01 01-701-1111 Xanomeline Low Dose     81 F     WHITE              
 4 CDISCPILOT01 01-701-1133 Xanomeline High Dose    81 F     WHITE              
 5 CDISCPILOT01 01-701-1146 Xanomeline High Dose    75 F     WHITE              
 6 CDISCPILOT01 01-701-1153 Placebo                 79 F     WHITE              
 7 CDISCPILOT01 01-701-1162 Screen Failure          82 F     WHITE              
 8 CDISCPILOT01 01-701-1181 Xanomeline High Dose    79 F     WHITE              
 9 CDISCPILOT01 01-701-1192 Xanomeline Low Dose     80 F     WHITE              
10 CDISCPILOT01 01-701-1203 Placebo                 81 F     BLACK OR AFRICAN A…
# … with 131 more rows

Now we have a reduced data frame with female patients over 70. However, the nested call is not very intuitive to read. If any more functions get added to this code, it becomes even less readable. That is where the pipe operator (%>%) comes in.

pipe

The pipe operator let us chain multiple dplyr commands, so we can always forward the previously filtered / selected / arranged dataframe and keep working with it. The pipe operator let’s us write nested function calls in a sequential way. Traditionally, we start a new line after every pipe operator.

# select, filter, & pipe:
adsl %>% # This pipe forwards adsl to the select function as its first argument
  select(STUDYID, 
         USUBJID,
         ARM, 
         AGE, 
         SEX, 
         RACE) %>% # this pipe forwards the selected variables to the filter function
  filter(AGE >= 70,
         SEX == "F")

# A tibble: 141 × 6
   STUDYID      USUBJID     ARM                    AGE SEX   RACE               
   <chr>        <chr>       <chr>                <dbl> <chr> <chr>              
 1 CDISCPILOT01 01-701-1034 Xanomeline High Dose    77 F     WHITE              
 2 CDISCPILOT01 01-701-1047 Placebo                 85 F     WHITE              
 3 CDISCPILOT01 01-701-1111 Xanomeline Low Dose     81 F     WHITE              
 4 CDISCPILOT01 01-701-1133 Xanomeline High Dose    81 F     WHITE              
 5 CDISCPILOT01 01-701-1146 Xanomeline High Dose    75 F     WHITE              
 6 CDISCPILOT01 01-701-1153 Placebo                 79 F     WHITE              
 7 CDISCPILOT01 01-701-1162 Screen Failure          82 F     WHITE              
 8 CDISCPILOT01 01-701-1181 Xanomeline High Dose    79 F     WHITE              
 9 CDISCPILOT01 01-701-1192 Xanomeline Low Dose     80 F     WHITE              
10 CDISCPILOT01 01-701-1203 Placebo                 81 F     BLACK OR AFRICAN A…
# … with 131 more rows

There is another inline operator which can be very useful within the filter function; %in%. With this operator, we can select rows based on a prespecified vector of values. This can be useful if there are specified values (e.g., specific USUBJID) which we would like to look at.

# we save 4 USUBJID's in a vector:
lookup_ids <- c("01-716-1151", "01-710-1443", "01-708-1184", "01-705-1186")


# and then create a logical vector which returns TRUE for every entry in the 
# USUBJID vector which are represented in the lookup_ids, and else FALSE
adsl$USUBJID %in% lookup_ids

  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[133] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[217] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[265] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[301] FALSE FALSE FALSE FALSE FALSE FALSE

# this approach can be used in the filter function: 
adsl %>% 
  select(STUDYID, USUBJID, ARM, AGE, SEX, RACE) %>% 
  filter(USUBJID %in% lookup_ids)

# A tibble: 4 × 6
  STUDYID      USUBJID     ARM                   AGE SEX   RACE 
  <chr>        <chr>       <chr>               <dbl> <chr> <chr>
1 CDISCPILOT01 01-705-1186 Placebo                84 F     WHITE
2 CDISCPILOT01 01-708-1184 Screen Failure         70 F     WHITE
3 CDISCPILOT01 01-710-1443 Screen Failure         88 F     WHITE
4 CDISCPILOT01 01-716-1151 Xanomeline Low Dose    83 F     WHITE

Note that within the filter function (and in all major dplyr functions) R looks for the requested variables first within the supplied data-frame and afterwards in the global environment.

3.2.3 arrange

We can sort the dataframe with the arrange() function. It allows the sorting based on multiple variables. Note that the order of arranging variables determines the sorting hierarchy, so in this example we first order by AGE and

 adsl %>% 
  select(STUDYID, USUBJID, ARM, AGE, SEX, RACE) %>% 
  filter(AGE >= 70,
         SEX == "F") %>% 
  arrange(ARM, AGE)

# A tibble: 141 × 6
   STUDYID      USUBJID     ARM       AGE SEX   RACE                     
   <chr>        <chr>       <chr>   <dbl> <chr> <chr>                    
 1 CDISCPILOT01 01-705-1282 Placebo    70 F     BLACK OR AFRICAN AMERICAN
 2 CDISCPILOT01 01-704-1260 Placebo    71 F     WHITE                    
 3 CDISCPILOT01 01-703-1210 Placebo    72 F     WHITE                    
 4 CDISCPILOT01 01-716-1026 Placebo    73 F     WHITE                    
 5 CDISCPILOT01 01-718-1150 Placebo    73 F     WHITE                    
 6 CDISCPILOT01 01-708-1087 Placebo    74 F     WHITE                    
 7 CDISCPILOT01 01-708-1316 Placebo    74 F     WHITE                    
 8 CDISCPILOT01 01-709-1001 Placebo    76 F     WHITE                    
 9 CDISCPILOT01 01-710-1077 Placebo    76 F     WHITE                    
10 CDISCPILOT01 01-715-1397 Placebo    76 F     WHITE                    
# … with 131 more rows

To sort by descending order, we can use the helper function desc() within arrange():

 adsl %>% 
  select(STUDYID, USUBJID, ARM, AGE, SEX, RACE) %>% 
  filter(AGE >= 70,
         SEX == "F") %>% 
  arrange(ARM, desc(AGE))

# A tibble: 141 × 6
   STUDYID      USUBJID     ARM       AGE SEX   RACE 
   <chr>        <chr>       <chr>   <dbl> <chr> <chr>
 1 CDISCPILOT01 01-710-1083 Placebo    89 F     WHITE
 2 CDISCPILOT01 01-710-1368 Placebo    88 F     WHITE
 3 CDISCPILOT01 01-714-1035 Placebo    88 F     WHITE
 4 CDISCPILOT01 01-701-1387 Placebo    87 F     WHITE
 5 CDISCPILOT01 01-704-1233 Placebo    87 F     WHITE
 6 CDISCPILOT01 01-716-1024 Placebo    87 F     WHITE
 7 CDISCPILOT01 01-705-1349 Placebo    86 F     WHITE
 8 CDISCPILOT01 01-710-1271 Placebo    86 F     WHITE
 9 CDISCPILOT01 01-716-1108 Placebo    86 F     WHITE
10 CDISCPILOT01 01-701-1047 Placebo    85 F     WHITE
# … with 131 more rows