Introduction
The univar
function is called to produce univariate-type
summary statistics for numeric variables. A typical example of using the
univar
function is to create a tbl
chunk as
shown below for summarizing N, MEAN (SD), MEDIAN, RANGE, IQ Range for
the Age
variable in adsl.
tbl <- cdisc_adsl %>%
univar(colvar = "TRT01PN",
rowvar = "AGE",
statlist = statlist(c("N", "MEANSD", "MEDIAN", "RANGE", "IQRANGE")),
decimal = 0,
row_header = "Age (Years)")
knitr::kable(tbl)
label | 0 | 54 | 81 | row_type | group_level |
---|---|---|---|---|---|
Age (Years) | HEADER | 0 | |||
N | 5 | 5 | 5 | N | 0 |
Mean (SD) | 69.6 (14.40) | 75.6 (6.73) | 72.2 (9.23) | VALUE | 0 |
Median | 64.0 | 74.0 | 75.0 | VALUE | 0 |
Range | (52; 85) | (68; 84) | (57; 81) | VALUE | 0 |
IQ range | (63.0; 84.0) | (71.0; 81.0) | (71.0; 77.0) | VALUE | 0 |
Customizing Univariate Statistics
Besides the 5 standard univariate statistics shown above that are
often required in the demographic tables, you can pick any univariate
statistics from the table below and arrange them in a character vector
for passing to the statlist
argument.
Statlist | Description |
---|---|
N | number of non-missing values |
SUM | sum |
MEAN | mean |
GeoMEAN | geometric mean |
SD | standard deviation |
SE | standard error |
CV | coefficient of variation |
GSD | geometric standard deviation |
GSE | geometric standard error |
MEANSD | mean (standard deviation) |
MEANSE | mean (standard error) |
MEDIAN | median |
MIN | minimum |
MAX | maximum |
RANGE | range |
Q1 | 1st quartile |
Q3 | 3rd quartile |
IQRANGE | inter-quartile range |
MEDRANGE | median (range) |
MEDIQRANGE | median (inter-quartile range) |
MEAN_CI | mean (95% C.I.) |
GeoMEAN_CI | geometric mean (95% C.I.) |
A customized example is shown below for displaying N, Mean (95%
C.I.), and Geometric Mean (95% C.I.) for the Age
variable
in adsl.
tbl <- cdisc_adsl %>%
univar(colvar = "TRT01PN",
rowvar = "AGE",
statlist = statlist(c("N", "MEAN_CI", "GeoMEAN_CI")),
decimal = 0,
row_header = "Age (Years)")
knitr::kable(tbl)
label | 0 | 54 | 81 | row_type | group_level |
---|---|---|---|---|---|
Age (Years) | HEADER | 0 | |||
N | 5 | 5 | 5 | N | 0 |
Mean (95% C.I.) | 69.6 (51.72; 87.48) | 75.6 (67.24; 83.96) | 72.2 (60.74; 83.66) | VALUE | 0 |
Geometric Mean (95% C.I.) | 68.4 (52.72; 88.73) | 75.4 (67.51; 84.13) | 71.7 (60.50; 84.94) | VALUE | 0 |
Decimal Precision
The decimal precision to be used in display of univariate statistics
is comprised of two pieces. The base decimal precision is what controls
the base number of decimals to be used, this can be set using the
decimal
argument. The precision extra is what controls the
difference between the precision used for different statistics, this is
controlled using the option tidytlg.precision.extra
. The
precision extra is the amount precision will need to be adjusted from
the base precision for each different statistic. The default of the
precision extra is set by following our table and listing conventions:
Range has a precision extra of 0, Mean and Median have a precision extra
of 1, SD has a precision extra of 2. To see a full list of precision
extra defaults, please type
options("tidytlg.precision.extra")
in your console. An
example function call of univar
is shown below for
presenting the data using a base decimal value of 2.
tbl <- cdisc_adsl %>%
univar(colvar = "TRT01PN",
rowvar = "BMIBL",
decimal = 2,
row_header = "Age (Years)")
knitr::kable(tbl)
label | 0 | 54 | 81 | row_type | group_level |
---|---|---|---|---|---|
Age (Years) | HEADER | 0 | |||
N | 5 | 5 | 5 | N | 0 |
Mean (SD) | 27.080 (3.6424) | 27.180 (3.4419) | 27.760 (2.4795) | VALUE | 0 |
Median | 27.600 | 27.300 | 28.100 | VALUE | 0 |
Range | (21.90; 30.40) | (23.90; 32.00) | (24.90; 31.40) | VALUE | 0 |
IQ range | (25.100; 30.400) | (23.900; 28.800) | (26.100; 28.300) | VALUE | 0 |
Data Driven Precision
While static precision is useful in some cases, data driven precision
is also available. This is controlled using the
precisionby
, precisionon
, and
decimal
arguments. precisionby
tells the
function the variable(s) the user would like to compute the precision
using. This could be variables such as PARAMCD if the precision is to be
varied between parameter. precisionon
is the variable that
should be used when calculating how many base decimal places are present
in the data. The last piece to data drive precision is the
decimal
argument which gives us a cap for base precision
values. This can be used to help avoid unnecessarily long decimals in
your final output.
A customized example is shown below for presenting the univariate
summary of vital signs data using PARAMCD as the by variable. In
addition, we would like the precision to be data driven and varied by
parameter, which can be achieved by setting
precisionby = "PARAMCD"
.
tbl <- cdisc_advs %>%
univar(colvar = "TRTAN",
rowvar = "AVAL",
rowbyvar = "PARAMCD",
precisionby = "PARAMCD",
decimal = 4)
knitr::kable(tbl)
PARAMCD | label | 0 | 54 | 81 | row_type | group_level |
---|---|---|---|---|---|---|
DIABP | DIABP | BY_HEADER1 | 0 | |||
DIABP | N | 186 | 147 | 204 | N | 0 |
DIABP | Mean (SD) | 71.9 (9.75) | 71.6 (7.12) | 68.8 (10.34) | VALUE | 0 |
DIABP | Median | 71.5 | 72.0 | 69.0 | VALUE | 0 |
DIABP | Range | (50; 92) | (50; 87) | (43; 101) | VALUE | 0 |
DIABP | IQ range | (65.0; 78.0) | (68.0; 77.0) | (60.0; 76.0) | VALUE | 0 |
HEIGHT | HEIGHT | BY_HEADER1 | 0 | |||
HEIGHT | N | 5 | 5 | 5 | N | 0 |
HEIGHT | Mean (SD) | 161.696 (14.0567) | 172.364 (9.1494) | 163.576 (13.0260) | VALUE | 0 |
HEIGHT | Median | 162.560 | 175.260 | 162.560 | VALUE | 0 |
HEIGHT | Range | (147.32; 180.34) | (158.24; 181.61) | (147.32; 177.80) | VALUE | 0 |
HEIGHT | IQ range | (148.590; 169.670) | (168.910; 177.800) | (154.940; 175.260) | VALUE | 0 |
PULSE | PULSE | BY_HEADER1 | 0 | |||
PULSE | N | 186 | 147 | 204 | N | 0 |
PULSE | Mean (SD) | 69.4 (9.15) | 64.9 (10.18) | 70.5 (9.87) | VALUE | 0 |
PULSE | Median | 70.0 | 64.0 | 70.0 | VALUE | 0 |
PULSE | Range | (52; 94) | (50; 98) | (50; 97) | VALUE | 0 |
PULSE | IQ range | (61.0; 76.0) | (58.0; 70.0) | (62.0; 76.5) | VALUE | 0 |
SYSBP | SYSBP | BY_HEADER1 | 0 | |||
SYSBP | N | 186 | 147 | 204 | N | 0 |
SYSBP | Mean (SD) | 132.0 (12.02) | 127.5 (12.58) | 135.8 (23.65) | VALUE | 0 |
SYSBP | Median | 131.0 | 130.0 | 132.5 | VALUE | 0 |
SYSBP | Range | (100; 167) | (95; 151) | (95; 198) | VALUE | 0 |
SYSBP | IQ range | (123.0; 138.0) | (122.0; 137.0) | (116.0; 150.5) | VALUE | 0 |
TEMP | TEMP | BY_HEADER1 | 0 | |||
TEMP | N | 61 | 49 | 68 | N | 0 |
TEMP | Mean (SD) | 36.481 (0.3491) | 36.537 (0.4374) | 36.660 (0.2837) | VALUE | 0 |
TEMP | Median | 36.440 | 36.560 | 36.585 | VALUE | 0 |
TEMP | Range | (35.61; 37.67) | (34.28; 37.17) | (35.89; 37.33) | VALUE | 0 |
TEMP | IQ range | (36.220; 36.720) | (36.390; 36.720) | (36.470; 36.915) | VALUE | 0 |
WEIGHT | WEIGHT | BY_HEADER1 | 0 | |||
WEIGHT | N | 47 | 33 | 54 | N | 0 |
WEIGHT | Mean (SD) | 69.170 (10.1753) | 82.795 (11.2792) | 78.472 (15.5886) | VALUE | 0 |
WEIGHT | Median | 71.670 | 79.380 | 75.070 | VALUE | 0 |
WEIGHT | Range | (53.07; 80.74) | (59.88; 102.51) | (53.75; 99.79) | VALUE | 0 |
WEIGHT | IQ range | (54.430; 78.470) | (78.470; 88.450) | (63.960; 90.720) | VALUE | 0 |
While data driven precision is usually done with a by variable it
doesn’t always have to. The precisionon
argument can be
used to calculate data driven precision on a single variable. This might
be useful if a table template is going to be used multiple times or if
multiple parts of the table are using a similar call but need to have
different data driven precision. The following example uses the variable
CHG to calculate precision, similar to the above example we still use
decimal = 4
to cap our decimal spaces at 4.
tbl <- cdisc_advs %>%
filter(PARAMCD == "SYSBP") %>%
univar(colvar = "TRTAN",
rowvar = "CHG",
precisionon = "CHG",
decimal = 4)
knitr::kable(tbl)
label | 0 | 54 | 81 | row_type | group_level |
---|---|---|---|---|---|
N | 186 | 147 | 204 | N | 0 |
Mean (SD) | 0.3 (12.49) | 1.5 (9.87) | -5.5 (11.86) | VALUE | 0 |
Median | 0.0 | 1.0 | -7.0 | VALUE | 0 |
Range | (-44; 33) | (-31; 24) | (-32; 30) | VALUE | 0 |
IQ range | (-7.0; 8.0) | (-6.0; 7.0) | (-14.0; 0.0) | VALUE | 0 |
Another use case for the precisionon
argument could be
if you need to calculate the summary on one variable but use another for
precision for table output formatting. The following example uses both
precisionby
and precisionon
to show how they
can be used together to make special tables. For this table, we are
creating an element of the table that summarizes AVAL but uses CHG to
calculate precision. This allows us to have consistent formatting
throughout the table even though the two variables may have different
precision. We also calculate precision by PARAMCD since the output table
will be presented using that as a by variable.
tbl <- cdisc_advs %>%
filter(PARAMCD == "SYSBP") %>%
univar(colvar = "TRTAN",
rowvar = "AVAL",
rowbyvar = "PARAMCD",
precisionby = "PARAMCD",
precisionon = "CHG",
decimal = 4)
knitr::kable(tbl)
PARAMCD | label | 0 | 54 | 81 | row_type | group_level |
---|---|---|---|---|---|---|
SYSBP | SYSBP | BY_HEADER1 | 0 | |||
SYSBP | N | 186 | 147 | 204 | N | 0 |
SYSBP | Mean (SD) | 132.0 (12.02) | 127.5 (12.58) | 135.8 (23.65) | VALUE | 0 |
SYSBP | Median | 131.0 | 130.0 | 132.5 | VALUE | 0 |
SYSBP | Range | (100; 167) | (95; 151) | (95; 198) | VALUE | 0 |
SYSBP | IQ range | (123.0; 138.0) | (122.0; 137.0) | (116.0; 150.5) | VALUE | 0 |