R can be used as a calculator that follows the standard order of operations with the basic arithmetic operators. Note, however, that the arithmetic equals sign (=) corresponds to == in R: == tests whether two values are equal, whereas a single = performs assignment.
# simple calculations
3*2
[1] 6
(59 + 73 + 2) / 3
[1] 44.66667
# complex calculations
pi/8
[1] 0.3926991
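To illustrate the two points above, the following sketch shows how parentheses change the result of a calculation and how == is used to compare values. The object names (no_parens, with_parens, is_six, close_enough) are introduced here purely for illustration.

```r
# Parentheses change the order of operations
no_parens   <- 59 + 73 + 2 / 3    # the division happens first
with_parens <- (59 + 73 + 2) / 3  # the sum is divided by 3

# == tests equality and returns TRUE or FALSE
is_six <- 3 * 2 == 6
print(is_six)

# Decimal results are safer to compare with all.equal() than with ==
close_enough <- isTRUE(all.equal(with_parens, 44.66667, tolerance = 1e-6))
print(close_enough)
```

Using all.equal() with a tolerance is a common choice when the value being compared was rounded for display, as the printed 44.66667 is here.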
An object can be created to assign the value of your operation to a specific variable name, which can be reused later in the R session. Using the object_name <- value naming convention, you can assign (<-) the value ((59 + 73 + 2) / 3) to an object_name, here simple_cal, written as simple_cal <- (59 + 73 + 2) / 3, to store the evaluation of that calculation.
x <- 1:10
y <- 2*x
simple_cal <- (59 + 73 + 2) / 3
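Once stored, these objects can be reused in later expressions. The short sketch below re-creates the same objects so that it runs on its own; total_y and rounded_cal are names made up for this example.

```r
x <- 1:10
y <- 2 * x
simple_cal <- (59 + 73 + 2) / 3

# Arithmetic on a vector is applied element by element,
# so y holds 2, 4, ..., 20 and can be summarised directly
total_y <- sum(y)

# A stored result can feed into further calculations
rounded_cal <- round(simple_cal, 2)

print(total_y)
print(rounded_cal)
```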
Depending on the format of the files containing your data, we can use different base R functions to read and load data into memory.
R has two native data formats, Rdata (sometimes called Rda) and RDS.
An Rdata file can hold selected R objects or an entire workspace, while an RDS file holds a single R object. R has base functions available to read the two native data formats, as well as some delimited files.
# saving rdata
save(x, file = "data/intro_1.RData")
# Save multiple objects
save(x, y, file = "data/intro_2.RData")
# Saving the entire workspace
save.image(file="data/intro_program.RData")
# saveRDS() follows the same syntax as saving a single Rdata object
# saveRDS(object, file = "my_data.rds")
# loading Rdata or Rda files
load(file = "data/intro_program.RData")
# loading RDS
# readRDS() follows the same syntax as loading an Rdata file, but its result is assigned to a name
# object <- readRDS(file = "my_data.rds")
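The difference between the two native formats can be seen in a small round trip. This is a minimal sketch that writes to temporary files rather than the data/ folder used above; rdata_path, rds_path and x_copy are hypothetical names.

```r
x <- 1:10

# Temporary files stand in for paths such as "data/intro_1.RData"
rdata_path <- tempfile(fileext = ".RData")
rds_path   <- tempfile(fileext = ".rds")

# save() stores the object together with its name
save(x, file = rdata_path)
rm(x)
load(file = rdata_path)   # x is restored under its original name

# saveRDS() stores a single object without its name,
# so the result of readRDS() can be assigned to any name
saveRDS(x, file = rds_path)
x_copy <- readRDS(file = rds_path)

print(identical(x, x_copy))
```

Because load() restores objects under their saved names, it can silently overwrite objects already in the session, which is one reason RDS is often preferred for single objects.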
# Comma delimited
adsl_CSV <- read.csv("data/adsl.csv", header = TRUE)
# Save CSV
# write.csv() and write.table() are called for their side effect of writing a file;
# they return NULL, so their result is not assigned to an object
write.csv(adsl_CSV, "data/save_data/adsl.csv", row.names = TRUE)
# Save tab-delimited
write.table(
  adsl_CSV, "data/save_data/adsl.txt",
  append = FALSE,
  sep = "\t",
  dec = ".",
  row.names = TRUE,
  col.names = TRUE
)
# Tab-delimited
adsl_TAB <- read.table("data/save_data/adsl.txt", header = TRUE, sep = "\t")
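The same write-then-read pattern can be tried without the ADSL files. The sketch below uses a small made-up data frame (demo) and a temporary file; the column names and values are invented for illustration only.

```r
# A small made-up data frame standing in for the ADSL data
demo <- data.frame(
  USUBJID = c("001", "002", "003"),
  AGE = c(65, 20, 37)
)

csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra, unnamed column of row numbers
write.csv(demo, csv_path, row.names = FALSE)

# colClasses keeps the IDs as text so that leading zeros are not lost
demo_back <- read.csv(csv_path, header = TRUE,
                      colClasses = c(USUBJID = "character"))

print(demo_back)
```

Whether to write row names is a design choice: with row.names = TRUE, as in the code above, the file gains a leading column that has to be accounted for when reading it back.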
An R package is a collection of reusable functions, compiled code, documentation, sample data, and tests. Some data formats require an R package in order to load the data into memory. Shareable R packages are typically stored in a repository such as the Comprehensive R Archive Network (CRAN), Bioconductor, or GitHub.
# From CRAN
#install.packages("insert_package_name")
# {haven} is used to import or export foreign statistical format files (SPSS, Stata, SAS)
install.packages("haven")
# {readxl}
install.packages("readxl")
# From GitHub
remotes::install_github("pharmaverse/admiral", ref = "devel")
Since an R package is a collection of functions, you can choose to load the entire package into R memory or to use just the needed function from that package. Usually, the order in which you load packages does not make a difference, unless you are loading two or more packages that have functions with the same name. In that case, the package loaded last masks the function of the same name in the packages loaded earlier, so it is important to note the order in which you load the packages.
# read file using library call
library(haven)
adsl_sas <- read_sas("data/adsl.sas7bdat")
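As an alternative to library(), a single function can be called in the package::function() form, which leaves the rest of the package unattached and sidesteps masking between packages. The sketch below uses the {tools} and {stats} packages that ship with R, chosen here only so that the example runs without installing anything; ext and filtered are illustrative names.

```r
# Call one function from a package without attaching the whole package
ext <- tools::file_ext("data/adsl.sas7bdat")
print(ext)

# The prefix also states explicitly which package a function comes from,
# which helps when attached packages share a function name
# (for example, {dplyr} also exports a function called filter())
filtered <- stats::filter(1:5, rep(1 / 3, 3))
print(filtered)
```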
# Reading Excel xls|xlsx files
# read_excel() reads both xls and xlsx files, but read_xls() and read_xlsx() can also be used to read the respective files
# if NAs are represented by something other than a blank, you can specify the NA value
# with the na argument of the read_excel() function
R has several data types:
* Integer
* Numeric
* Character
* Logical
* Complex
* Raw
We will focus on the first four.
set.seed(1234)
type_int <- (1:5)
type_num <- rnorm(5)
type_char <- "USUBJID"
type_logl_1 <- TRUE
type_logl_2 <- FALSE
class(type_int)
[1] "integer"
class(type_num)
[1] "numeric"
class(type_logl_1)
[1] "logical"
class(type_logl_2)
[1] "logical"
class(type_char)
[1] "character"
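Beyond class(), base R offers is.*() functions to test a type and as.*() functions to convert between types. A brief sketch, with age_num and flag_int as made-up names:

```r
# The L suffix creates an integer; a plain number is numeric (double)
class(5L)
class(5)

# is.*() functions test a type, as.*() functions convert between types
is.character("USUBJID")
age_num  <- as.numeric("65")   # text to number
flag_int <- as.integer(TRUE)   # TRUE becomes 1, FALSE becomes 0

print(class(age_num))
print(flag_int)
```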
Base R provides functions to format a date object in a way similar to a DATE9.-formatted date variable in SAS. In addition, R packages such as {lubridate} are available for more complex date and date-time objects.
# using adsl_sas RFSTDTC
class(adsl_sas$RFSTDTC)
[1] "character"
# Convert the date from adsl_sas into a date variable
# (the format string must match how the dates are stored as text, e.g. "%Y-%m-%d" for ISO 8601 dates)
adsl_date <- as.Date(adsl_sas$RFSTDTC, "%m/%d/%Y")
class(adsl_date)
[1] "Date"
library(lubridate)
Attaching package: 'lubridate'
The following objects are masked from 'package:base':
date, intersect, setdiff, union
date9 <- as_date(18757)
adsl_sas$RFSTDTC <- ymd(adsl_sas$RFSTDTC)
class(adsl_sas$RFSTDTC)
[1] "Date"
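For comparison, the same ideas can be sketched in base R alone. The date value below is made up, and rfstdtc_chr, from_count and date9_chr are illustrative names. Note that R counts days from 1970-01-01, whereas SAS counts from 1960-01-01, so numeric dates are not interchangeable between the two.

```r
# A date stored as text in year-month-day order (made-up value)
rfstdtc_chr <- "2021-05-10"
rfstdtc <- as.Date(rfstdtc_chr, format = "%Y-%m-%d")

# Internally, a Date is the number of days since 1970-01-01
as.numeric(rfstdtc)

# The same date rebuilt from its day count
from_count <- as.Date(18757, origin = "1970-01-01")

# "%d%b%Y" gives a DATE9.-style string; the month abbreviation
# follows the locale of the R session
date9_chr <- toupper(format(rfstdtc, "%d%b%Y"))
print(date9_chr)
```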
Data structures are ways of organizing data along one or more dimensions. R has several data structures; here we focus on vectors and data frames.
A vector is a one-dimensional collection of data that contains one or more elements of the same data type.
vect_1 <- 2
vect_2 <- c(2, "USUBJID")
class(vect_1)
[1] "numeric"
class(vect_2)
[1] "character"
# Saving vectors from a dataset to a specific variable
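usubjid <- adsl_sas$USUBJID
# [[3]] extracts the third column as a vector; with the tibble returned by read_sas(),
# adsl_sas[, 3] would return a one-column table instead
subjid <- adsl_sas[[3]]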
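The class of vect_2 above is "character" because a vector can hold only one data type, so R coerces mixed input. The sketch below shows that coercion along with basic indexing; the values and the names mixed, ages, first_age and older are invented for illustration.

```r
# Mixing types in c() coerces every element to the most flexible type
mixed <- c(2, "USUBJID")
print(mixed)   # the number 2 is now the string "2"

# Elements are accessed by position, starting at 1
ages <- c(65, 20, 37, 19, 45)
first_age <- ages[1]

# Logical indexing keeps only the elements where the condition is TRUE
older <- ages[ages > 30]

print(length(ages))
print(older)
```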
A data frame is similar to a SAS data set: it is a two-dimensional collection of vectors. A data frame can store vectors of different types, but they must all be of the same length.
df <- data.frame(
  age = c(65, 20, 37, 19, 45),
  seq = (1:5),
  type_logl = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  usubjid = c("001-940-9785", "002-950-9726", "003-940-9767", "004-940-9795", "005-940-9734")
)
# str() provides the data structure for each object in the dataframe
str(df)
'data.frame': 5 obs. of 4 variables:
$ age : num 65 20 37 19 45
$ seq : int 1 2 3 4 5
$ type_logl: logi TRUE FALSE TRUE TRUE FALSE
$ usubjid : chr "001-940-9785" "002-950-9726" "003-940-9767" "004-940-9795" ...
# In addition to the data structure per variable, also get some descriptive statistics
summary(df)
      age            seq    type_logl         usubjid
 Min.   :19.0   Min.   :1   Mode :logical   Length:5
 1st Qu.:20.0   1st Qu.:2   FALSE:2         Class :character
 Median :37.0   Median :3   TRUE :3         Mode  :character
 Mean   :37.2   Mean   :3
 3rd Qu.:45.0   3rd Qu.:4
 Max.   :65.0   Max.   :5
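Individual columns and rows of a data frame can be pulled out with $ and [rows, columns]. The sketch below rebuilds a shortened version of the data frame above so that it runs on its own; mean_age and flagged are names introduced for this example.

```r
df <- data.frame(
  age = c(65, 20, 37, 19, 45),
  seq = 1:5,
  type_logl = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Dimensions: number of rows (observations) and columns (variables)
dim(df)

# $ extracts one column as a vector
mean_age <- mean(df$age)

# [rows, columns] keeps the rows where the condition is TRUE
# and only the named columns
flagged <- df[df$type_logl, c("seq", "age")]

print(mean_age)
print(flagged)
```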