Now you know about variables, but what can you store in one? The short answer is everything! R has a wide variety of data types, each suited to a different purpose. In this section, I will break down the ones you will use most often. First, there are two distinct categories of data types: atomic data types and data structures.
Atomic data types are the simplest data types in R. They are the building blocks of data structures. Base R has the following atomic data types that you will commonly use:
Logical - TRUE or FALSE
Numeric - 1, 1.574, or pi
Character - “hello!”, “a”, “STRING”
While these data types are simple, they each have their own quirks that can cause confusion when they are first used.
Additionally, R has other atomic data types that this guide will not cover here. They are either not widely used or are meant for more advanced use cases.
Logicals are a binary data type: a logical is either TRUE or FALSE. They must be written in upper case, or else R interprets them as variable names. Additionally, R treats TRUE as 1 and FALSE as 0 in arithmetic, which leads to interesting results when logicals are used with numerics. For example, see what happens when you add TRUE to 5:
TRUE + 5
[1] 6
This behavior can be useful in some cases, but keep it in mind if you get results that do not make sense when working with logicals. Their primary use is in conditional functions, which we will explore later in this guide.
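One case where treating TRUE as 1 comes in handy is counting. The sketch below is only an illustration; `passed_check` is a made-up variable name, not something defined elsewhere in this guide.

```r
# Hypothetical logical vector, used only for illustration
passed_check <- c(TRUE, FALSE, TRUE, TRUE)

# Each TRUE counts as 1 and each FALSE as 0, so sum() counts the TRUE values
n_passed <- sum(passed_check)
n_passed

# The mean of a logical vector is the proportion of TRUE values
prop_passed <- mean(passed_check)
prop_passed
```

Here `n_passed` is 3 and `prop_passed` is 0.75, since three of the four values are TRUE.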
Numerics, also called doubles, are numbers. They can be whole numbers, such as 9 or 3948, or decimals, such as 8.47 or 937.5.
typeof(584)
typeof(98.37483277777)
typeof(pi)
[1] "double"
[1] "double"
[1] "double"
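The two names for this data type show up in different functions, which can be confusing at first. As a small sketch (the variable name `whole_number` is just for illustration), typeof() reports how the value is stored, while class() and is.numeric() use the more general name:

```r
# A whole number typed without a decimal point is still stored as a double
whole_number <- 584
typeof(whole_number)      # "double"
class(whole_number)       # "numeric"
is.numeric(whole_number)  # TRUE

# Whole numbers and decimals mix freely in arithmetic
with_decimal <- whole_number + 0.5
with_decimal              # 584.5
```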
Character vectors, also called strings, are a set of text characters. Anything located between a pair of double or single quotes will be considered as a character vector. This includes numbers, text, and symbols. If you want to have your character vector contain quotes, you can use a set of single quotes instead of double quotes and vice versa.
this_is_a_character_vector <- "Hi! Many different symbols can be in a character"
quotes_in_string <- 'I have switched to "single quotes" to allow double quotes'
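Because anything in quotes is a character, a number in quotes is no longer a number, and trying to do arithmetic on it directly (for example "42" + 1) produces an error. The sketch below shows one way to check and convert; the variable names are made up for this example.

```r
# A number wrapped in quotes is stored as text, not as a numeric
quoted_number <- "42"
typeof(quoted_number)   # "character"

# as.numeric() converts the text into a number so it can be used in arithmetic
converted_number <- as.numeric(quoted_number)
converted_number + 1    # 43

# nchar() counts the characters in a string, including symbols
greeting_length <- nchar("hello!")
greeting_length         # 6
```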
Data structures are more complex data types that are made of the different atomic data types. The most commonly used data structures include:
Vectors
Lists
Matrices
Data frames
Factors
Let's go over what differentiates these data structures.
Vectors are the most common data structure. They are made of multiple objects of a single atomic data type. The most common way to create a vector is the concatenate function, c(), where each atomic object is separated by a comma:
num_vector <- c(1, 2, 3)
num_vector
[1] 1 2 3
Vectors are useful as a lot of R functions are vectorized, meaning one function is applied to every object in the given vector.
adding_two <- num_vector + 2
adding_two
[1] 3 4 5
The most important thing to remember about vectors is that they can only contain objects of one atomic data type. Let's see what happens when we try to break that!
broken_vector <- c(1, 2, "hi")
broken_vector
[1] "1" "2" "hi"
As you can see, R has converted the numerics 1 and 2 into characters in order for the vector to be created. Be careful of these conversions, as they can cause unintended consequences.
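The same kind of conversion happens with other mixes of types. The following is a small sketch with made-up variable names. It also shows what happens if you try to convert the characters back into numerics: text that is not a number becomes NA, and R normally prints a warning (suppressed here with suppressWarnings() to keep the example quiet).

```r
# Logical and numeric together: the logical is converted to a numeric
mixed_logical <- c(TRUE, 2, 3)
typeof(mixed_logical)   # "double"
mixed_logical           # 1 2 3

# Converting characters back to numerics: "hi" cannot be converted and becomes NA
recovered <- suppressWarnings(as.numeric(c("1", "2", "hi")))
recovered               # 1 2 NA
```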
So what if you want to store different data types together? Well, then you use lists! Lists are just like vectors in that they hold multiple objects, but they are not limited to a single data type. A list can even contain vectors or other lists.
cool_list <- list(1, TRUE, c("vector", "in", "a", "list"))
cool_list
[[1]]
[1] 1
[[2]]
[1] TRUE
[[3]]
[1] "vector" "in" "a" "list"
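To make the difference from vectors concrete, here is a small side-by-side sketch. The variable names are invented for this example. Giving the same three values to c() and to list() shows that the vector converts everything to one type, while the list keeps each object's original type.

```r
# c() forces every object into a single type (here, character)
mixed_inputs_vector <- c(1, TRUE, "text")
typeof(mixed_inputs_vector)   # "character"

# list() keeps each object as it was
mixed_inputs_list <- list(1, TRUE, "text")

# sapply() applies typeof() to every element of the list
element_types <- sapply(mixed_inputs_list, typeof)
element_types                 # "double" "logical" "character"
```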
Matrices (the plural of matrix) are two-dimensional vectors. They are made of rows and columns, and every data object stored in them must be of the same type. This means if you store numerics in a matrix, that matrix will only store numerics.
my_matrix <- matrix(1:20, nrow = 4, ncol = 5, byrow = FALSE)
my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
#What happens when we try to change one of the data objects in the matrix to a character?
my_matrix[1,1] <- "hi"
#It converts all of the data objects into characters!
my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,] "hi" "5"  "9"  "13" "17"
[2,] "2"  "6"  "10" "14" "18"
[3,] "3"  "7"  "11" "15" "19"
[4,] "4"  "8"  "12" "16" "20"
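The byrow argument used when creating the matrix controls whether the values fill the matrix column by column or row by row. A short sketch, using two small made-up matrices, shows the difference and how a single cell is picked out with a row number and a column number:

```r
# Fill down the columns first (byrow = FALSE) or across the rows first (byrow = TRUE)
by_column <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE)
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

by_column[1, ]   # first row: 1 3 5
by_row[1, ]      # first row: 1 2 3

# dim() returns the number of rows, then the number of columns
dim(by_column)   # 2 3

# A single cell is addressed as [row, column]
by_column[2, 3]  # 6
```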
While matrices are computationally fast due to their simplicity, their limitations can be too restrictive. Additionally, the conversion of object types can cause confusion in downstream applications. This is why we mainly use data frames!
Data frames are to lists what matrices are to vectors. Rather than the whole object being limited to one data type, each column can contain a different data type. This means if you have a spreadsheet of names and heights, you can have one column contain characters (names), and the other column contain numbers (height in inches) without having R convert either column into a different type.
my_dataframe <- data.frame(Name = c("Crosby", "Stills", "Nash"), Height = c(70, 68, 72), Nationality = c("American", "American", "British"))
my_dataframe
    Name Height Nationality
1 Crosby     70    American
2 Stills     68    American
3   Nash     72     British
Data frames will be used extensively throughout this guide due to their versatility and ease of use. Most data sets you will create and explore in biology will work best in this data structure, so make sure you understand what differentiates a data frame from the other data structures discussed above!
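If you are ever unsure which data type each column holds, you can ask R directly. The sketch below builds a small stand-in data frame (`band_members` is a name invented for this example) and checks its columns. Setting stringsAsFactors = FALSE is an assumption made here so that the text column stays a character in every version of R.

```r
# Hypothetical example data; stringsAsFactors = FALSE keeps text columns as characters
band_members <- data.frame(
  Name = c("Crosby", "Stills", "Nash"),
  Height = c(70, 68, 72),
  stringsAsFactors = FALSE
)

# Check the data type stored in each column
column_types <- sapply(band_members, class)
column_types

# Count the rows and columns
n_rows <- nrow(band_members)
n_cols <- ncol(band_members)
```

The Name column is reported as "character" and the Height column as "numeric", so neither column was converted to match the other.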
There is a set of operators for working with data frames that lets you extract specific parts of the data from the object.
Name | Symbol | Type | Usage |
---|---|---|---|
Named element extractor | $ | Slice | Extracts and returns the element of a given name, such as a named column |
Slice list extractor | [] | Slice | Extracts element(s) at a given index location and returns a list of values |
Slice element extractor | [[]] | Slice | Extracts and returns element(s) at a given index location |
my_dataframe$Name
#The first value in the slice list extractor is the row index number
#The second value is the column index number, and they are separated by a comma
my_dataframe[1,1]
#You can also keep one of the arguments empty to return all of that type
my_dataframe[1,] #Returns all columns of the first row
#If you want to return the element and not a list containing the element, use [[]]
my_dataframe[[1,1]]
[1] "Crosby" "Stills" "Nash"
[1] "Crosby"
    Name Height Nationality
1 Crosby     70    American
[1] "Crosby"
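The difference between [] and [[]] is easiest to see when you select a whole column by name. This is a small sketch on a stand-in data frame (`members_df` is a made-up name), not an exhaustive description of how these operators behave:

```r
# Hypothetical example data, used only for illustration
members_df <- data.frame(
  Name = c("Crosby", "Stills", "Nash"),
  Height = c(70, 68, 72),
  stringsAsFactors = FALSE
)

# [] with a single column name returns a smaller data frame
class(members_df["Name"])     # "data.frame"

# [[]] and $ both return the column itself as a vector
class(members_df[["Name"]])   # "character"
identical(members_df[["Name"]], members_df$Name)   # TRUE

# Rows and columns can also be mixed: a row number with a column name
stills_height <- members_df[2, "Height"]
stills_height                 # 68
```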
There are three ways to represent missing data in R. For data that does not exist, R has the object NULL. Setting a variable equal to NULL is a way to discard the object that variable was assigned to. If a value is unknown but does exist, then R uses the object NA. Finally, for undefined results, such as dividing zero by zero, R has the object NaN, or "not a number".
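A short sketch of how these three objects behave in practice is shown below. The vector `unknown_height` is invented for this example, standing in for a set of measurements where one value was never recorded.

```r
# NULL is empty: it has a length of zero
length(NULL)                 # 0

# NA marks a value that exists but is unknown
unknown_height <- c(70, NA, 72)
is.na(unknown_height)        # FALSE TRUE FALSE

# Calculations that include NA return NA unless you tell R to remove it
mean(unknown_height)                            # NA
known_mean <- mean(unknown_height, na.rm = TRUE)
known_mean                                      # 71

# NaN appears when a calculation has no defined result
undefined_result <- 0 / 0
is.nan(undefined_result)     # TRUE

# Dividing a non-zero number by zero gives Inf rather than NaN
1 / 0                        # Inf
```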