An R Guide for Bowdoin Biology Students

Welcome to the guide for using R as a biology student at Bowdoin College! This guide is a comprehensive introduction to the R statistical programming language and its applications in biology. It covers an introduction to using R, an overview of R structure, data manipulation, data visualization and graphing, statistical analyses, biology-specific packages, and intermediate and advanced R use cases. It is suitable for complete R beginners, people with experience in other programming languages, and people who have used R outside of biology.

What is R?

R is a statistical programming language developed by Ross Ihaka and Robert Gentleman.1 Built on the principles of free (as in freedom) and open source software (FOSS), R is freely available from the Comprehensive R Archive Network (CRAN) and runs on Windows, Linux, and macOS.

So why use R instead of other statistical packages such as SPSS, SAS, Stata, or Prism?

  • It is a standard tool in research and industry across most disciplines that rely on statistical analysis.2 In fact, over 54% of articles published in the top 30 journals in ecology used R.3

  • R has an active community supporting it. Tens of thousands of packages, the main way of extending R’s capabilities, have been created and are actively maintained.4

  • Because R is open source, anyone can inspect its source code, which builds confidence in its accuracy.

  • It is free!

What can I do with R?

R is a programming language, so almost anything you can think of! That said, you probably would not want to make a video game with it, because R is specialized for statistical analysis and data science. We will be using it to process our data, run statistical tests, and graph our results. This is just the tip of the iceberg, though: R is also used for analyzing genomic data sets, building machine learning models, GIS analysis, and much more.
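
As a rough illustration of the three tasks named above (processing data, running a statistical test, and graphing results), here is a minimal sketch. It assumes R's built-in iris data set as stand-in data. The choice of species, the variable names two_species and test_result, and the use of a t-test are illustrative assumptions, not part of this guide's own material.

```r
# Built-in example data: flower measurements for three iris species
data(iris)

# Process: keep two species and drop the unused factor level
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Test: Welch two-sample t-test comparing sepal length between the two species
test_result <- t.test(Sepal.Length ~ Species, data = two_species)
print(test_result$p.value)

# Graph: boxplot of sepal length by species
boxplot(Sepal.Length ~ Species, data = two_species,
        xlab = "Species", ylab = "Sepal length (cm)")
```

Each step uses only functions that ship with a standard R installation, so no extra packages are needed to try it.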

How do I use this guide?

This guide is divided into sections, each covering a different aspect of R. Each section combines text explanations like this one with code blocks containing relevant R code. Pressing the “Run code” button runs the code and shows the results below the block. For example, try this simple code block:
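
A minimal stand-in for such a block, assuming it is a simple sum (the expression below is an illustration, not necessarily the one the interactive page runs):

```r
# Add two numbers; R prints the value of the expression
1 + 1
```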


If it ran successfully, you should see the result 2. There will also be some code blocks without the “Run code” button, which are only there to explain specific concepts. See below:

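# Assign the value 1 to a variable named var
var <- 1
# Add var to itself and store the result (2) in two.var
two.var <- var + var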

Some code blocks include annotations on the right side, which appear when you hover your mouse cursor over the icon.

var <- 1  # (1)

(1) Here is a code annotation!

References

1. Giorgi FM, Ceraolo C, Mercatelli D. The R language: An engine for bioinformatics and data science. Life. 2022;12(5):648. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9148156/. doi:10.3390/life12050648
2. Muenchen R. The popularity of data science software. r4stats.com. 2012. https://r4stats.com/articles/popularity/
3. Lai J, Lortie CJ, Muenchen RA, Yang J, Ma K. Evaluating the popularity of R in ecology. Ecosphere. 2019;10(1):e02567. https://onlinelibrary.wiley.com/doi/abs/10.1002/ecs2.2567. doi:10.1002/ecs2.2567
4. The Comprehensive R Archive Network. https://cran.r-project.org/