Data Analysis and Visualization in R for Ecologists

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with ecology data in R.
This is an introduction to R designed for participants with no programming experience. This lesson adapts a 6-hour curriculum down to between 2 and 3 hours. The episodes start with some basic information about R syntax and the RStudio interface, then move through how to import CSV files, wrangle data, and calculate summary statistics, before ending with a brief introduction to plotting. Because it has been shortened, you may want to look at the original lesson for more detail.
This lesson assumes no prior knowledge of R or RStudio and no programming experience.
Contributors
The list of contributors to the original lesson is available in the citation page. It has been adapted by Nathaniel D. Porter and Jesse Sadler (Virginia Tech University Libraries).
Preparations
Data Carpentry’s teaching is hands-on, and to follow this lesson learners must have R and RStudio installed on their computers. They also need to be able to install a number of R packages, create directories, and download files.
To avoid troubleshooting during the lesson, learners should follow the instructions below to download and install everything beforehand. If the computer is managed by their organization’s IT department they might need help from an IT administrator.
Install R and RStudio
R and RStudio are two separate pieces of software:
- R is a programming language and software used to run code written in R.
- RStudio is an integrated development environment (IDE) that makes using R easier. In this course we use RStudio to interact with R.
If you don’t already have R and RStudio installed, follow the instructions for your operating system below. You have to install R before you install RStudio.
Windows
- Download R from the CRAN website.
- Run the .exe file that was just downloaded.
- Go to the RStudio download page.
- Under Installers select Windows 10/11 - RSTUDIO-xxxx.yy.z-zzz.exe (where x = year, y = month, and z represent version numbers)
- Double click the file to install it
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
MacOS
- Download R from the CRAN website.
- Select the .pkg file for the latest R version.
- Double click on the downloaded file to install R.
- It is also a good idea to install XQuartz (needed by some packages)
- Go to the RStudio download page
- Under Installers select Mac OS 13+ - RSTUDIO-xxxx.yy.z-zzz.dmg (where x = year, y = month, and z represent version numbers)
- Double click the file to install RStudio
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
Linux
- Click on your distribution in the Linux folder of the CRAN website. Linux Mint users should follow instructions for Ubuntu.
- Go through the instructions for your distribution to install R.
- Go to the RStudio download page
- Select the relevant installer for your Linux system (Ubuntu/Debian or Fedora)
- Double click the file to install RStudio
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
Update R and RStudio
If you already have R and RStudio installed, first check if your R version is up to date:
- When you open RStudio, your R version will be printed in the console on the bottom left. Alternatively, you can type sessionInfo() into the console. If your R version is 4.2 or later, you don’t need to update R for this lesson. If your version of R is older than that, download and install the latest version of R from the R project website for Windows, for MacOS, or for Linux.
- It is not necessary to remove old versions of R from your system, but if you wish to do so you can check How do I uninstall R?
- After installing a new version of R, you will have to reinstall all your packages with the new version. There are different methods for automating the process of reinstalling packages, such as this blog post, but you can also install packages as you find that you need them with install.packages().
To update RStudio to the latest version, open RStudio and click on Help > Check for Updates. If a new version is available, follow the instructions on screen. By default, RStudio will also automatically notify you of new versions every once in a while.
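The version check described above can also be done in code. The following is a minimal sketch, assuming the 4.2 threshold stated in this lesson; the variable names `min_version` and `r_is_current` are only illustrative.

```r
# Minimum R version this lesson asks for (taken from the text above).
min_version <- "4.2.0"

# getRversion() returns the running R version in a form that can be compared.
r_is_current <- getRversion() >= min_version

if (r_is_current) {
  message("R ", as.character(getRversion()), " is recent enough for this lesson.")
} else {
  message("R ", as.character(getRversion()), " is older than ", min_version,
          "; consider updating R.")
}
```

Pasting this into the console prints one of the two messages, depending on the installed version.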
The changes introduced by new R versions are usually backwards-compatible. That is, your old code should still work after updating your R version. However, if breaking changes happen, it is useful to know that you can have multiple versions of R installed in parallel. If this is something you would like to do, we recommend using rig.
While this may sound scary, it is far more common to run into issues due to using out-of-date versions of R or R packages. Keeping up with the latest versions of R, RStudio, and any packages you regularly use is a good practice.
Install required R packages
During the course we will need a number of R packages. Packages contain useful R code written by other people. We will use the tidyverse package.
To install these packages, open RStudio and copy and paste the following command into the console window (look for a blinking cursor on the bottom left), then press Enter (Windows and Linux) or Return (MacOS) to execute the command.
R
install.packages("tidyverse")
Alternatively, you can install the packages using RStudio’s graphical user interface by going to Tools > Install Packages and typing the name of the package.
R tries to download and install the packages on your machine.
When the installation has finished, you can try to load the packages by pasting the following code into the console:
R
library(tidyverse)
If you do not see an error like there is no package called ‘...’, you are good to go!
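If you would rather confirm the installation without reading through the console output, a small check along these lines may help. It is a sketch only: the vector name `needed` and the result name `not_installed` are illustrative, and the check reports missing packages without installing anything.

```r
# Packages this lesson relies on (only tidyverse is named in the text above).
needed <- c("tidyverse")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
is_available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_available]

if (length(not_installed) > 0) {
  message("Still missing: ", paste(not_installed, collapse = ", "),
          ". Try install.packages() again.")
} else {
  message("All required packages are installed.")
}
```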
Updating R packages
Generally, it is recommended to keep your R version and all packages up to date, because new versions bring improvements and important bug fixes. To update the packages that you have installed, click Update in the Packages tab in the bottom right panel of RStudio, or go to Tools > Check for Package Updates... in the menu. Another way to update packages is to run update.packages() in the console.
You should update all of the packages required for the lesson, even if you installed them relatively recently.
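As a rough illustration of the console route, the sketch below lists which installed packages have newer versions before updating them. The helper name `list_outdated` and the choice of the CRAN cloud mirror are assumptions for this example; both calls at the bottom need a network connection, so they are left commented out.

```r
# Hypothetical helper: names of installed packages with a newer version
# available on the given repository.
list_outdated <- function(repos = "https://cloud.r-project.org") {
  outdated <- old.packages(repos = repos)  # NULL when everything is current
  if (is.null(outdated)) {
    character(0)
  } else {
    unname(outdated[, "Package"])
  }
}

# Not run here because these need network access:
# list_outdated()
# update.packages(ask = FALSE)  # update everything without prompting
```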
Sometimes, package updates introduce changes that break your old code, which can be very frustrating. To avoid this problem, you can use a package called renv. It locks the package versions you have used for a given project and makes it straightforward to reinstall those exact package versions in a new environment, for example after updating your R version or on another computer. However, the details are outside of the scope of this lesson.
Download the data
We will download the data directly from R during the lessons. However, if you are expecting problems with the network, it may be better to download the data beforehand and store it on your machine.
The data files for the lesson can be downloaded manually here: https://doi.org/10.6084/m9.figshare.1314459
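If you prefer to script the download, something like the following sketch could work. The helper name `fetch_data_file` is hypothetical, and the URL and file name in the example call are placeholders: copy the direct link to the file you want from the figshare page above.

```r
# Hypothetical helper: download one file into a folder, creating the folder
# first if it does not exist, and skipping the download if the file is
# already there.
fetch_data_file <- function(url, destfile) {
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Example call (not run): both arguments are placeholders to fill in.
# fetch_data_file("<direct-link-to-csv>", "data_raw/<file-name>.csv")
```

Skipping files that already exist means the helper can be re-run safely if the network drops partway through the preparation.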