This lesson is in the early stages of development (Alpha version)

Intro to R and Open Science Practices for Biologists

The workshop was organised by the Open Science Community Saudi Arabia for sFDA staff, using materials from Intro to R and RStudio for Genomics and Introduction to Open Data Science with R. It was delivered by the following instructors and helpers: Dr. Monah Abou Alezz, Dr. Batool Almarzouq, Annajiat Alim Rasel, Mona Alsharif and Abdulrahman Dallak.

Welcome to R!

Working with a programming language (especially if it’s your first time) often feels intimidating, but the rewards outweigh any frustrations. An important secret of coding is that even experienced programmers find it difficult and frustrating at times – so if even the best feel that way, why let intimidation stop you? Given time and practice* you will soon find it easier and easier to accomplish what you want.

Why learn to code?

Bioinformatics – like biology – is messy. Different organisms, different systems, and different conditions all behave differently. Experiments at the bench require a variety of approaches – from tested protocols to trial-and-error. Bioinformatics is also an experimental science; otherwise, we could use the same software and the same parameters for every genome assembly. Learning to code opens up the full possibilities of computing, especially given that most bioinformatics tools exist only at the command line. Think of it this way: if you could only do molecular biology using a kit, you could probably accomplish a fair amount. However, if you don’t understand the biochemistry of the kit, how would you troubleshoot? How would you do experiments for which there are no kits?

R is one of the most widely used and powerful programming languages in bioinformatics. R especially shines where a variety of statistical tools are required (e.g. RNA-Seq, population genomics, etc.) and in the generation of publication-quality graphs and figures. Rather than get into an R vs. Python debate (both are useful), keep in mind that many of the concepts you will learn apply to Python and other programming languages.

Finally, we won’t lie; R is not the easiest-to-learn programming language ever created. So, don’t get discouraged! The truth is that even with the modest amount of R we will cover today, you can start using some sophisticated R software packages, and have a general sense of how to interpret an R script. Get through these lessons, and you are on your way to being an accomplished R user!
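
To give a sense of what a short R script looks like, here is a minimal sketch using only base R. The measurements are made-up numbers for illustration (they do not come from the workshop dataset), and the variable names are our own. It compares two small groups with a t-test, the kind of statistical task mentioned above.

```r
# Hypothetical measurements for two groups (made-up values, for illustration only)
control <- c(5.1, 4.8, 5.5, 5.0, 4.9)
treated <- c(6.2, 5.9, 6.8, 6.1, 6.4)

# Summarise each group with its mean
mean_control <- mean(control)
mean_treated <- mean(treated)

# Compare the two groups; by default t.test() runs Welch's two-sample t-test
result <- t.test(treated, control)

# Print the difference in means and the p-value
cat("Difference in means:", mean_treated - mean_control, "\n")
cat("p-value:", format.pval(result$p.value, digits = 3), "\n")
```

Each line either creates an object with `<-` or calls a function on one; reading a script this way, one step at a time, is the skill the lessons build on.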

The slides associated with the workshop can be accessed here:

Monah Abou Alezz, Batool Almarzouq, Annajiat Alim Rasel, Abdulrahman Dallak, & Mona Alsharif. (2022, November 2). Intro to R and Open Science Practices for Biologists (Arabic). Zenodo. https://doi.org/10.5281/zenodo.7274156

* We very intentionally used the word practice. One of the other “secrets” of programming is that you can only learn so much by reading about it. Do the exercises in class, re-do them on your own, and then work on your own problems.

Before you start

  • Before the training: Please make sure you have completed the Setup instructions.
  • There are two options for installing the necessary software and packages:
    • (Option 1) The preferred option is to sign up for RStudio Cloud.
    • (Option 2) Alternatively, you can install everything manually. Follow the Setup instructions to do so.
  • Please read the workshop Code of Conduct to make sure this workshop stays welcoming for everybody.

  • Experimenter’s Mindset: We define the “Experimenter’s mindset” as an approach to bioinformatics that treats it like any other experiment. There are probably a variety of metaphors we could employ (data are our reagents, scripts are our protocols, etc.), but the most important idea of the mindset is to remind you that, as a researcher, you need to apply all of your training at the bench or in the field to your analyses. Evaluate results critically, and don’t expect that things will always work the first time, or that they will always work in the same way.

Schedule

Setup: Download files required for the lesson
09:00 1. Why care about open (data) science?
  • What is Open Science?
  • What are Open and Reproducible Research Practices?
09:50 2. Introducing R and RStudio IDE
  • Why use R?
  • Why use RStudio and how does it differ from R?
10:35 3. Collaborating with GitHub
  • How can I develop and collaborate on code with another scientist?
  • How can I give access to my code to another collaborator?
  • How can I keep code synchronised with another scientist?
  • How can I solve conflicts that arise from that collaboration?
  • What are GitHub
12:05 4. R Basics
  • What will these lessons not cover?
  • What are the basic features of the R language?
  • What are the most common objects in R?
13:25 5. Introduction to the example dataset and file type
  • What data are we using in the lesson?
  • What are VCF files?
13:40 6. R Basics continued - factors and data frames
  • How do I get started with tabular data (e.g. spreadsheets) in R?
  • What are some best practices for reading data into R?
  • How do I save tabular data generated in R?
15:10 7. Using packages from Bioconductor
  • How do I use packages from the Bioconductor repository?
15:23 8. Data Wrangling and Analyses with Tidyverse
  • How can I manipulate data frames without repeating myself?
16:18 9. Data Visualization with ggplot2
  • What is ggplot2?
  • What is mapping, and what is aesthetics?
  • What is the process of creating publication-quality plots with ggplot in R?
17:48 10. Producing Reports With knitr
  • How can I integrate analyses and reports?
19:03 11. Getting help with R
  • How do I get help using R and RStudio?
19:18 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.