Beginners Guide for R through Open Data Science Practices: Glossary

Key Points

Introduction
  • Tidy data principles are essential to increase data analysis efficiency and code readability.

  • Using R and RStudio, it becomes easier to implement good practices in data analysis.

  • You can make your workflow more reproducible and collaborative by using git and GitHub.

R & RStudio, R Markdown
  • R and RStudio make a powerful duo to create R scripts and R Markdown notebooks.

  • RStudio offers a text editor, a console and some extra features (environment, files, etc.).

  • R is a functional programming language: everything revolves around functions.

  • R Markdown notebooks support code execution, report creation and reproducibility of your work.

  • Literate programming is a paradigm to combine code and text so that it remains understandable to humans, not only to machines.
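
To illustrate the point that R is built around functions, here is a minimal sketch in base R. The names square, squares and total are hypothetical and chosen only for this example.

```r
# Functions are ordinary objects: they can be stored in variables and passed around.
square <- function(x) x^2

# sapply() takes a function as an argument and applies it to each element of 1:5.
squares <- sapply(1:5, square)
print(squares)

# Even operators are functions under the hood and can be called like one.
total <- `+`(2, 3)
print(total)
```

Running a block like this from a chunk in an R Markdown notebook keeps the code, its output and your explanatory text together in one document.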

Visualizing data with ggplot2
  • ggplot2 relies on the grammar of graphics, an advanced methodology to visualise data.

  • ggplot() creates a coordinate system that you can add layers to.

  • You pass a mapping using aes() to link dataset variables to visual properties.

  • You add one or more layers (or geoms) to the ggplot coordinate system and aes mapping.

  • Building a minimal plot requires supplying a dataset, aesthetic mappings and at least one geometric layer (geom).

  • ggplot2 offers advanced graphical visualisations to plot extra information from the dataset.
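
The three ingredients of a minimal plot (dataset, aesthetic mapping, geom) can be sketched as follows. This example assumes the ggplot2 package is installed and uses the built-in mtcars dataset as a stand-in for your own data; the object names p and p_colour are hypothetical.

```r
library(ggplot2)

# Minimal plot: a dataset, an aes() mapping and one geom layer.
# mtcars ships with R; wt is car weight and mpg is fuel efficiency.
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Mapping another column to colour shows extra information from the dataset,
# and a second layer adds a linear trend line per group.
p_colour <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

print(p)
print(p_colour)
```

Each `+` adds a layer to the same coordinate system, so a plot can be built up and inspected step by step.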

Data transformation with dplyr
  • The filter() function subsets a dataframe by rows.

  • The select() function subsets a dataframe by columns.

  • The mutate() function creates new columns in a dataframe.

  • The group_by() function groups the rows of a dataframe by the unique values of one or more columns.

  • This grouping information is used by summarize() to make new columns that define aggregate values across groupings.

  • The pipe operator %>% (read as "then") lets you chain successive operations without defining intermediate variables, which keeps an analysis concise and easy to read.
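
The verbs above can be chained into a single pipeline. The sketch below assumes the dplyr package is installed and again uses the built-in mtcars dataset; the threshold of 20 mpg and the column names wt_kg, mean_mpg and n_cars are arbitrary choices made for illustration.

```r
library(dplyr)

result <- mtcars %>%
  filter(mpg > 20) %>%                # keep only rows with mpg above 20
  select(mpg, cyl, wt) %>%            # keep only three columns
  mutate(wt_kg = wt * 453.592) %>%    # new column: wt is recorded in units of 1000 lbs
  group_by(cyl) %>%                   # one group per number of cylinders
  summarize(
    mean_mpg = mean(mpg),             # aggregate value per group
    n_cars = n()                      # number of rows per group
  )

print(result)
```

Each step receives the output of the previous one, so no intermediate dataframe needs a name of its own.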

Version control with git
  • In a version control system, file names do not reflect their versions.

  • git acts as a time machine for files in a given repository under version control.

  • git allows you to test changes and discard them if not relevant.

  • A new RStudio project can be smoothly integrated with git to allow you to version control scripts and other files.

Collaborating with yourself and others using GitHub
  • GitHub allows you to synchronise work efforts and collaborate with other scientists on (R) code.

  • GitHub can be used to publish a custom website on the internet.

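  • Merge conflicts can arise even when you work alone, for example when you edit the same file on different machines.

  • Merge conflicts also arise when collaborators change the same lines. git stops the merge so that the discordant changes are resolved explicitly rather than silently overwritten.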

  • Efficient collaboration on data analysis can be achieved using GitHub.

Become a champion of open (data) science
  • Make your data and code available to others

  • Make your analyses reproducible

  • Make a sharp distinction between exploratory and confirmatory research

Glossary

FIXME