An introduction to the R programming language

Course Outline

Objectives

In 2012, the Harvard Business Review profiled the data scientist as “the Sexiest Job of the 21st Century.” Whether you believe it or not, job figures still point to an increasing demand for data management and analysis skills in the labour market. The same trend is felt in academia, driven by the growing availability of data sources and the spread of quantitative research.

All this is to say that there is no better moment to acquire these valued skills. Ideally, one should approach quantitative data analysis with an open-source software tool that is effective at:

  1. connecting, importing, manipulating and exporting data from different sources/formats

  2. providing a wide range of existing tools able to tackle virtually any quantitative analysis

  3. adapting to your own needs, by allowing the customization of functions

  4. offering help to solve your problems through a thriving user community

R provides just that. The R language is a cross-platform, open-source statistical programming language, widely used in all fields of science.

This course, offered within the Master in Digital Transformation, aims to provide the skills needed to tackle basic and intermediate data analysis tasks within the R environment. After completing this course, the student will be able:

  • to set up a basic data analysis project in R and import data;

  • to explore, represent and summarize distinct data structures;

  • to use selected statistical applications in R (t-test and OLS regression);

  • to visualize and report the result of the analysis.
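
To give a flavour of what these objectives look like in practice, here is a minimal sketch in base R. It relies on the built-in `mtcars` dataset purely as an assumption for illustration; the datasets and exact workflow used in class are not specified in this outline and may differ.

```r
# Illustrative only: mtcars ships with R and stands in for the course data.
data(mtcars)

# Explore and summarise the data structure
str(mtcars)
summary(mtcars$mpg)

# Two-sample t-test: fuel economy (mpg) by transmission type
# (am: 0 = automatic, 1 = manual)
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

# OLS regression of fuel economy on car weight
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))

# Visualise the relationship and overlay the fitted regression line
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

In the course itself, data manipulation and plotting are based on the dplyr and ggplot2 libraries rather than the base functions shown here.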

Useful applications

R is among the fastest-growing software tools in the scholarly world. It is an indispensable tool in any academic or non-academic research that uses quantitative data and statistical analysis. These are strong reasons to start learning R as your first programming language.

Specific requirements to attend the course

Previous courses in descriptive and inferential statistics are not required. Students should bring their laptops because the course is hands-on and interactive. I recommend installing R and RStudio before the first class. Both are freely available.

Syllabus and day-to-day schedule

Date   Session   Time          Topic
14/4   1         14.30-18.30   Presentation of R and RStudio; importing data
12/5   2         14.30-18.30   Data preparation and manipulation with the dplyr library
19/5   3         14.30-18.30   Descriptive statistics and visualisation with the ggplot2 library

Readings

  • Wickham, Hadley and Garrett Grolemund (2017). R for Data Science (Italian version). https://it.r4ds.hadley.nz/.

  • Imai, Kosuke (2018). Quantitative Social Science. Princeton University Press.

  • Kabacoff, Robert (2011). R in Action. Greenwich, CT: Manning Publications.

Enrico Borghetto
Associate Professor