
Repository for ESME #rstats course

You landed on the README page...

This repository includes various pieces of R code and Markdown documents that will be used during the course. Of particular interest are the slides and code that cover the main parts of the course. Slides are available in PDF and Markdown format; the latter can be displayed online on Bitbucket. The PDF slides are rendered using Pandoc + LaTeX (see the Makefile).

You can clone this repository on your personal computer using, e.g.:

git clone https://chlalanne@bitbucket.org/chlalanne/rstats-esme.git

Syllabus

Here is the updated schedule (see also the course overview):

  1. Nov, 7: R basics
  2. Nov, 14: Graphics with ggplot2
  3. Nov, 15*: dplyr and data.table
  4. Nov, 28: Data Mining Part 1 + Practical 1
  5. Dec, 5: Data Mining Part 1 (cont'd)
  6. Dec, 12: Cancelled
  7. Dec, 19: Data Mining Part 2 + Practical 2

Each session will run for 3 hours, including a 1-hour hands-on practical. Solutions will be available in the code folder. Recommended readings for each session are listed below.

  • Nov, 14-15: ISLR §2.1 and R4DS §3 and §7
  • Nov, 28: ISLR §10.2-10.3 and R4DS §13
  • Dec, 5: ISLR §3.6, 5.1-5.2 and R4DS §23-25
  • Dec, 12: R4DS §27
  • Dec, 19: ISLR §8.2, 9.2-9.3

R4DS = R for Data Science; ISLR = An Introduction to Statistical Learning.

Software

  • R and RStudio
  • a decent text editor (Emacs, Vi(m), Sublime, Atom, VS Code)
  • Git

Final assessment

In addition to the two practicals, there is a final evaluation.

Project (3CI)

The project consists of developing a recommender system based on a small to moderately sized dataset. You will have to compute custom numerical weights based on either text-based user reviews or numerical ratings, then apply a UCBF evaluation scheme and assess the predictive ability of the model. Detailed instructions are available on the project page.
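The exact scheme expected for the project is given on the project page. Purely as a rough illustration of "weights, predictions, then predictive ability", here is a minimal base-R sketch. It assumes a user-based collaborative filtering predictor (cosine similarity between users on co-rated items, similarity-weighted mean over the k nearest raters) and a leave-one-out RMSE as the accuracy measure. The toy rating matrix and the helper names `cosine_sim`, `predict_ubcf` and `rmse` are hypothetical and not part of the course material, and the sketch is not meant to reproduce the UCBF scheme required by the project.

```r
# Hypothetical toy data: users x items rating matrix (NA = not rated)
ratings <- matrix(c(5, 3, NA, 1,
                    4, NA, NA, 1,
                    1, 1, NA, 5,
                    1, NA, 5, 4),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("u", 1:4), paste0("i", 1:4)))

# Cosine similarity between two users, computed on co-rated items only
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (sum(both) == 0) return(0)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

# Predicted rating of user u for item i: similarity-weighted mean of the
# ratings given to i by the (at most) k most similar users who rated it
predict_ubcf <- function(R, u, i, k = 2) {
  raters <- setdiff(which(!is.na(R[, i])), u)
  if (length(raters) == 0) return(NA_real_)
  sims <- sapply(raters, function(v) cosine_sim(R[u, ], R[v, ]))
  top <- order(sims, decreasing = TRUE)[seq_len(min(k, length(raters)))]
  w <- sims[top]
  if (sum(abs(w)) == 0) return(NA_real_)
  sum(w * R[raters[top], i]) / sum(abs(w))
}

# Root mean squared error, ignoring ratings that could not be predicted
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2, na.rm = TRUE))

# Leave-one-out evaluation: hide each known rating in turn and predict it
known <- which(!is.na(ratings), arr.ind = TRUE)
preds <- apply(known, 1, function(ix) {
  R <- ratings
  R[ix[1], ix[2]] <- NA
  predict_ubcf(R, ix[1], ix[2])
})
obs <- ratings[known]
rmse(preds, obs)
```

Hiding each rating before predicting it keeps the evaluation honest (the model never sees the value it is scored on). On real data, a dedicated package such as recommenderlab provides train/test evaluation schemes instead of this hand-rolled loop.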

Exam (3CB)

The on-site exam will last 2 hours. You may use any document, including the slides and PDFs listed on this page. The exam will consist of a series of questions including: review of R code, small numerical applications (no calculator required), and general questions about the content of the course itself, the R4DS and ISLR textbooks, and the chapters recommended for the "data mining" slides (5 & 6).