R is an extremely powerful and versatile statistical environment. This course gives students a crash course in handling R, including data input and output, data manipulation, basic statistical analysis and visualization. Emphasis is placed on integrating R into typical social science workflows and on comparing R critically with SPSS and Stata. Topics covered include operating R using RStudio, interfacing R with popular data formats, inferential statistics with weighted and unweighted data, and visualization of data and results. All sessions will include demonstrations and hands-on exercises. Students are expected to have a sound understanding of statistical principles, to be computer literate and to be comfortable with the idea of using a computer program without a point-and-click interface. After completing this course, students should feel confident enough to explore the world of R further on their own.

For many years, the main purpose of the R statistical environment was to enable statisticians to exchange ideas and methods. Those days have passed. Today, R has matured into a very powerful and versatile program that can be (and indeed is) used for any kind of statistical analysis. R's key advantage is that it is free: free as in free beer (which saves anyone who masters R the prohibitively high licensing costs of commercial statistical software) and free as in freedom. This latter kind of freedom allows statisticians, data scientists and other practitioners to extend and modify R in any way they see fit. At the time of this writing, R stands at version 3.0.1 and offers more than 6,000 extension packages. While R is extremely popular in the life sciences, natural sciences, engineering and finance, the social sciences have yet to adopt it widely. The reasons for this are manifold, but R's reputation for being hard to learn certainly features prominently among them. Although the last five years have seen vast improvements in R's interfaces and usability, that image has persisted. This course sets out to correct it.

As a short course, this course cannot provide students with a comprehensive, let alone exhaustive, introduction to R. Rather, it is intended to serve as an appetizer for students seeking an efficient and powerful tool for conducting statistical analyses. We will therefore focus on the basics of using R for fundamental statistical analyses in the social sciences. Advanced topics and methods will not be covered. However, pointers to literature, packages and help, as well as guidance on more advanced topics, will be given.

This course begins with a gentle introduction to the RStudio interface to R and to basic syntactic operations. We will then proceed to R's different data types and their peculiarities. Because R's language logic (R is a functional programming language) and terminology differ radically from the less powerful but more widespread concepts used in SPSS, Stata or even Excel, sufficient time will be spent on these subjects to lay strong foundations for what is to come.

Once data manipulation has been mastered, attention will turn to applying basic concepts of inferential statistics to simple data sets. Methods covered range from descriptive statistics to contingency tables and the associated hypothesis tests. This section also includes basic data visualization. Here we will also cover importing data sets in the formats typically found in social science contexts: SPSS's sav, Stata's dta and Excel's xls.

In the social sciences, we can hardly ever work with simple data sets. Often, data are collected using surveys that need to be corrected for bias, or sampling was done at multiple levels. In either case, sound statistical practice requires compensating for the resulting changes in variance and inclusion probabilities. R is well equipped to handle all of these cases. Unfortunately, hardly any introductory courses in statistical software cover this topic. To empower social scientists to use their knowledge even in situations where the classical statistical assumption of simple random sampling is violated, we will devote ample time to this topic.

Each session in this course will consist of hands-on practical training using demos and in-class assignments, paired with presentation slides and a comprehensive set of lecture notes. The idea is to complement instruction with reference material that students can use to build on what they have learned after WSMT has ended. Most sessions will also contain best-practice examples, so that students can optimize their own workflows.

Note: this is not a methods course, so while we will use a large array of methods, we will spend little to no time on how or why the methods work, and instead focus fully on how to use them with R. Note further that, due to the new two-day format, this course will be very demanding; expect to spend a considerable amount of time reinforcing the topics covered in class.

To ensure that all students get the most out of the course, a sound understanding of inferential statistics is assumed. If need be, students should review their knowledge before the beginning of the Winter School. Likewise, all students must be confident in using their computer.



Day-to-day schedule

Topics Details

Introduction to R: R Syntax, RStudio handling, Reading/writing data, SPSS/Stata comparison

Lab session: lecture intertwined with hands-on demos and exercises

Basic statistics: inferential statistics and linear regression with unweighted and weighted data

Outlook on things not covered, open floor

Lab session: lecture intertwined with hands-on demos and exercises

Open floor: Students are to present their own research problems and we'll figure out how to tackle them in R together.



Assignments

For this course, there are no required readings. Instead, students are expected to do small exercises (30-60 minutes per day) and present them to the class the next day. The readings listed are optional; all of them are either in the book R in Action or on this website in the respective session sections.

Readings (all optional)
Day 1

R Tutorials online, Chapters 1, 2, and 4

Day 2

Complex samples materials on website, Chapters 6, 7, and 8
