For these courses, we will use a toolchain of programs that work in concert to fulfill our every statistical need. This might be quite different from a typical workflow where one program is used for calculating (e.g. SPSS) and another one for presenting the output (e.g. Office). In the R toolchain, everything starts with R. On top of R, it is often sensible to use a user interface that makes working with R more comfortable, such as RStudio. This minimal toolchain is what we will use in the short course.

While copying and pasting results from R into Word is definitely possible, there are much more convenient ways of generating output. The long course goes into greater detail here and hence needs complementary software. To create statistical reports where text and calculations are kept in the same place, we'll use a program called pandoc. This will allow us to do reproducible research that others can replay to learn from or extend. Speaking of others, social science is becoming increasingly teamwork-oriented. Doing data cleaning, statistical analysis and report writing in collaboration with others requires us to rethink our workflow. R is here to help and integrates happily with version control systems. We will be using git for that purpose.

There are a great many different setups out there, so we cannot provide help with every possible combination. What we can do, however, is provide guidance and point you to websites where you'll find more information. We'll do that in the installation instructions below.

The Core: R Short Course Long Course

R is more than just a statistics program. R is an environment. That means you are not just using a point-and-click interface to do analysis; you write code (syntax in SPSS terminology) and execute it in a console. This console also shows you your results immediately. There is also a second way of using R: we write script files that contain R code and execute them whenever needed, saving results or generating reports with them. In the short course we only have time to discuss interactive usage and only briefly touch upon scripting. In the long course, scripting will be our main mode of operation.
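
To make the difference between the two modes a bit more concrete, here is a minimal sketch in base R. The data in `ages` and the file name `analysis.R` are made up for illustration only.

```r
# Interactive use: type each line at the console prompt and press Enter.
ages <- c(23, 35, 41, 29, 52)   # hypothetical example data
mean(ages)                      # the result is printed right away
summary(ages)

# Scripted use: the same kind of code is saved in a file and run in one go.
script <- file.path(tempdir(), "analysis.R")   # hypothetical file name
writeLines(c(
  "ages <- c(23, 35, 41, 29, 52)",
  "avg_age <- mean(ages)"
), script)
source(script)   # runs every line of the script
avg_age          # the object created by the script is now available
```

In practice you would write the script file in an editor rather than with `writeLines()`; it is created from code here only so that the example is self-contained.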

R is designed in a very modular way. The R core contains only basic functionality. However, using the R language it is possible to program everything from the most cutting-edge statistical methods to web servers to computer games. At the time of this writing, there are more than 6,700 extension packages for R. We are going to need quite a few of them.

Oh, and best of all: R is free. Free as in free beer and as in free speech. R is being developed by hundreds of statisticians, programmers, and applied scientists all over the world. The results of this massive collaboration are accessible for free. Now and always. The community is growing rapidly and in due course, you will join us.


The Interface: RStudio Short Course Long Course

Because R is only the core, it comes with a very limited user interface. There are many alternatives. We are going to use an interface called RStudio, which strikes a good balance between features and learning curve. You'll feel at home with it in no time. RStudio's advantage is that it tightly integrates and enhances R's main features: an integrated help browser, syntax coloring and autocompletion for the code you write, saving of the plots you produce, and project management.


Complementary Software: pandoc, git Long Course

There are two pieces of fine software we are going to employ to make full use of R's potential. One, pandoc, is a document converter and the other one, git, helps you to keep your research organized and shareable.


John MacFarlane's pandoc is a real jack of all trades. It converts markdown (a very easy markup language that integrates well with R and that we are going to use) into almost anything: LaTeX, PDF, Word, ... This versatility makes it the core of our reproducible research efforts. By using markdown to write reports with embedded calculations, we keep the numbers and the stories in one place. Later on, pandoc allows us to convert these markdown documents into word processor files you can style as you wish.
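
As a rough illustration of keeping numbers and stories in one place, the sketch below builds a tiny markdown report from R and, if pandoc is found on the machine, converts it to a Word file. It is an assumption-laden example, not the course's actual mechanism: the data in `heights`, the file names, and the approach of pasting computed values into the text with `sprintf()` are chosen here purely for illustration.

```r
# Hypothetical example data: heights in cm.
heights <- c(172, 168, 181, 175, 169)

# Build a small markdown report; the numbers come straight from the data.
report <- c(
  "# Height report",
  "",
  sprintf("We measured %d people.", length(heights)),
  sprintf("Their mean height is %.1f cm.", mean(heights))
)

md_file <- file.path(tempdir(), "report.md")   # hypothetical file name
writeLines(report, md_file)

# Convert to Word only if pandoc is installed and on the search path.
# Note: paths containing spaces would need quoting before being passed on.
if (nzchar(Sys.which("pandoc"))) {
  docx_file <- file.path(tempdir(), "report.docx")
  system2("pandoc", c(md_file, "-o", docx_file))
}
```

If the data change, rerunning the code regenerates the report with the updated numbers, which is the point of the approach.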


Did you ever try a shortcut in your work that then broke things? Like, "I'll just change this formatting real quick", and then nothing works anymore? Well, git helps you with that by providing a framework for storing (and distributing) the code you write (remember, stories and numbers are together from now on). Every version. Always. This allows you to go back to a previous state if something did not work out. So, never stop exploring.


Installation Instructions Short Course Long Course

These instructions cover the installation of R and RStudio on a computer with MS Windows XP. Other versions of Windows and Mac OS X should be similar. We will first install the R core, then install RStudio and finally download and install all the packages we are going to need.

Installing R

  1. Download the R installer from the CRAN server.
  2. Execute the downloaded file to install R. The default choices in the installer should make sense for most people.
  3. Launch Rgui.exe, which should have been installed in the step above, to make sure everything worked out fine.

Installing RStudio

  1. Download the RStudio installer from the RStudio server.
  2. Execute the downloaded file to install RStudio. The default choices in the installer should make sense for most people.
  3. Launch the RStudio interface.
  4. From the Tools menu, select Check for Package Updates.... In the window that appears, select all packages that are upgradeable and upgrade them. Should Windows ask you whether you want to permit R, RTerm or RStudio to access the internet, kindly grant that permission. In the bottom left pane of the main window, you will then see R first downloading and then installing new package versions. It is done once the little STOP icon there disappears.
  5. This concludes the installation of RStudio.

Installing R Packages

We are going to use quite a few additional packages. These need to be installed, so that their functionality is available when we need it. The packages we will need are survey, table1xls, xlsx, sjPlot.
  1. From the Tools menu, select Install Packages.... In the Packages field, enter: survey,table1xls,xlsx,sjPlot. Make sure that the checkbox labeled Install dependencies is actually checked.
  2. Watch again as R installs the packages (and their dependencies, which can take quite some time).
  3. When R is done, type this code into the Console pane: source('', echo=FALSE) and hit enter. If this returns [1] TRUE, package installation worked fine. If not, cry for help. :-/
  4. Once installation has worked out fine, quit RStudio. You are now good to go for the short course.
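
If you prefer the console to the menus, the steps above can be approximated in code. This is only a sketch: `install_missing()` is a hypothetical helper written for this example, and it assumes the four packages are still available under these names from the package repository your R installation is configured to use.

```r
# Packages named in the instructions above.
pkgs <- c("survey", "table1xls", "xlsx", "sjPlot")

# Which of them are not installed yet?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Hypothetical helper: installs only what is missing (needs internet access).
install_missing <- function(p = missing_pkgs) {
  if (length(p) > 0) install.packages(p, dependencies = TRUE)
  invisible(p)
}

# install_missing()   # uncomment to actually run the installation
missing_pkgs          # an empty result means everything is already installed
```

The installation call is left commented out so that nothing is downloaded until you decide to run it.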


Additional Installation Instructions Long Course

The long course requires quite a lot of additional software. On top of the things we installed before, we are going to need pandoc and git. In addition, pandoc can make use of LaTeX if it is installed. I know that installing pandoc, LaTeX and git under Windows can be quite a challenge, so installing this additional software is optional. But I encourage you to try, especially pandoc, as the results can be very rewarding. We will also use many more packages than in the short course, but we can install them once we get to the respective chapters.

Installing pandoc

  1. Head over to John MacFarlane's Installing pandoc, download the appropriate installer and execute it.

Installing LaTeX

  1. Try installing MiKTeX for Windows or BasicTeX for Mac OS X.

Installing git

  1. Check out this handy git install guide. It has sections on Linux, Mac OS X and Windows: Installing Git.