R Markdown is a program for making dynamic documents with R. An R Markdown document is written in markdown, an easy-to-write plain text format with the file extension .Rmd. It can contain chunks of embedded R code. It has a number of great features:
In recent years I (along with many others) who work a lot with R have made Rmarkdown the basic way to work with R. So when I work on a new project I immediately start a corresponding R markdown document.
to start writing an R Markdown document open RStudio, File > New File > R Markdown. You can type in the title and some other things.
The default document starts like this:
---
title: “My first R Markdown Document”
author: “Dr. Wolfgang Rolke”
date: “April 1, 2018”
output: html_document
---
This follows a syntax called YAML (also used by other programs). There are other things that can be put here as well, or you can erase all of it.
YAML stands for Yet Another Markup Language. It has become a standard for many computer languages to describe different configurations. For details go to yaml.org
Then there is other stuff you should erase. Next File > Save. Give the document a name with the extension .Rmd
for a list of the basic syntax go to
https://rmarkdown.rstudio.com/articles_intro.html
or to
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
There are two ways to include code chunks (yes, that’s what they are called!) into an R Markdown document:
simultaneously enter CTRL-ALT-i and you will see this:
```{r}
```
Here ` is the backtick, on most keyboards on the ~ key, upper left below Esc.
you can now enter any R code you like:
```{r}
x <- rnorm(10)
mean(x)
```
which will appear in the final document as
x <- rnorm(10)
mean(x)
Actually, it will be like this:
x <- rnorm(10)
mean(x)
## [1] -0.1869992
so we can see the result of the R calculation as well. The reason it didn’t appear like this before was that I added the argument eval=FALSE:
```{r eval=FALSE}
x <- rnorm(10)
mean(x)
```
which keeps the code chunk from actually executing (aka evaluating). This is useful if the code takes along time to run, or if you want to show code that is actually faulty, or …
there are a number of useful arguments:
here is a bit of text:
… and so the mean was -0.1869992.
Now I didn’t type in the number, it was done with the chunk `r mean(x)`
.
Many of these options can be set globally, so they are active for the whole document. This is useful so you don’t have to type them in every time. I have the following code chunk at the beginning of all my Rmd files:
library(knitr)
opts_chunk$set(fig.width=6, fig.align = "center",
out.width = "70%", warning=FALSE, message=FALSE)
We have already seen the message and warning options. The other one puts any figure in the middle of the page and sizes it nicely.
If you have to override these defaults just include that in the specific chunk.
To create the output you have to “knit” the document. This is done by clicking on the knit button above. If you click on the arrow you can change the output format.
In order to knit to pdf you have to install a latex interpreter. My suggestion is to use Miktex, but if you already have one installed it might work as well.
There are several advantages / disadvantages to each output format:
I generally use HTML when writing a document, and use pdf only when everything else is done. There is one problem with this, namely that a document might well knit ok to HTML but give an error message when knitting to pdf. Moreover, those error messages are weird! Not even the line numbers are anywhere near right. So it’s not a bad idea to also knit to pdf every now and then.
As far as this class is concerned, we will use HTML exclusively.
One of the more complicated things to do in R Markdown is tables. For a nice illustration look at
My preference is to generate a data frame and the use the kable.nice function:
Gender <- c("Male", "Male", "Female")
Age <- c(20, 21, 19)
kable.nice(data.frame(Gender, Age))
Gender | Age |
---|---|
Male | 20 |
Male | 21 |
Female | 19 |
probably with the argument echo=FALSE so only the table is visible.
You have not worked with latex (read: latek) before? Here is your chance to learn. It is well worthwhile, latex is the standard document word processor for science. And once you get used to it, it is WAY better and easier than (say) Word.
A nice list of common symbols is found on https://artofproblemsolving.com/wiki/index.php/LaTeX:Symbols.
A LATEX expression always starts and ends with a $ sign. So this line:
The population mean is defined by \(E[X] = \int_{-\infty}^{\infty} xf(x) dx\).
was done with the code
The population mean is defined by $E[X] = \int_{-\infty}^{\infty} xf(x) dx$.
sometimes we want to highlight a piece of math:
The population mean is defined by
\[ E[X] = \int_{-\infty}^{\infty} xf(x) dx \] this is done with two dollar signs:
$$
E[X] = \int_{-\infty}^{\infty} xf(x) dx
$$
say you want the following in your document:
\[ \begin{aligned} &E[X] = \int_{-\infty}^{\infty} xf(x) dx = \\ &\int_{0}^{1} x dx = \frac12 x^2 |_0^1 = \frac12 \end{aligned} \]
for this to display correctly in HTML and PDF you need to use the format
$$
\begin{aligned}
&E[X] = \int_{-\infty}^{\infty} xf(x) dx=\\
&\int_{0}^{1} x dx = \frac12 x^2 |_0^1 = \frac12
\end{aligned}
$$