## Making easier and better graphs in R
## Eric Guntermann
## October 2, 2013
## This workshop is on how to make easy and effective graphs for presenting the results of
## regression models in R. It covers opening datasets, running models, and making
## graphs using John Fox's effects package. It shows how to do so with linear models,
## logistic regression models, multinomial logistic regression models, and ordinal logistic
## regression models.
## Before we begin, it's important to understand some basic concepts in R. R is an object-
## oriented language. Objects are computer creations that store information. R also uses
## commands, which are performed on objects, in turn creating other objects. Another
## important concept is that of packages. Commands in R are contained in packages, some
## of which are included by default in R, some of which must be downloaded. In this
## workshop, we will use the effects package, which insn't installed by default.
## To install it, type:
install.packages("effects")
## Let's also install the car package, which contains some useful statistical commands.
install.packages("car")
## We will also use the foreign package, which is used to open data files from other
## statistical packages, such as STATA and SPSS. the foreign package is included in the
## R base installation.
## To begin, let's open our two datasets. First, we have to load the foreign package
## using the library function. Then we run the read.dta command, used for opening STATA
## files, on the name of our data file. We then assign the output of the command to
## the object qc12. The data is the 2012 Makind Electoral Democracy Work (MEDW) survey
## done for the Quebec provincial elections.
library(foreign)
## To read in the data you have to set the working directory to the directory containing
## the dataset
setwd("/where/file/is")
qc12 <- read.dta("qc12.dta")
## If this file were in SPSS format, you would type the following:
## qc12 <- read.spss("qc12.spss", to.data.frame=TRUE, use.value.labels=TRUE)
## Now that we've imported our datasets, let's run some models.
## 1. Continuous dependent variable
## Here our dependent variable is an ideology scale. We run a linear model.
mod1 <- lm(ideology ~ age + sex + french + education + log(income), data=qc12)
## Anova performans an F test on our regression output. An F test allows us to appropriately
## test hypotheses on all our coefficients, including education, which involves 9 dummy
## variables. Anova comes with the car package. Let's load it.
library(car)
Anova(mod1)
## We see that education and log(income) have significant effects
## To visualize the effects of our variables, we first have to load the effects package.
library(effects)
## Now, let's look at the effect of log(income) on ideology. We saw that it has a significant
## effect, In one short line, we can create a quite informative graph.
plot(effect("log(income)", mod1))
## By default, effect includes a rug plot, which is not very informative for this variable.
## Let's remove it, by using the rug=FALSE option.
plot(effect("log(income)", mod1), rug=FALSE)
## The effects package, by default, holds other variables constant at interesting values.
## We should probably specify these just to be sure what we're looking at.
plot(effect("log(income)", mod1), rug=FALSE, given.values=c("age"="mean(age)",
"sex"="Female", "french"="TRUE", "education"="Bachelor's degree"))
## If our variable of interest is a dummy variable, effects can make a similarly interesting graph
## university is a dummy variable coded TRUE if people have at least some university
mod2 <- lm(ideology ~ age + sex + french + university + income, data=qc12)
Anova(mod2)
plot(effect("university", mod2), rug=FALSE, given.values=c("age"="mean(age)",
"sex"="Female", "french"="TRUE", "income"="mean(log(income))"))
## Interactions are not much harder to present. Consider the interaction between
## french and log(income)
mod3 <- lm(ideology ~ sex + french*log(income) + education, data=qc12)
Anova(mod3)
plot(effect("french*log(income)", mod3), rug=F, given.values=c("sex"="Female",
"education"="Bachelor's degree"))
## We see that income has a weaker effect on Francophones than on Anglophones.
## However, we might want to look at the other side of the interaction,
## i.e. how income conditions the effect of language. To do that we have to
## create the graph in two lines.
eff_freinc <- effect("french*log(income)", mod3)
## Remember how R is about applying commands to objects and producing other objects.
## Previously we applied two commands in one line. Here we just split it up in two, which
## is necessary to pick the variable we want to display.
plot(eff_freinc, x.var="french", given.values=c("sex"="Female", "education"="Bachelor's degree"))
## Binary dependent variable
## Here our dependent variable is a binary variable representing support for Quebec sovereignty.
mod4 <- glm(sov ~ income + french, data=qc12, family=binomial)
Anova(mod2)
## Let's look at the effect of income
plot(effect("income", mod4), rug=FALSE, given.values=c("french"="French"))
## Now, how about the effect of language?
plot(effect("french", mod4), rug=FALSE, given.values=c("income"="mean(income)"))
## We can also interact these two variables
mod5 <- glm(sov ~ income*french, data=qc12, family=binomial)
Anova(mod5)
plot(effect("income*french", mod5), rug=FALSE)
## The effect of income seems to be a lot weaker for Francophones
## Nominal dependent variable
## Now we use multinomial logistic regression to look at the effect of
## language and ideology on vote choice, including the three largest parties in Quebec in
## 2012.
## The command for multinom is in the nnet package, which we have to load, since it isn't
## loaded by default.
## To begin, let's look at the effect of a dichotomous variable, language, on vote choice.
library(nnet)
mod6 <- multinom(vote ~ sex + french, data=qc12)
Anova(mod6)
plot(effect("french", mod6), rug=FALSE, given.values=c("sex"="Female"))
## Now for the effect of a continuous independent variable, ideology.
mod7 <- multinom(vote ~ ideology + sex + french, data=qc12)
Anova(mod7)
plot(effect("ideology", mod7), rug=FALSE, given.values=c("sex"="Female", "french"="TRUE"))
## Now let's see if university conditions the effect of ideology on vote choice,
## by interacting our university dummy with the ideology scale.
mod8 <- multinom(vote ~ university*ideology + french + sex, data=qc12)
Anova(mod8)
plot(effect("university*ideology", mod8), rug=FALSE, given.values=c("french"="TRUE", "sex"="Female"))
## Ordinal depednent variable
## Here we look at the effect of income on satisfaction with democracy in Quebec,
## an ordinal variable with three levels
library(MASS)
mod9 <- polr(satdem ~ age + french + income, data=qc12)
Anova(mod9)
plot(effect("income", mod9), rug=FALSE, given.values=c("french"="TRUE", "age"="mean(age)"))
#Another way to see this is using a stacked graph, although it doesn't give
## confidence bounds
plot(effect("income", mod9), style="stacked", rug=F, given.values=c("french"="TRUE", "age"="mean(age)"))
## We can also look at the effect of a dichotomous variable. Here, let's look at income in
## two categories
mod10 <- polr(satdem ~ age + french + income_2cat, data=qc12)
Anova(mod10)
plot(effect("income_2cat", mod10), rug=FALSE, given.values=c("french"="TRUE", "age"="mean(age)"))
## That's all for today. If you have any quesitons or need any help, don't hesitate to ask me.