In this tutorial, I’ll cover how to analyze repeated-measures designs using 1) multilevel modeling using the lme package and 2) using Wilcox’s Robust Statistics package (see Wilcox, 2012). In a repeated-measures design, each participant provides data at multiple time points. Due to this, the assumptions about model error are different for variances which are presented between subjects (i.e., SS_{B} than are variables presented within subjects (i.e., SS_{W}. After the within-subject variability is partialled out, we model separately the effect of the experiment (i.e., SS_{E} and the error not account for by the experiment (i.e., SS_{R}).
When using this tutorial, there are a few things to keep in mind:
This is a draft. I’ll be updating this page with more graphs and explanations as time allows, informed by your feedback.
Multilevel models and Robust ANOVAs are just a few of the ways that repeated-measures designs can be analyzed. I’ll be presenting the multilevel approach using the nlme package because assumptions about sphericity are different and are less of a concern under this approach (see Field et al., 2012, p. 576).
One way repeated measures
The first dataset we’ll be using can be obtained from the Personality Project:
Reading the dataset
12
data <- read.table(file ="http://personality-project.org/r/datasets/R.appendix3.data", header =T)
The main research question is does the valence of the word affect the rate at which items are recalled? First, let’s take a look at descriptive statistics of the dataset. We can sort them by the item valence using the describeBy() function in the psych package, which is available on CRAN.
Generating descriptive statistics by a group
12345
# Load the psych packagelibrary(psych)# Describe the Recall variable by the Valence variabledescribeBy(data$Recall, group = data$Valence)
Generating descriptive statistics by a group
1234567891011
## group: Neg## var n mean sd median trimmed mad min max range skew kurtosis se## 1 1 5 27.8 3.9 29 27.8 4.45 22 32 10 -0.39 -1.72 1.74## -------------------------------------------------------- ## group: Neu## var n mean sd median trimmed mad min max range skew kurtosis se## 1 1 5 11.6 2.7 12 11.6 2.97 8 15 7 -0.09 -1.83 1.21## -------------------------------------------------------- ## group: Pos## var n mean sd median trimmed mad min max range skew kurtosis se## 1 1 5 40 3.81 40 40 2.97 35 45 10 0 -1.78 1.7
Graphing the Effect
We can generate a quick boxplot to display the effect of Valence on Recall using the ggplot2 package from CRAN.
A multilevel model is simply a regression that allows for the errors to be dependent on eachother (as our conditions of Valence were repeated within each participant). To run this type of analysis, we’ll use the nlme package from CRAN, although I’ve also had good luck with the lme4 package if you like experimenting.
Installing nlme
1
install.packages('nlme')
Similar to any approach to model testing, we want to see if our predictive, augmented model is better than a simple, 1 parameter mean model. Thus, we begin by specifying a baseline model in which the DV, Recall, is predicted by its overall mean. Second, we specify our model of interest, in which Recall is predicted instead by a the item Valence, which was repeated within subjects.
Specifying our 2 Models
123456789
# Load the nlme packagelibrary(nlme)# Compact Modelbaseline <- lme(Recall ~1, random =~1| Subject/Valence, data = data, method ="ML")# Augmented ModelvalenceModel <- lme(Recall ~ Valence, random =~1| Subject/Valence, data = data, method ="ML")
One way of assessing the significance of our model is by comparing it from the baseline model. By comparing the models, we ask whether Valence as a predictor is significantly better than the simple mean model (i.e., a better fit). We can do this with the anova() function.
Comparing the Models
1
anova(baseline, valenceModel)
Comparing the Models
123
## Model df AIC BIC logLik Test L.Ratio p-value## baseline 1 4 125.24 128.07 -58.62 ## valenceModel 2 6 84.36 88.61 -36.18 1 vs 2 44.87 <.0001
The output contains a few indicators of model fit. Generally with AIC (i.e., Akaike information criterion) and BIC (i.e., Bayesian information criterion), the lower the number the better the model, as it implies either a more parsimonious model, a better fit, or both. The likelihood ratio indicates that our valenceModel is a significnatly better fit for the data than our baseline model (p < 0.0001). Therefore, the item Valence had a significant impact on the measured Recall of the participant, Χ^{2}(2) = 44.87, p < 0.0001.
We can obtain more specific details about the model using the summary() function:
## ## Simultaneous Tests for General Linear Hypotheses## ## Multiple Comparisons of Means: Tukey Contrasts## ## ## Fit: lme.formula(fixed = Recall ~ Valence, data = data, random = ~1 | ## Subject/Valence, method = "ML")## ## Linear Hypotheses:## Estimate Std. Error z value Pr(>|z|) ## Neu - Neg == 0 -16.20 1.31 -12.36 <2e-16 ***## Pos - Neg == 0 12.20 1.31 9.31 <2e-16 ***## Pos - Neu == 0 28.40 1.31 21.67 <2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## (Adjusted p values reported -- single-step method)
Thus, our post hoc analysis shows that participants’ rate of recall was significantly better for positively valenced items (M = 40) than neutral (M = 11.6, b = 28.40, p < .0001) and negatively valenced items (M = 27.8, b = 12.20, p < .0001). Similarly, neutral items were recalled at a significantly higher rate than negatively valenced items (b = -16.20, p < .0001).
Wilcox’s Robust ANOVA
As of 5/1/13, the WRS package must be compiled from source to be installed. You can obtain the source package from the R-Forge repo below:
Unlike using lme() to analyze the data as a multilevel model, rmanova() requires that the data are in wide format. To adjust our table, we’ll use the reshape2 package from CRAN and cast the data into a wide format.
For some reason, the rmanova() function doesn’t like dealing with factors variables, so we’ll remove the 5 Subjects. Finally, we’ll use rmanova(), which trims the data by 20% before estimating the effect.
Similar to our findings from above, Valence had a significant influence on the item recall rate of the participant, F(1.26, 2.52) = 154.66, p < .01. However, we still want to conduct post-hoc analysis on the 20% trimmed means, which we’ll do using the rmmcp() function.
Post-hoc analysis confirms that Negatively valenced items are significantly different from both Neutral (Ψ̂ = 16, p < .01) and Positive items (Ψ̂ = -13, p < .05). Additionally, Neutral items are significantly different from positive items (Ψ̂ = -28.33, p < .01).
Two way repeated measures (In Progress)
The second dataset we’ll be using can be obtained from the Personality Project:
# Load the psych packagelibrary(psych)# Describe the Recall variable by the Valence variabledescribeBy(data.2$Recall, group = data.2$Valence)
Generating descriptive statistics by Valence
1234567891011
group: Neg
var n mean sd median trimmed mad min max range skew kurtosis se
1110114.51211.125.9341713-0.16-1.641.42---------------------------------------------------------------------------------------------------------------------group: Neu
var n mean sd median trimmed mad min max range skew kurtosis se
111012.13.5113125.19718110.12-1.411.11---------------------------------------------------------------------------------------------------------------------group: Pos
var n mean sd median trimmed mad min max range skew kurtosis se
111012.34.081312.253.71520150.06-0.621.29
Generating descriptive statistics by Task
12345
# Load the psych packagelibrary(psych)# Describe the Recall variable by the Valence variabledescribeBy(data.2$Recall, group = data.2$Tasj)
Generating descriptive statistics by Task
1234567
group: Cued
var n mean sd median trimmed mad min max range skew kurtosis se
111512.84.481412.925.9342016-0.27-1.071.16---------------------------------------------------------------------------------------------------------------------group: Free
var n mean sd median trimmed mad min max range skew kurtosis se
111510.83.191210.922.9751510-0.45-1.360.82
p <- ggplot(data.2, aes(Valence, Recall))p + geom_boxplot(aes(fill = Task))
Setting Contrasts
We begin by setting up orthogonal contrasts for our Task and Valence factors.
By setting contrasts, we make the main effects of our ANOVA results more intepretable.
Model df AIC BIC logLik Test L.Ratio p-value
baseline 15151.9240158.9300-70.96201valenceModel 27153.4414163.2498-69.720691 vs 22.4826320.2890taskModel 36144.9022153.3094-66.451122 vs 36.5391420.0106fullModel 48145.7924157.0020-64.896213 vs 43.1098170.2112interactionModel 510149.2138163.2258-64.606924 vs 50.5785840.7488
Our taskModel, which includes the main effect of Task, is the preferred significant model (p = .011).
the Valence-only model was not significant, nor was our interaction model, which included an interaction term for
Valence and Task, indicating that the Valence of the word had no effect on participants’ recall.