Poster is available upon request.
In the following tutorial, we’ll use R (R Core Team, 2013) along with the psych
package (Revelle, W., 2013) to look at a hypothetical exam.
Before we get started, remember that R
is a programming language. In the examples below, I perform operations on data using functions like cor() and read.csv(). We can also save the output as objects using the assignment arrow, <-. It's a bit different from a point-and-click program like SPSS, but you don't need to know how to program to analyze exams using IRT!
First, load the psych package. Then load the students' grades into R using the base read.csv() function.
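For example, assuming the item grades are saved in a CSV file (the filename and the object name grades here are placeholders):

```r
# Load the psych package and read in the item-level grades.
# "grades.csv" is a placeholder; point read.csv() at your own file.
library(psych)
grades <- read.csv("grades.csv")
head(grades)  # inspect the first few rows
```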
Notice that we are using item-level grades, where each row is a given student and each cell is the number of points received on that question. Your matrix or data frame should look like this:
Next, compute the polychoric correlations on the raw grades (not including the Total column). By using polychoric correlations, we estimate the correlations among the normally distributed latent traits underlying the item responses; Pearson correlations tend to underestimate these when used on polytomous items.
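Assuming the data frame is called grades and its total score lives in a column named Total (both assumptions), this step might look like:

```r
# Compute polychoric correlations on the item columns only,
# dropping the Total column first.
library(psych)
items <- grades[, setdiff(names(grades), "Total")]
poly <- polychoric(items)
poly$rho  # the polychoric correlation matrix
```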
Now that we have the polychoric correlations, we can run irt.fa()
on the dataset to see the item difficulties and information.
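A sketch of this step, reusing the hypothetical items data frame from above:

```r
# Fit the IRT model via factor analysis of the polychoric correlations,
# then plot item and test information.
library(psych)
irt <- irt.fa(items)
plot(irt, type = "ICC")   # item characteristic curves
plot(irt, type = "test")  # test information curve
```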
Thus, we have some great items that carry a lot of information about students of average and low content knowledge (e.g., V24, V17, V18), but not enough to distinguish the high-knowledge students. In redesigning the exam for next semester or year, we might keep the best-performing questions while rewriting the weaker ones or trying new questions. At the same time, the second plot shows the overall test performance. We have great reliability for distinguishing who didn't study (the lower end of our latent trait), but overall the test may have been too hard.
While many students in our hypothetical dataset did very well on the exam, instructors may
need to rescale their exam so that the mean grade is an 85% or 87.5%. Using the scale()
function (see also rescale()
in the psych
package), we
can ensure that the rank-order distribution of the students is preserved (allowing us to distinguish
those who studied well), while scaling the sample distribution to fit in with other classes in your department.
Currently, our scores are in cumulative raw points. Let's plot a histogram to see the distribution of scores. Notice that we divide the Total points column by 91 to convert the scores into grade percentages.
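With a Total column as assumed above, the conversion and histogram might look like:

```r
# Convert raw points to percentages (91 points possible on this exam)
# and plot the distribution.
pct <- grades$Total / 91 * 100
hist(pct, xlab = "Grade (%)", main = "Distribution of exam scores")
```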
The distribution has a mean of 71.68 percent and a standard deviation of 15.54. Given grade inflation, it may look like your students are doing poorly when in fact the distribution is similar to other courses being taught. Next, we can rescale the grades, creating a mean of 87.5 and a standard deviation of 7.5. These numbers are arbitrary, so use your best judgment.
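Using scale(), one way to do this (the target mean and SD are the arbitrary values from the text; pct is the hypothetical percentage vector from above):

```r
# Standardize the percentages to z-scores, then rescale to a
# mean of 87.5 and a standard deviation of 7.5. Rank order is preserved.
z <- scale(pct)
curved <- z * 7.5 + 87.5
hist(curved, xlab = "Curved grade (%)", main = "Rescaled exam scores")
```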
The second distribution may be preferred, depending on your needs. With the raw distribution, we would have had 45% of the students receiving grades below a C (assuming a normal distribution); now, only 0.9% of students would fall below the 70% cutoff. Again, the mean and standard deviation chosen in the above example are arbitrary.
That said, making it work with vectors is annoying! There are logical reasons for this of course, but… it just won't do what you want if you use ‘subset()’ as you would with a data frame or matrix.
Here’s the example:
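A made-up vector illustrates the usual sticking point, the handling of missing values:

```r
# Bracket indexing with a logical condition keeps NA elements,
# which is usually not what you want.
x <- c(5, 1, NA, 8, 3)
x[x > 2]          # the NA comes along for the ride
x[which(x > 2)]   # which() drops the NA, like subset() on a data frame
```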
After some consternation, I stumbled across this gem submitted by Marc Schwartz.
There are other options, but this conveniently uses the same approach as the base ‘subset()’.
That’s it. Consternation vanquished.
Needless to say, the marriage of statistics with documents makes writing up APA-style reports a bit easier, especially with Brian Beitzel's amazing apa6
class for LaTeX.
However, Sweave doesn't always work correctly. One common complaint you'll see after Sweaving a file is "Sweave.sty not found!". While Sweave.sty is a LaTeX package, it doesn't live with the rest of the LaTeX packages because it's installed with R. Many people try to solve this by copying and pasting Sweave.sty into every document directory, but I'm sharing a better way below.
Using Terminal.app:
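The commands below sketch one way to do this on a Mac; the R framework path and the ~/Library/texmf location are assumptions that vary by installation:

```shell
# Find where R keeps its texmf tree (it contains Sweave.sty):
find /Library/Frameworks/R.framework -name "Sweave.sty"

# Link R's texmf styles into your personal TeX tree so every
# document can find them without local copies:
mkdir -p ~/Library/texmf/tex/latex
ln -s /Library/Frameworks/R.framework/Resources/share/texmf/tex/latex \
      ~/Library/texmf/tex/latex/R

# Rebuild the TeX filename database:
mktexlsr
```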
If using mktexlsr
results in a command not found
error, the TeX Live distribution probably isn’t in your $PATH, but you can hunt for the program anyway. For example, if you’re using MacTeX 2013, the program will be found in a directory similar to this:
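For example (the exact year and architecture directory are assumptions that depend on your MacTeX install):

```shell
/usr/local/texlive/2013/bin/x86_64-darwin/mktexlsr
```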
When you’re getting started with LaTeX, many Mac users prefer the bundled
editor, TeXShop. Cameron Bracken gives us a helpful piece of code that allows easy Sweaving straight from TeXShop. TeXShop uses various engines that allow it to render LaTeX. Using a bit of Bash scripting, we can write our own Sweave engine and make it available right within TeXShop. I have adapted Cameron's original engine to accommodate BibTeX citations (see here).
Using a text editor, paste in the following syntax and save the file as Sweave.engine:
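A plausible sketch of such an engine (the exact pdflatex/bibtex sequence is an assumption; the extra passes resolve citations and cross-references):

```shell
#!/bin/bash
# TeXShop engine: Sweave the .Rnw file, then latex it with a
# BibTeX pass in the middle.
R CMD Sweave "$1"
basefile="${1%.*}"
pdflatex "$basefile"
bibtex "$basefile"
pdflatex "$basefile"
pdflatex "$basefile"
```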
Next, in Terminal.app, move the file to the TeXShop engines folder:
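Something like the following, using TeXShop's standard engines folder:

```shell
mv Sweave.engine ~/Library/TeXShop/Engines/
```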
Now restart TeXShop if it’s running and you should see Sweave as an available option!
By using LaTeX to author APA manuscripts, researchers can address many problems associated with formatting their results
into tables and figures. For example, ANOVA tables can be readily generated using the
xtable
package in R, and graphs from
ggplot2
can be rendered within the manuscript using
Sweave
(see Wikipedia). However, more complicated layouts can be difficult to
achieve.
In order to make test items or stimuli easier to understand, researchers occasionally organize examples in a table or
figure. Using the standard table environment in LaTeX, it's possible to include figures in an individual table cell without breaking the apa6.cls package. For example:
However, the above code vertically aligned my images according to their bottom edges, producing an awkward-looking table.
Instead, we want the figures to be vertically centered. A Google search revealed the LaTeX
Wikibook, which suggests a few methods to force
figures to vertically align according to their center. Below, I surround each \includegraphics{}
command with the
\parbox{}
command, which centers it along 1 unit of measurement, set to 12 pts. in my apa6 class options.
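A trimmed sketch of the idea (stimulus filenames, the row content, and the column layout are placeholders; graphicx and the apa6 class are assumed to be loaded):

```latex
\begin{table}
  \caption{Example stimuli by category.}
  \begin{tabular}{lcc}
    Category & Example 1 & Example 2 \\
    Faces &
      \parbox[c]{12pt}{\includegraphics[width=12pt]{face1}} &
      \parbox[c]{12pt}{\includegraphics[width=12pt]{face2}} \\
    Houses &
      \parbox[c]{12pt}{\includegraphics[width=12pt]{house1}} &
      \parbox[c]{12pt}{\includegraphics[width=12pt]{house2}} \\
  \end{tabular}
\end{table}
```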
Output:
By using \parbox
, figures are now vertically aligned with text cells. However, with the addition of figures the table
is too long and must span two pages. To solve this, split the information across two tables. In this
case, I can split by the stimuli category.
Alternatively, Brian Beitzel also pointed out that we can invoke
longtable
as a class option in APA6.cls, which allows
tables to span multiple pages.
Are you using the psych
package? If not, download it and install it (as described in my post on importing data). Once that’s done, the describe function (in the psych package) should give you all you need, and possibly more:
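With your data frame loaded (substitute your own object name for mydata):

```r
# One row of descriptive statistics per column of the data frame.
library(psych)
describe(mydata)  # n, mean, sd, median, min, max, skew, kurtosis, se
```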
Obviously, you will substitute “mydata” in the command above for the name you have given your own data frame. If you’re unsure of what you called it, try typing ls()
in R.
The output will look something like this:
Now, it’s not uncommon to get an error like this:
This is because some of your variables (i.e., the columns) are stored as “character” strings. If you have some columns of data with text, then this may be appropriate. This sometimes also occurs with data that you expected to be “numeric” because of the way your data were originally entered. This can happen in Excel without being obvious. We have this issue occasionally with data being pulled out of SQL databases.
It’s not hard to fix. This command will help us identify the problematic variables:
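One way to check every column at once:

```r
# Report the class ("numeric", "character", "factor", ...) of each column.
sapply(mydata, class)
```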
The output will show you the class for each of the variables in the data frame. Now, you have options. If you want to change the class of a variable (presumably because they are “character” despite containing all numbers), the transform()
function is very useful. For example:
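Something like the following, where "score" stands in for whichever column is stored as character despite containing numbers:

```r
# Convert the hypothetical "score" column to numeric in place.
mydata <- transform(mydata, score = as.numeric(score))
```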
Note that you’ll only refer to the variables being changed when using transform()
. And there may be some cases where it's not a good idea to go about changing classes willy-nilly (R will give you a message if NAs result — worry about this if/when it comes up).
If you want to leave variables as they are because the character class is appropriate, then just tell the describe()
function to ignore those columns. For example, if the 3rd variable contained character strings, you could leave that column out when running describe():
And finally, you may only want to get a subset of the information returned from describe()
. Since describe()
returns an object, we can use the colnames()
command to see what’s inside (i.e., the structure):
We see that column 1 corresponds to the variable number, column 2 is the sample size (n), and so on. For example, if you only want means, standard deviations, and medians:
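For example, selecting those columns from the describe() output (selection by name works as well as by number):

```r
# Keep only the mean, sd, and median columns of the summary table.
library(psych)
describe(mydata)[, c("mean", "sd", "median")]
```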
Gives something like this:
All done.
For those of you who don't know, vlookup is one of the many powerful built-in functions in Excel. It allows one to search through a data structure for rows that match some specified characteristic. And, it is then possible to pull information (from, say, some other column(s) in that data structure) for the matching rows. I realize that doesn't sound so magical, but if you've never used it before… trust me, it will change your life (if only a smidge).
Let’s just go straight to an example. Start by creating this data frame in R:
If you call the “students” data frame, it will be a 7x3 object showing the numbers and teams for seven people. Now we have another data frame with scores for nine teams.
If you call “scores”, it’ll be a 9x3 object showing two scores for each team (named after a color). Rather predictably, we now want to get the scores for students in the first data frame. Start by making columns with missing values for inserting the scores.
Here’s where the work is done. Match the two data frames by their common values and declare which values you want to take out of the scores data frame to put in the students data frame. We do this for both scores below:
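A sketch of those two lines (the column names team, score1, and score2 are assumptions about the two data frames described above):

```r
# Pull each team's scores into the students data frame by matching
# on the shared team column.
students$score1 <- scores$score1[match(students$team, scores$team)]
students$score2 <- scores$score2[match(students$team, scores$team)]
```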
Call the students data frame to check that everything worked.
You do have to be careful about the ordering of variables in the ‘match()’ function. If you get it backwards, it’ll break because the vector of data to be inserted does not fit the dimensions of the target data frame.
I admit, this doesn’t cover all the bells and whistles of vlookup but it’s good enough for most uses. And you didn’t even have to open Excel.
In this tutorial, I'll cover how to analyze repeated-measures designs using 1) multilevel modeling with the nlme
package and 2) Wilcox's Robust Statistics package (see Wilcox, 2012). In a repeated-measures design, each participant provides data at multiple time points. Because of this, the assumptions about model error are different for variance between subjects (i.e., SS_B) than for variance within subjects (i.e., SS_W). After the within-subject variability is partialled out, we separately model the effect of the experiment (i.e., SS_E) and the error not accounted for by the experiment (i.e., SS_R).
When using this tutorial, there are a few things to keep in mind:
This is a draft. I’ll be updating this page with more graphs and explanations as time allows, informed by your feedback.
Multilevel models and Robust ANOVAs are just a few of the ways that repeatedmeasures designs can be analyzed. I’ll be presenting the multilevel approach using the nlme
package because assumptions about sphericity are different and are less of a concern under this approach (see Field et al., 2012, p. 576).
The first dataset we’ll be using can be obtained from the Personality Project:
The main research question is: does the valence of the word affect the rate at which items are recalled? First, let's take a look at descriptive statistics of the dataset. We can break them down by item valence using the describeBy()
function in the psych
package, which is available on CRAN.
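A sketch of that call, assuming the dataset is loaded as a data frame named recall with Recall and Valence columns:

```r
# Descriptive statistics for Recall, split by the Valence factor.
library(psych)
describeBy(recall$Recall, group = recall$Valence)
```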
We can generate a quick boxplot to display the effect of Valence on Recall using the ggplot2
package from CRAN.
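For example (the recall data frame name and its column names are assumptions):

```r
# Boxplot of Recall by Valence condition.
library(ggplot2)
ggplot(recall, aes(x = Valence, y = Recall)) +
  geom_boxplot()
```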
A multilevel model is simply a regression that allows the errors to be dependent on each other (as our conditions of Valence were repeated within each participant). To run this type of analysis, we'll use the nlme
package from CRAN, although I’ve also had good luck with the lme4
package if you like experimenting.
Similar to any approach to model testing, we want to see if our predictive, augmented model is better than a simple, 1 parameter mean model. Thus, we begin by specifying a baseline
model in which the DV, Recall
, is predicted by its overall mean. Second, we specify our model of interest, in which Recall
is predicted instead by the item Valence
, which was repeated within subjects.
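A sketch of both models with nlme (the Subject grouping structure and the recall data frame name are assumptions; ML estimation is used so the models can be compared):

```r
library(nlme)
# Baseline: Recall predicted only by its grand mean, with a
# random intercept for each subject.
baseline <- lme(Recall ~ 1, random = ~1 | Subject,
                data = recall, method = "ML")
# Model of interest: add Valence as a fixed effect.
valenceModel <- update(baseline, . ~ . + Valence)
```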
One way of assessing the significance of our model is by comparing it to the baseline model. By comparing the models, we ask whether Valence as a predictor gives a significantly better fit than the simple mean model. We can do this with the anova()
function.
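With the two models fit as sketched above:

```r
# Likelihood-ratio comparison of the baseline and Valence models.
anova(baseline, valenceModel)
```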
The output contains a few indicators of model fit. Generally with AIC (i.e., Akaike information criterion) and BIC (i.e., Bayesian information criterion), the lower the number the better the model, as it implies either a more parsimonious model, a better fit, or both. The likelihood ratio indicates that our valenceModel is a significantly better fit for the data than our baseline model (p < 0.0001). Therefore, the item Valence had a significant impact on the measured Recall of the participant, Χ^{2}(2) = 44.87, p < 0.0001.
We can obtain more specific details about the model using the summary()
function:
Thus, our post hoc analysis shows that participants’ rate of recall was significantly better for positively valenced items (M = 40) than neutral (M = 11.6, b = 28.40, p < .0001) and negatively valenced items (M = 27.8, b = 12.20, p < .0001). Similarly, neutral items were recalled at a significantly higher rate than negatively valenced items (b = 16.20, p < .0001).
As of 5/1/13, the WRS
package must be compiled from source to be installed. You can obtain the source package from the R-Forge repo below:
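One plausible way to do this (the R-Forge repository URL is an assumption, and the package has since been distributed differently):

```r
# Install WRS from source from R-Forge, then load it.
install.packages("WRS", repos = "http://R-Forge.R-project.org",
                 type = "source")
library(WRS)
```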
Unlike using lme()
to analyze the data as a multilevel model, rmanova()
requires that the data are in wide format. To adjust our table, we’ll use the reshape2
package from CRAN and cast the data into a wide format.
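A sketch of the cast (column names follow the long-format data assumed above):

```r
# One row per subject, one column per Valence condition.
library(reshape2)
wide <- dcast(recall, Subject ~ Valence, value.var = "Recall")
```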
For some reason, the rmanova()
function doesn't like dealing with factor variables, so we'll remove the Subject column. Finally, we'll use rmanova()
, which trims the data by 20% before estimating the effect.
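Reusing the hypothetical wide data frame from above, with the Subject identifiers in its first column:

```r
# Robust repeated-measures ANOVA on the condition columns;
# rmanova() trims 20% by default.
rmanova(wide[, -1])
```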
Similar to our findings from above, Valence had a significant influence on the item recall rate of the participant, F(1.26, 2.52) = 154.66, p < .01. However, we still want to conduct post-hoc analysis on the 20% trimmed means, which we'll do using the rmmcp()
function.
Post-hoc analysis confirms that negatively valenced items are significantly different from both neutral (Ψ̂ = 16, p < .01) and positive items (Ψ̂ = 13, p < .05). Additionally, neutral items are significantly different from positive items (Ψ̂ = 28.33, p < .01).
The second dataset we’ll be using can be obtained from the Personality Project:
We begin by setting up orthogonal contrasts for our Task and Valence factors.
By setting contrasts, we make the main effects of our ANOVA results more interpretable.
We again assess the significance of our models by comparing them to the baseline model. We can do this with the anova()
function.
Our taskModel
, which includes the main effect of Task, is the preferred significant model (p = .011).
The Valence-only model was not significant, nor was our interaction model, which included an interaction term for
Valence and Task, indicating that the Valence of the word had no effect on participants’ recall.
You may also want to generate a more familiar ANOVA-style table:
Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. SAGE Publications.
Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing. Academic Press.
When presenting data, confidence intervals and error bars let the audience know the amount of uncertainty in the data, and see how much of the variance is explained by the reported effect of an experiment. While this is straightforward for between-subject variables, it's less clear for mixed and repeated-measures designs.
Consider the following. When running an ANOVA, the test accounts for three sources of variance: 1) the fixed effect of the condition, 2) the ability of the participants, and 3) the random error, as data = model + error. Plotting the repeated measures without taking the different sources of variance into consideration would result in overlapping error bars that include between-subject variability, confusing the presentation's audience. While the ANOVA partials out the differences between the participants and allows you to assess the effect of the repeated measure, computing a regular confidence interval by multiplying the standard error and the F-statistic doesn't work in this way.
Winston Chang has developed a set of R functions on his wiki, based on Morey (2008) and Cousineau (2005), that help deal with this problem: the sample variance is computed for the normalized data, and the sample variance in each condition is then multiplied by M/(M-1), where M is the number of within-subject conditions.
See his wiki here for more info.
Morey, R. D. (2008). Confidence intervals from normalized data: A correction to Cousineau (2005). Tutorials in Quantitative Methods for Psychology, 4(2), 61-64.
Cousineau, D. (2005). Confidence intervals in within-subject designs: A simpler solution to Loftus and Masson's method. Tutorials in Quantitative Methods for Psychology, 1(1), 42-45.
Loftus, G. R., & Masson, M. E. J. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review, 1(4), 476-490.
The topic is getting your data out of Excel and into R. It turns out that loading the data is one of the most frustrating experiences for new R users — incomprehensible error messages are not uncommon. This results in a rocky start to what might otherwise be a beautiful relationship (between you and R, that is).
So, let’s get past this basic roadblock. Seven simple steps.
Do you have R loaded on your machine? If yes, great – open it. If not, bummer. You've got to go off and get it. If you need help, try William Revelle's R pages on the Personality Project. You might also consider installing RStudio, which is an increasingly popular option (especially among Windows users). Personally, I'd recommend giving the basic "R Console" a try once or twice if you have a Mac, but plenty of RStudio devotees would disagree with me. Whatever decision you make, open it after downloading.
Do you have the “psych” package installed and loaded? This does not happen automatically when you download R. If you’re using RStudio, this is most easily done in the lower right window under the packages tab (click the “Install Packages” button and type “psych” in the search bar). If you’re using the R Console, use the dropdown menus: “Packages & Data ==> Package Installer”. Then type “psych” in the search box and hit “Get List”. Select the psych package and hit “Install Selected”. Then, in the main console window (the one with the “>” prompt and the blinking cursor), type:
Open the Excel file with your data.
Column headers. If you haven’t already, give your variables brief but meaningful names in the Excel file (by “variables”, I’m talking about the column names). Do not use numbers as the leading values/characters (e.g., “12blu”) for the column names as this will cause issues in R. In fact, it would be best if you avoided the use of any special characters as some of these will cause issues as well. Also, don’t use column names that spread over more than one row in Excel.
Select all of the data and the columns in your Excel spreadsheet and “copy” it.
Switch to R. Enter this command:
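Most likely something along these lines, using psych's clipboard reader:

```r
# read.clipboard() (from the psych package) reads whatever you just
# copied; by default it treats the first row as column names.
library(psych)
mydata <- read.clipboard()
```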
That’s it. Very simple. Everything you copied from the Excel sheet is now in an object called ‘mydata’ (this object should have a “class” type of “data.frame” — though you probably don’t need to care about that at the moment). Anyway, you oughta check to make sure everything is as it should be. Try these commands:
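Those checks:

```r
dim(mydata)       # rows and columns, to compare against the Excel sheet
headTail(mydata)  # first and last several rows (psych package)
```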
Hopefully, the result from dim(mydata)
will be the same dimensions as the rows and columns you copied from Excel. The result from the headTail(mydata)
will show the first and last several rows of your data frame.
Many other importing methods exist, of course. For an exhaustive (and exhausting) review, try the manual on CRAN.
And there are plenty of little issues to watch out for (dealing with missing data or very large data frames, for example). Maybe I’ll tackle those next time…
Posters are available upon request:
French, J. A., Condon, D. M., Revelle, W., & Rosengren, K. S. (2013). Predicting scientific attitudes using traits, ability, and interests. [PDF]
Condon, D. M., French, J. A., Brown, A., & Revelle, W. (2013). Development and validation of the International Cognitive Ability Resource. [PDF]
Condon, D. M. & Revelle, W. (2013). Synthetic Aperture Personality Assessment: Within and across the dimensions of personality. [PDF]
Brown, A. D. & Condon, D. M. (2013). What do we know when we know an IQ score? Ability by personality interactions predict intelligence test performance and item response styles.
Wilt, J. & Revelle, W. (2013). A new form and function for personality.
Previously, SAPA data were also presented at the annual convention for the Association for Psychological Science: