So, you’ve figured out how to get your data into R and you want to see some basic descriptive statistics. No sweat.
Are you using the psych package? If not, download and install it (as described in my post on importing data). Once that’s done, the describe function (in the psych package) should give you all you need, and possibly more:
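The command referred to here is presumably a plain describe() call on your data frame. A minimal sketch, assuming psych is already installed; the small `mydata` data frame is made up purely for illustration:

```r
library(psych)  # provides describe()

# Hypothetical example data; substitute your own data frame
mydata <- data.frame(
  age    = c(23, 35, 41, 29, 52),
  height = c(160.5, 172.0, 181.2, 167.8, 175.4),
  score  = c(88, 92, 79, 85, 90)
)

# One row of summary statistics per variable
describe(mydata)
```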
Obviously, you will replace “mydata” in the command above with the name you have given your own data frame. If you’re unsure of what you called it, try typing ls() in R.
The output is a table with one row per variable and columns for the sample size (n), mean, standard deviation, median, and several other statistics.
Now, it’s not uncommon for describe() to return an error at this point.
This usually happens because some of your variables (i.e., the columns) are stored as “character” strings. If some of your columns really do contain text, then this may be appropriate. It can also happen with data that you expected to be “numeric”, because of the way the data were originally entered. This can happen in Excel without being obvious. We have this issue occasionally with data being pulled out of SQL databases.
It’s not hard to fix. This command will help us identify the problematic variables:
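One common way to do this in base R is to apply class() to every column with sapply(); this may not be the exact command the post originally showed. A minimal sketch, using a hypothetical data frame in which a numeric column was read in as text:

```r
# Hypothetical data frame in which "income" was read in as text
mydata <- data.frame(
  age    = c(23, 35, 41),
  income = c("41000", "52000", "38500"),
  city   = c("Leeds", "York", "Hull"),
  stringsAsFactors = FALSE
)

# Report the class of every column
classes <- sapply(mydata, class)
classes

# str() gives a similar overview, along with the first few values
str(mydata)
```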
The output will show you the class of each variable in the data frame. Now, you have options. If you want to change the class of a variable (presumably because it is “character” despite containing only numbers), the transform() function is very useful. For example:
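As a minimal sketch of that conversion, assuming a hypothetical data frame whose `income` column holds numbers stored as text (the column names are illustrative only):

```r
# Hypothetical data: "income" contains only digits but is stored as character
mydata <- data.frame(
  age    = c(23, 35, 41),
  income = c("41000", "52000", "38500"),
  city   = c("Leeds", "York", "Hull"),
  stringsAsFactors = FALSE
)

# Convert only the column that should be numeric; the others are left alone
mydata <- transform(mydata, income = as.numeric(income))

# Confirm the new classes
sapply(mydata, class)
```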
Note that you only need to refer to the variables being changed when using transform(). And there may be some cases where it’s not a good idea to go about changing classes willy-nilly (R will give you a warning if NAs result; worry about this if/when it comes up).
If you want to leave variables as they are because the character class is appropriate, then just tell the describe() function to ignore those columns. For example, if the 3rd variable contained character strings, you could leave that column out when running describe():
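A minimal sketch of dropping a column by position, assuming psych is installed and using a hypothetical data frame whose 3rd column is text. The second call is an alternative of my own, not from the original post, that selects numeric columns regardless of where they sit:

```r
library(psych)

# Hypothetical data: the 3rd column holds character strings
mydata <- data.frame(
  age   = c(23, 35, 41, 29, 52),
  score = c(88, 92, 79, 85, 90),
  city  = c("Leeds", "York", "Hull", "Bath", "Ely"),
  stringsAsFactors = FALSE
)

# Drop the 3rd column before describing
describe(mydata[, -3])

# Alternative: keep only the numeric columns, whatever their position
numeric_cols <- sapply(mydata, is.numeric)
describe(mydata[, numeric_cols])
```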
And finally, you may only want a subset of the information returned from describe(). Because describe() returns an object, we can use the colnames() command to see what’s inside (i.e., the structure):
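A minimal sketch of inspecting that object, assuming psych is installed; `mydata` and `desc` are illustrative names:

```r
library(psych)

# Hypothetical example data
mydata <- data.frame(
  age   = c(23, 35, 41, 29, 52),
  score = c(88, 92, 79, 85, 90)
)

# Store the result so it can be inspected and subset later
desc <- describe(mydata)

# List the names of the statistics, in column order
colnames(desc)
```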
We see that column 1 corresponds to the variable number, column 2 is the sample size (n), and so on. For example, if you only want means, standard deviations, and medians:
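A minimal sketch of that selection, assuming psych is installed and that the mean, standard deviation, and median sit in columns 3, 4, and 5 of the describe() output, as they do in the versions of psych I am aware of. Selecting by name is a safer alternative if you are unsure of the positions:

```r
library(psych)

# Hypothetical example data
mydata <- data.frame(
  age   = c(23, 35, 41, 29, 52),
  score = c(88, 92, 79, 85, 90)
)

desc <- describe(mydata)

# Select by position: mean, sd, median
desc[, c(3, 4, 5)]

# Equivalent selection by name, which does not depend on column order
desc[, c("mean", "sd", "median")]
```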
This gives a table with one row per variable and only the mean, standard deviation, and median columns.