Since I’m just getting started using a new blogging framework here at SAPA, I figured I’d tackle a topic that is relevant for new R users. So we’ll both be doing something very basic.
The topic is getting your data out of Excel and into R. It turns out that loading the data is one of the most frustrating experiences for new R users — incomprehensible error messages are not uncommon. This results in a rocky start to what might otherwise be a beautiful relationship (between you and R, that is).
So, let’s get past this basic roadblock. Seven simple steps.
Do you have R loaded on your machine? If yes, great – open it. If not, bummer. You’ve got to go off and get it. If you need help, try William Revelle’s R pages on the Personality-Project. You might also consider installing RStudio, which is an increasingly popular option (especially among Windows users). Personally, I’d recommend giving the basic “R Console” a try once or twice if you have a Mac, but plenty of RStudio devotees would disagree with me. Whatever decision you make, open it after downloading.
Do you have the “psych” package installed and loaded? This does not happen automatically when you download R. If you’re using RStudio, this is most easily done in the lower right window under the packages tab (click the “Install Packages” button and type “psych” in the search bar). If you’re using the R Console, use the dropdown menus: “Packages & Data ==> Package Installer”. Then type “psych” in the search box and hit “Get List”. Select the psych package and hit “Install Selected”. Then, in the main console window (the one with the “>” prompt and the blinking cursor), type:
Open the Excel file with your data.
Column headers. If you haven’t already, give your variables brief but meaningful names in the Excel file (by “variables”, I’m talking about the column names). Do not use numbers as the leading values/characters (e.g., “12blu”) for the column names as this will cause issues in R. In fact, it would be best if you avoided the use of any special characters as some of these will cause issues as well. Also, don’t use column names that spread over more than one row in Excel.
Select all of the data and the columns in your Excel spreadsheet and “copy” it.
Switch to R. Enter this command:
- Hit enter.
That’s it. Very simple. Everything you copied from the Excel sheet is now in an object called ‘mydata’ (this object should have a “class” type of “data.frame” — though you probably don’t need to care about that at the moment). Anyway, you oughta check to make sure everything is as it should be. Try these commands:
Hopefully, the result from
dim(mydata) will be the same dimensions as the rows and columns you copied from Excel. The result from the
headTail(mydata) will show the first and last several rows of your data frame.
Many other importing methods exist, of course. For an exhaustive (and exhausting) review, try the manual on CRAN.
And there are plenty of little issues to watch out for (dealing with missing data or very large data frames, for example). Maybe I’ll tackle those next time…