SAPA Project Blog

A weekly review of a randomly chosen article.

How Can I Do Vlookup in R?

Sometimes, a task needs gettin’ done and I’d rather just do it than invest the time to figure out how to do it in R. This has been the case for me with the “vlookup” function for the last couple of years. Well, today is the day that we’re gonna transfer responsibility of this chore to R.

For those of you who don’t know, vlookup is one of the many powerful built-in functions in Excel. It allows one to search through a data structure for rows that match some specified characteristic. And, it is then possible to pull information (from, say, some other column(s) in that data structure) for the matching rows. I realize that doesn’t sound so magical, but if you’ve never used it before… trust me, it will change your life (if only a smidge).

Let’s just go straight to an example. Start by creating this data frame in R:

1
students <- data.frame(people=c("Lily", "Bo", "Jen", "Omar", "Sara", "Jack", "Ting"), team=c("Red", "Blue", "Green", "Red", "Blue", "Green", "Red"), number=c(1,2,3,5,2,7,1))

If you call the “students” data frame, it will be a 7x3 object showing the numbers and teams for seven people. Now we have another data frame with scores for nine teams.

1
scores <- data.frame(team=c("Black", "Blue", "Green", "Indigo", "Orange", "Red", "Violet", "White", "Yellow"), score1=c(90,96,93,88,82,84,95,89,79), score2=c(5,5,4,4,5,3,5,5,3))

If you call “scores”, it’ll be a 9x3 object showing two scores for each team (named after a color). Rather predictably, we now want to get the scores for students in the first data frame. Start by making columns with missing values for inserting the scores.

1
2
3
score1 <- rep(NA, 7)
score2 <- rep(NA, 7)
students <- data.frame(students, score1, score2)

Here’s where the work is done. Match the two data frames by their common values and declare which values you want to take out of the scores data frame to put in the students data frame. We do this for both scores below:

1
2
students$score1 <- scores$score1[match(students$team, scores$team)]
students$score2 <- scores$score2[match(students$team, scores$team)]

Call the students data frame to check that everything worked.

You do have to be careful about the ordering of variables in the ‘match()’ function. If you get it backwards, it’ll break because the vector of data to be inserted does not fit the dimensions of the target data frame.

I admit, this doesn’t cover all the bells and whistles of vlookup but it’s good enough for most uses. And you didn’t even have to open Excel.