How to Work with Lookup Tables in R
Sometimes doing a full merge of the data in R isn’t exactly what you want. In these cases, it may be more appropriate to match values in a lookup table. To do this, you can use the match() or %in% function.
How to find a match
The match() function returns the matching positions of two vectors or, more specifically, the positions of first matches of one vector in the second vector. For example, to find which large states also occur in the data frame cold.states, you can do the following:
> index <- match(cold.states$Name, large.states$Name)
> index
[1]  1  4 NA NA  5  6 NA NA NA NA NA
As you see, the result is a vector that indicates matches were found at positions one, four, five, and six of large.states. Each NA marks a cold state that has no match among the large states. You can use this result as an index to find all the large states that are also cold states.
Keep in mind that you need to remove the NA values first, using na.omit():
> large.states[na.omit(index), ]
       Name   Area
2    Alaska 566432
6  Colorado 103766
26  Montana 145587
28   Nevada 109889
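The text does not show how cold.states and large.states were created. The following is a minimal sketch, assuming both are derived from R's built-in state.x77 dataset. The helper name all.states and the two cutoffs are assumptions chosen to be consistent with the output above, not something the text confirms.

```r
# Assumption: both data frames come from the built-in state.x77 dataset.
all.states <- as.data.frame(state.x77)
all.states$Name <- rownames(state.x77)
rownames(all.states) <- NULL

# Assumed cutoffs, picked to mirror the output shown in the text.
cold.states  <- all.states[all.states$Frost > 150,    c("Name", "Frost")]
large.states <- all.states[all.states$Area >= 100000, c("Name", "Area")]

# Position in large.states of each cold state (NA when there is no match).
index <- match(cold.states$Name, large.states$Name)

# Drop the NA values, then use the positions to select rows of large.states.
matched <- large.states[na.omit(index), ]
print(matched)
```

Because match() returns positions in its second argument, the index is used on large.states here, not on cold.states.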
How to make sense of %in%
A very convenient alternative to match() is the function %in%, which returns a logical vector indicating whether there is a match.
The %in% function is a special type of function called a binary operator. This means you use it by placing it between two vectors, unlike most other functions where the arguments are in parentheses:
> index <- cold.states$Name %in% large.states$Name
> index
[1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
If you compare this to the result of match(), you see that you have a TRUE value for every non-missing value in the result of match(). Or, to put it in R code, the operator %in% does the same as the following code:
> !is.na(match(cold.states$Name, large.states$Name))
[1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
The match() function returns the indices of the matches in the second argument for the values in the first argument. On the other hand, %in% returns TRUE for every value in the first argument that matches a value in the second argument. The order of the arguments is important here.
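To see why the order of the arguments matters, here is a small sketch using two made-up character vectors (x and lookup are hypothetical names, not objects from the examples above):

```r
# Hypothetical vectors for illustration only.
x      <- c("b", "z", "a")
lookup <- c("a", "b", "c")

# Positions in 'lookup' of each element of 'x'.
pos.x.in.lookup <- match(x, lookup)   # 2 NA 1

# Swapping the arguments answers a different question:
# positions in 'x' of each element of 'lookup'.
pos.lookup.in.x <- match(lookup, x)   # 3 1 NA

# %in% follows the same rule: one result per element of the left-hand side.
x.found      <- x %in% lookup         # TRUE FALSE TRUE
lookup.found <- lookup %in% x         # TRUE TRUE FALSE

# With duplicates in the second argument, match() reports the first position.
first.only <- match("a", c("b", "a", "a"))   # 2
```

In both cases the result has the same length as the first argument, which is a quick way to check that the arguments are in the intended order.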
Because %in% returns a logical vector, you can use it directly to index values in a vector.
> cold.states[index, ]
       Name Frost
2    Alaska   152
6  Colorado   166
26  Montana   155
28   Nevada   188
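The same pattern works for any data frame and any set of values to keep. Here is a minimal sketch with invented data (readings, keep, and the station codes are all hypothetical):

```r
# Hypothetical data: a few readings and the station codes we want to keep.
readings <- data.frame(
  station = c("A1", "B2", "C3", "D4"),
  value   = c(10.5, 7.2, 9.9, 3.1)
)
keep <- c("D4", "B2", "Z9")

# Logical index: TRUE for rows whose station appears in 'keep'.
selected <- readings[readings$station %in% keep, ]
print(selected)
```

No na.omit() step is needed, because %in% never returns NA. Note that the selected rows come back in the order of the data frame, not in the order of the values in keep, and that a value with no counterpart (here "Z9") is silently ignored.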
As mentioned earlier, %in% is an example of a binary operator in R. This means that you use it by putting it between two values, as you would for other operators, such as + (plus) and - (minus). At the same time, %in% is an infix operator. Infix operators like %in% are identifiable in R by the percent signs around the function name.
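Because an infix operator is still a function, you can call it in the usual prefix form by quoting its name with backticks, and you can define your own in the same way. The sketch below illustrates both. The operator %notin% is a hypothetical name invented for this example and is not part of base R.

```r
# Calling %in% like an ordinary function, using backticks around the name.
prefix.form <- `%in%`(c("a", "z"), c("a", "b"))
infix.form  <- c("a", "z") %in% c("a", "b")

# Hypothetical user-defined infix operator: the negation of %in%.
`%notin%` <- function(x, table) !(x %in% table)
not.found <- c("a", "z") %notin% c("a", "b")
```

Both prefix.form and infix.form hold the same logical vector. The percent signs are what allow the user-defined function to be written between its two arguments.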
If you want to know how %in% is defined, look at the Details section of its Help page. But note that you have to place straight quotation marks around the function name to get the Help page, like this: ?"%in%".