How to Combine and Merge Data Sets in R

By Andrie de Vries, Joris Meys

You may want to combine data from different sources in your analysis. Generally speaking, you can use R to combine different sets of data in three ways:

  • By adding columns: If the two sets of data have the same number of rows, and the order of the rows is identical, then adding columns makes sense. Your options for doing this are data.frame() or cbind().

  • By adding rows: If both sets of data have the same columns and you want to add rows to the bottom, use rbind().

  • By combining data with different shapes: The merge() function combines data based on common columns, as well as common rows. In database terminology, this is usually called joining data.
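
The first two options can be sketched with a few lines of base R. The data frames below (scores, grades, and more_scores) are made-up toy data used only for illustration, not part of the original example.

```r
# Hypothetical toy data, invented for this sketch
scores <- data.frame(id = 1:3, score = c(90, 85, 77))
grades <- data.frame(grade = c("A", "B", "C"))

# Adding columns: both inputs have 3 rows in the same order
wide_cbind <- cbind(scores, grades)
wide_df    <- data.frame(scores, grades)

# Adding rows: the new data has the same column names
more_scores <- data.frame(id = 4:5, score = c(60, 95))
long <- rbind(scores, more_scores)

print(wide_cbind)
print(long)
```

Note that cbind() and data.frame() do not check whether the rows actually belong together; they simply line them up by position, so the row order has to be right before you call them.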

With merge(), you can find the intersection, as well as the union, of different data sets. You can also work with lookup tables by using the functions match() and %in%.
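
As a minimal sketch of the lookup-table idea, the following uses the built-in state.name and state.region data that ship with R. The variable names (lookup, wanted, found, idx) and the deliberately invalid entry "Atlantis" are assumptions made for illustration.

```r
# Lookup table built from R's built-in state data
lookup <- data.frame(Name = state.name, Region = state.region)

# Names to look up; "Atlantis" is intentionally not a state
wanted <- c("Texas", "Ohio", "Atlantis")

# %in% answers "is each value present in the table?"
found <- wanted %in% lookup$Name

# match() returns the position of each value in the table, or NA if absent
idx <- match(wanted, lookup$Name)

# Use the positions to pull the matching regions (NA where there is no match)
wanted_regions <- lookup$Region[idx]

print(found)
print(wanted_regions)
```

Here %in% is handy for filtering, while match() is the better choice when you need to pull a value from another column of the lookup table.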


Sometimes combining data isn’t as straightforward as simply adding columns or rows. It could be that you want to combine data based on the values of preexisting keys in the data. This is where the merge() function is useful. You can use merge() to combine data only when certain matching conditions are satisfied.

Say, for example, you have information about states in a country. If one data set contains information about population and another contains information about regions, and both contain the state name, you can use merge() to combine your results.
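
The state example might look something like the sketch below, which uses R's built-in state.x77, state.name, and state.region data. The subsets (large_states and south_states) and the population threshold are arbitrary choices made here so that the two data sets only partly overlap; they are not taken from the original text.

```r
# Population data: state.x77 is a built-in matrix with state names as row names
population <- data.frame(
  Name = rownames(state.x77),
  Population = unname(state.x77[, "Population"])
)

# Region data from the built-in state.name and state.region vectors
regions <- data.frame(Name = state.name, Region = state.region)

# Arbitrary subsets so the two data sets overlap only partially
large_states <- population[population$Population > 5000, ]
south_states <- regions[regions$Region == "South", ]

# Intersection: only states present in both data sets
both <- merge(large_states, south_states, by = "Name")

# Union: every state from either data set, with NA where data is missing
either <- merge(large_states, south_states, by = "Name", all = TRUE)

print(both)
print(head(either))
```

By default, merge() keeps only the rows whose key appears in both data sets. Setting all = TRUE keeps every row from both and fills the gaps with NA; all.x = TRUE or all.y = TRUE keeps all rows from just one side.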