Question
Asked 31st Aug, 2017

How to match two data frames in R based on multiple criteria instead of a key variable?

I have two dataframes that I want to match in R.
Dataframe A has 24,300 cases with the following variables: Name, Gender, Age, DateTime_Event
Dataframe B has 3,000 cases with the following variables: ID, Gender, Age, Date_Time_Start, Date_Time_End
Note that cases from Dataframe B can be matched to more than one case in Dataframe A.
I want to match the two dataframes by copying the relative ID from Dataframe B to the corresponding row in Dataframe A based on age, gender, and date/time. Specifically, the matching criteria include all the following:
* The Gender and Age variables should exactly match in both dataframes.
* The DateTime_Event from Dataframe A should be between the Date_Time_Start and the Date_Time_End in Dataframe B.
Thank you!

Most recent answer

7th Sep, 2017
Timothy Ingallinera
Acerta Pharma BV
left_join is a great way to do this
data_frame_a %>%
left_join(data_frame_b, by = c("Age", "Gender") 
Logical operators work with dates as long as they are identified as dates in the tibble or df so you can then run a filter through the df such as
data_frame_a %>%
left_join(data_frame_b, by = c("Age", "Gender")  %>%
filter(DateTime_Event < Date_time_end & DateTime_Event > Date_time_begin) 
should work in my experience with similar data, but there are screwy things with dates.
If they are not recognized as dates, you should look into lubridate and maybe clean them up.

Popular Answers (1)

31st Aug, 2017
Robin Faillettaz
Institut Français de Recherche pour l'Exploitation de la Mer
Hi, 
Just use the function left_join from the package dplyr. If you have the exact same column names for shared variables in the your two dataframes, just write: 
library("dplyr")
?dplyr::left_join
left_join(long_dataframe_name, short_dataframe_name)
You can also specify the "by" argument, to force the join using a single or few variables  
left_join(long_dataframe_name, short_dataframe_name, by= c("Age", "Gender"))
Best,
6 Recommendations

All Answers (4)

31st Aug, 2017
Robin Faillettaz
Institut Français de Recherche pour l'Exploitation de la Mer
Hi, 
Just use the function left_join from the package dplyr. If you have the exact same column names for shared variables in the your two dataframes, just write: 
library("dplyr")
?dplyr::left_join
left_join(long_dataframe_name, short_dataframe_name)
You can also specify the "by" argument, to force the join using a single or few variables  
left_join(long_dataframe_name, short_dataframe_name, by= c("Age", "Gender"))
Best,
6 Recommendations

Similar questions and discussions

RNA-Seq analysis cummerbund package in R?
Question
4 answers
  • Snijesh V.P.Snijesh V.P.
When I am using cummerbund package, it shows output like this:
CuffSet instance with:
0 samples
0 genes
0 isoforms
0 TSS
0 CDS
0 promoters
0 splicing
0 relCDS

Related Publications

Research
This research amid at designing a digital unit in Electronic Secretary, and measure its effectiveness in developing Data base preparation skills using MS Access among commercial technical education learners. To achieve this objective, the researchers used a mixed approach of descriptive and Qauzi-experimental research procedures. To identify data b...
Article
The data base for the Army Selection and Classification Project (Project A) contains two major samples referred to as the concurrent validation sample and the longitudinal validation sample. The former was drawn from a cohort that joined the Army in 1983/84, and the latter from a cohort that entered in 1986/87. This paper describes the data base re...
Article
This is one of three volumes describing the analytical aspects of the Quick-Reacting General War Gaming System (QUICK). This volume addresses the system data requirements, the organization and structure of the data base, and the concepts and techniques employed to prepare a game data base for subsequent Plan Generation and Simulation. In addition,...
Got a technical question?
Get high-quality answers from experts.