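I have two datasets, df and df2. These are heavily simplified versions of two very large and messy data frames.
In the original df, I created a unique id for each person by grouping by belt and weight. I want each person in df2 to get the same id number they have in df. A match requires the same name and the same belt and weight group. Note that some people in df2 are not in df.
The simplified df looks like this: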
     belt     weight rank id        name
1  purple open class    1 55  Tom Cruise
2   black    rooster    2 79 Emma Watson
3    blue    feather    3 63    John Doe
4    blue    feather    4 63    John Doe
5  purple open class    5 55  Tom Cruise
6   brown      heavy    6  3  James Bond
7  purple open class    7 55  Tom Cruise
8  purple      heavy    8 61  Tom Cruise
9   black open class    9 70    Jane Doe
10 purple      heavy   10 61  Tom Cruise
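The second data frame looks like this. Anyone who is in df2 but not in df should receive NA as their id. Note that the id must be assigned by belt and weight, because some people have different points depending on the weight class they competed in.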
    belt2    weight2 rank2        name points
1  purple open class     1  Tom Cruise    100
2   black    rooster     2 Emma Watson     30
3    blue    feather     3    John Doe     50
4    blue    feather     4    John Doe     50
5  purple open class     5  Tom Cruise    100
6   brown      heavy     6  James Bond    200
7   black    rooster     7    Jon Snow     92
8  purple      heavy     8  Tom Cruise     77
9   black open class     9    Jane Doe     88
10 purple      heavy    10  Tom Cruise     77
This is what I want df2 to look like:
    belt2    weight2 rank2 id        name points
1  purple open class     1 55  Tom Cruise    100
2   black    rooster     2 79 Emma Watson     30
3    blue    feather     3 63    John Doe     50
4    blue    feather     4 63    John Doe     50
5  purple open class     5 55  Tom Cruise    100
6   brown      heavy     6  3  James Bond    200
7   black    rooster     7 NA    Jon Snow     92
8  purple      heavy     8 61  Tom Cruise     77
9   black open class     9 70    Jane Doe     88
10 purple      heavy    10 61  Tom Cruise     77
Basically, I want the id numbers in df2 to match the ones in df. Where there is no match, fill in NA.
# df
belt <- c("purple", "black", "blue", "blue", "purple",
          "brown", "purple", "purple", "black", "purple")
weight <- c("open class", "rooster", "feather", "feather", "open class",
            "heavy", "open class", "heavy", "open class", "heavy")
rank <- 1:10
id <- c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61)
names <- c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
           "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
(df <- data.frame(belt, weight, rank, id, name = names))

# df2
belt2 <- c("purple", "black", "blue", "blue", "purple",
           "brown", "black", "purple", "black", "purple")
weight2 <- c("open class", "rooster", "feather", "feather", "open class",
             "heavy", "rooster", "heavy", "open class", "heavy")
rank2 <- 1:10
names2 <- c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
            "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise")
points <- c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
(df2 <- data.frame(belt2, weight2, rank2, name = names2, points))
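
For reference, one possible way to do the lookup described above is sketched here in base R. It is only a sketch under some assumptions: that a person is identified by the combination of belt, weight and name, that each such combination has a single id in df, and that none of these values contains the "|" character used to build the key. The names `lookup`, `key_df` and `key_df2` are made up for this illustration.

```r
# Recreate the example data so the sketch runs on its own
df <- data.frame(
  belt   = c("purple", "black", "blue", "blue", "purple",
             "brown", "purple", "purple", "black", "purple"),
  weight = c("open class", "rooster", "feather", "feather", "open class",
             "heavy", "open class", "heavy", "open class", "heavy"),
  rank   = 1:10,
  id     = c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61),
  name   = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
             "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
)
df2 <- data.frame(
  belt2   = c("purple", "black", "blue", "blue", "purple",
              "brown", "black", "purple", "black", "purple"),
  weight2 = c("open class", "rooster", "feather", "feather", "open class",
              "heavy", "rooster", "heavy", "open class", "heavy"),
  rank2   = 1:10,
  name    = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise",
              "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise"),
  points  = c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
)

# Hypothetical lookup table: one row per (belt, weight, name) with its id
lookup <- unique(df[, c("belt", "weight", "name", "id")])

# Composite keys; assumes "|" never appears in belt, weight or name
key_df  <- paste(lookup$belt, lookup$weight, lookup$name, sep = "|")
key_df2 <- paste(df2$belt2, df2$weight2, df2$name, sep = "|")

# match() returns NA where a df2 key has no counterpart in df,
# so unmatched people (here Jon Snow) get NA as their id
df2$id <- lookup$id[match(key_df2, key_df)]

# Put id in the position shown in the desired output
df2 <- df2[, c("belt2", "weight2", "rank2", "id", "name", "points")]
df2
```

Using `match()` keeps the rows of df2 in their original order. If a (belt, weight, name) combination ever had more than one id in df, this sketch would silently take the first one, so it may be worth checking `lookup` for duplicate keys on the real data. A left join such as `merge(..., all.x = TRUE)` or `dplyr::left_join()` on the same three columns would be another option, although `merge()` may reorder the rows.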