Hi, I need help solving a problem in R.
    library(gtools)

    end_date <- "2021-12-31"

    ddf1 <- data.frame(
      pnr = c("1","1","2","2","3","3","3","4"),
      in_date = as.POSIXct(c("2010-08-18","2010-09-01","2019-04-02","2018-03-27",
                             "2019-07-12","2013-10-20","2012-07-01","2015-05-02")),
      out_date = as.POSIXct(c("2010-12-04",NA,"2019-05-17",NA,
                              NA,"2017-08-19",NA,NA)),
      treat1 = c(1,1,1,1,1,1,1,1)
    )

    ddf2 <- data.frame(
      pnr = c("4","4","3","3","2","2","2","1"),
      in_date = as.POSIXct(c("2010-08-18","2010-09-01","2019-04-02","2018-03-27",
                             "2019-07-12","2013-10-20","2012-07-01","2015-05-02")),
      out_date = as.POSIXct(c("2010-12-04",NA,"2019-05-17",NA,
                              NA,"2017-08-19",NA,NA)),
      treat2 = c(1,1,1,1,1,1,1,1)
    )

    expected_output1 <- data.frame(
      pnr = c("1","2","3","3","4"),
      in_date = as.POSIXct(c("2010-08-18","2018-03-27",
                             "2019-07-12","2012-07-01","2015-05-02")),
      out_date = as.POSIXct(c("2010-12-04","2019-05-17",end_date,
                              "2017-08-19",end_date)),
      treat1 = c(1,1,1,1,1)
    )

    expected_output2 <- data.frame(
      pnr = c("4","3","2","2","1"),
      in_date = as.POSIXct(c("2010-08-18","2018-03-27",
                             "2019-07-12","2012-07-01","2015-05-02")),
      out_date = as.POSIXct(c("2010-12-04","2019-05-17",end_date,
                              "2017-08-19",end_date)),
      treat2 = c(1,1,1,1,1)
    )

    ddf <- smartbind(ddf1, ddf2)
    expected_output <- smartbind(expected_output1, expected_output2)

    > ddf
        pnr    in_date   out_date treat1 treat2
    1:1   1 2010-08-18 2010-12-04      1     NA
    1:2   1 2010-09-01       <NA>      1     NA
    1:3   2 2019-04-02 2019-05-17      1     NA
    1:4   2 2018-03-27       <NA>      1     NA
    1:5   3 2019-07-12       <NA>      1     NA
    1:6   3 2013-10-20 2017-08-19      1     NA
    1:7   3 2012-07-01       <NA>      1     NA
    1:8   4 2015-05-02       <NA>      1     NA
    2:1   4 2010-08-18 2010-12-04     NA      1
    2:2   4 2010-09-01       <NA>     NA      1
    2:3   3 2019-04-02 2019-05-17     NA      1
    2:4   3 2018-03-27       <NA>     NA      1
    2:5   2 2019-07-12       <NA>     NA      1
    2:6   2 2013-10-20 2017-08-19     NA      1
    2:7   2 2012-07-01       <NA>     NA      1
    2:8   1 2015-05-02       <NA>     NA      1

    > expected_output
        pnr    in_date   out_date treat1 treat2
    1:1   1 2010-08-18 2010-12-04      1     NA
    1:2   2 2018-03-27 2019-05-17      1     NA
    1:3   3 2019-07-12 2021-12-31      1     NA
    1:4   3 2012-07-01 2017-08-19      1     NA
    1:5   4 2015-05-02 2021-12-31      1     NA
    2:1   4 2010-08-18 2010-12-04     NA      1
    2:2   3 2018-03-27 2019-05-17     NA      1
    2:3   2 2019-07-12 2021-12-31     NA      1
    2:4   2 2012-07-01 2017-08-19     NA      1
    2:5   1 2015-05-02 2021-12-31     NA      1
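I have a number of people who have undergone different treatments, treat1 and treat2. I need to handle the fact that some treatment courses have a start date but no out_date. Where out_date is missing, it should be replaced with the end of the study, end_date:

    end_date <- "2021-12-31"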
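However, consider observations like these:

        pnr    in_date   out_date treat1 treat2
    1:1   1 2010-08-18 2010-12-04      1     NA
    1:2   1 2010-09-01       <NA>      1     NA

If an in_date (the start of a treatment) falls within another treatment period for the same person (pnr), the correct output is:

        pnr    in_date   out_date treat1 treat2
    1:1   1 2010-08-18 2010-12-04      1     NA

because 2010-08-18 is the earliest in_date. However, if a row has an earlier in_date and no out_date, that earlier date should be used, which is the case for pnr 2:

        pnr    in_date   out_date treat1 treat2
    1:3   2 2019-04-02 2019-05-17      1     NA
    1:4   2 2018-03-27       <NA>      1     NA

becomes:

    1:2   2 2018-03-27 2019-05-17      1     NA

so that the whole treatment period is covered.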
Where there is no out_date at all, end_date should be used instead, so that:

        pnr    in_date   out_date treat1 treat2
    1:8   4 2015-05-02       <NA>      1     NA

becomes:

    1:5   4 2015-05-02 2021-12-31      1     NA
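As a minimal sketch of this last rule only, the missing out_date of the pnr 4 row can be filled by logical indexing. The explicit tz = "UTC" and the variable name pnr4 are assumptions made here to keep the example reproducible; the question itself relies on the session's default time zone.

```r
# Study end, parsed with an explicit time zone (an assumption of this sketch).
end_date <- as.POSIXct("2021-12-31", tz = "UTC")

# The pnr 4 row from the question: the treatment started but was never closed.
pnr4 <- data.frame(
  pnr = "4",
  in_date = as.POSIXct("2015-05-02", tz = "UTC"),
  out_date = as.POSIXct(NA, tz = "UTC"),
  treat1 = 1
)

# Replace every missing out_date with the study end.
pnr4$out_date[is.na(pnr4$out_date)] <- end_date
print(pnr4)
```

Judging by the expected output, this fill probably has to run last: applied to the raw pnr 2 rows it would close the 2018-03-27 course at 2021-12-31 instead of 2019-05-17.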
In the special case where an earlier or overlapping period is combined with a later in_date that has no out_date, the function should be able to handle it, as for pnr 3:

    1:5   3 2019-07-12       <NA>      1     NA
    1:6   3 2013-10-20 2017-08-19      1     NA
    1:7   3 2012-07-01       <NA>      1     NA

which should become:

        pnr    in_date   out_date treat1 treat2
    1:3   3 2019-07-12 2021-12-31      1     NA
    1:4   3 2012-07-01 2017-08-19      1     NA
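One possible reading of the rules so far, sketched for a single person: sort the rows by in_date, keep an episode open until an out_date is seen, absorb any row that starts inside the current episode, and only then fill a still-open episode with end_date. The function name collapse_episodes is hypothetical, Date is used instead of POSIXct to sidestep time zones, and extending an episode to the latest overlapping out_date is an assumption the question does not spell out.

```r
end_date <- as.Date("2021-12-31")

# Hypothetical helper: collapse one person's rows into treatment episodes.
# Assumes in_date has no missing values and at least one row is supplied.
collapse_episodes <- function(in_date, out_date, end_date) {
  ord <- order(in_date)
  in_date <- in_date[ord]
  out_date <- out_date[ord]
  starts <- in_date[0]   # empty Date vectors collecting finished episodes
  ends <- out_date[0]
  cur_start <- in_date[1]
  cur_end <- out_date[1]
  for (i in seq_along(in_date)[-1]) {
    if (is.na(cur_end) || in_date[i] <= cur_end) {
      # Row belongs to the current episode: episode is still open,
      # or the row starts before the episode's known end.
      if (!is.na(out_date[i]) && (is.na(cur_end) || out_date[i] > cur_end)) {
        cur_end <- out_date[i]
      }
    } else {
      # Row starts after the current episode ended: store it, start a new one.
      starts <- c(starts, cur_start)
      ends <- c(ends, cur_end)
      cur_start <- in_date[i]
      cur_end <- out_date[i]
    }
  }
  starts <- c(starts, cur_start)
  ends <- c(ends, cur_end)
  ends[is.na(ends)] <- end_date   # episodes never closed run to the study end
  data.frame(in_date = starts, out_date = ends)
}

# The treat1 rows of pnr 1, 2 and 3 from the question.
p1 <- collapse_episodes(as.Date(c("2010-08-18", "2010-09-01")),
                        as.Date(c("2010-12-04", NA)), end_date)
p2 <- collapse_episodes(as.Date(c("2019-04-02", "2018-03-27")),
                        as.Date(c("2019-05-17", NA)), end_date)
p3 <- collapse_episodes(as.Date(c("2019-07-12", "2013-10-20", "2012-07-01")),
                        as.Date(c(NA, "2017-08-19", NA)), end_date)
print(p3)
```

The result is sorted by in_date, so pnr 3 comes back with the same two episodes as expected_output but with the 2012-07-01 row first.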
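Optional: if possible, it would be great if the function could handle each treatment separately, so that each pnr is processed independently within treat1 and within treat2, as shown in expected_output.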
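A rough base R sketch of the per-treatment variant follows, under the assumption that the treat1/treat2 indicator columns are first reshaped into a single treatment key. The names long, episode_id and collapse_group are hypothetical, Date replaces POSIXct for simplicity, and the episode rule is one interpretation of the examples rather than a confirmed specification. Rows are labelled with an episode number per (treatment, pnr) group and then aggregated.

```r
end_date <- as.Date("2021-12-31")

# The question's rows in long form with one 'treatment' key (assumed reshaping).
dates_in  <- as.Date(c("2010-08-18", "2010-09-01", "2019-04-02", "2018-03-27",
                       "2019-07-12", "2013-10-20", "2012-07-01", "2015-05-02"))
dates_out <- as.Date(c("2010-12-04", NA, "2019-05-17", NA,
                       NA, "2017-08-19", NA, NA))
long <- data.frame(
  pnr = c("1", "1", "2", "2", "3", "3", "3", "4",
          "4", "4", "3", "3", "2", "2", "2", "1"),
  treatment = rep(c("treat1", "treat2"), each = 8),
  in_date = rep(dates_in, 2),
  out_date = rep(dates_out, 2)
)

# Hypothetical helper: episode number for rows already sorted by in_date.
episode_id <- function(in_date, out_date) {
  id <- integer(length(in_date))
  cur_end <- as.Date(NA)
  ep <- 0L
  for (i in seq_along(in_date)) {
    # A new episode starts at the first row, or after a closed episode ended.
    starts_new <- i == 1L || (!is.na(cur_end) && in_date[i] > cur_end)
    if (starts_new) {
      ep <- ep + 1L
      cur_end <- out_date[i]
    } else if (!is.na(out_date[i]) && (is.na(cur_end) || out_date[i] > cur_end)) {
      cur_end <- out_date[i]
    }
    id[i] <- ep
  }
  id
}

# Hypothetical helper: collapse one (treatment, pnr) group to one row per episode.
collapse_group <- function(g, end_date) {
  g <- g[order(g$in_date), ]
  ep <- episode_id(g$in_date, g$out_date)
  pieces <- lapply(split(g, ep), function(e) {
    closed <- e$out_date[!is.na(e$out_date)]
    data.frame(pnr = e$pnr[1], treatment = e$treatment[1],
               in_date = min(e$in_date),
               out_date = if (length(closed)) max(closed) else end_date)
  })
  do.call(rbind, pieces)
}

groups <- split(long, list(long$treatment, long$pnr), drop = TRUE)
result <- do.call(rbind, lapply(groups, collapse_group, end_date = end_date))
result <- result[order(result$treatment, result$pnr, result$in_date), ]
rownames(result) <- NULL
print(result)
```

Because the grouping key includes the treatment, pnr 3 is collapsed into two episodes under treat1 but only one under treat2, which is the distinction expected_output draws. The same grouping could be expressed with by = in data.table.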
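I tried writing some code that checks whether out_date is NA and computes the difference between dates, but I still can't work out how to continue:

    library(data.table)
    # Flag rows whose out_date is missing
    ddf$end_replaced <- as.integer(ifelse(is.na(ddf$out_date), 1, 0))
    ddf <- data.table(ddf)
    ddf <- ddf[order(ddf$treat1, ddf$pnr, ddf$in_date, ddf$out_date), ]
    # Days since the previous in_date within each pnr
    ddf[, diffx := difftime(in_date, shift(in_date, fill = in_date[1L]), units = "days"), by = pnr]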
Thanks for reading.