ggplot2 finds empty rows in a survival data frame when drawing a Kaplan-Meier plot

Time: 2021-08-31 11:26:51

I am trying to draw some Kaplan-Meier curves using ggplot2 and code found at: https://github.com/kmiddleton/rexamples/blob/master/qplot_survival.R

I had good results with this great code on a different dataset. However, in this case it gives me the following error, as if I had empty rows in my data frame:

Error en if (nrow(layer_data) == 0) return() : argument is of length zero.

Previous questions about this type of error don't seem useful to me, because the data types and functions involved are different in my case.

I am rather new to statistical analysis in R and I don't have a programming background, so I think this must be a 'dumb bug' in my data, but I can't find where it is. It definitely seems that ggplot2 can't find rows to plot. Could you help me in any way, with clues, suggestions, etc.?

Here are my data and the code used, in order and ready for the console (I tried it in a knitr script). At the end, I've posted my sessionInfo:

library(splines)
library(survival)
library(abind)
library(ggplot2)
library(grid)

I create a data frame called acbi30 (real data):

mort28day <- c(1,0,1,0,0,0,0,1,0,0,0,1,1,0,1,0,0,1,0,1,1,1,1,0,1,1,1,0,0,1)
daysurv <- c(4,29,24,29,29,29,29,19,29,29,29,3,9,29,15,29,29,11,29,5,13,20,22,29,16,21,9,29,29,15)
levo <- c(0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0)
acbi30 <- data.frame(mort28day, daysurv, levo)
save(acbi30, file="acbi30.rda")
acbi30

Then, I paste the following commands to create a function using ggplot2:

t.Surv <- Surv(acbi30$daysurv, acbi30$mort28day)
t.survfit <- survfit(t.Surv~1, data=acbi30)


#define custom function to create a survival data.frame#
createSurvivalFrame <- function(f.survfit){
  #initialise frame variable#
  f.frame <- NULL

  #check whether the fit has strata#
  if(length(names(f.survfit$strata)) == 0){
    #create data.frame with data from survfit#
    f.frame <- data.frame(time=f.survfit$time, n.risk=f.survfit$n.risk,
                          n.event=f.survfit$n.event, n.censor=f.survfit$n.censor,
                          surv=f.survfit$surv, upper=f.survfit$upper, lower=f.survfit$lower)
    #create first two rows (start at 1)#
    f.start <- data.frame(time=c(0, f.frame$time[1]), n.risk=c(f.survfit$n, f.survfit$n),
                          n.event=c(0,0), n.censor=c(0,0), surv=c(1,1), upper=c(1,1), lower=c(1,1))
    #add first two rows to dataset#
    f.frame <- rbind(f.start, f.frame)
    #remove temporary data#
    rm(f.start)
  } else {
    #create vector for strata identification#
    f.strata <- NULL
    for(f.i in 1:length(f.survfit$strata)){
      #add vector for one stratum according to number of rows of that stratum#
      f.strata <- c(f.strata, rep(names(f.survfit$strata)[f.i], f.survfit$strata[f.i]))
    }
    #create data.frame with data from survfit (create column for strata)#
    f.frame <- data.frame(time=f.survfit$time, n.risk=f.survfit$n.risk,
                          n.event=f.survfit$n.event, n.censor=f.survfit$n.censor,
                          surv=f.survfit$surv, upper=f.survfit$upper, lower=f.survfit$lower,
                          strata=factor(f.strata))
    #remove temporary data#
    rm(f.strata)
    #create first two rows (start at 1) for each stratum#
    for(f.i in 1:length(f.survfit$strata)){
      #take only subset for this stratum from data#
      f.subset <- subset(f.frame, strata==names(f.survfit$strata)[f.i])
      #create first two rows (time: 0, time of first event)#
      f.start <- data.frame(time=c(0, f.subset$time[1]), n.risk=rep(f.survfit[f.i]$n, 2),
                            n.event=c(0,0), n.censor=c(0,0), surv=c(1,1), upper=c(1,1),
                            lower=c(1,1), strata=rep(names(f.survfit$strata)[f.i], 2))
      #add first two rows to dataset#
      f.frame <- rbind(f.start, f.frame)
      #remove temporary data#
      rm(f.start, f.subset)
    }
    #reorder data#
    f.frame <- f.frame[order(f.frame$strata, f.frame$time), ]
    #reset row.names#
    rownames(f.frame) <- NULL
  }
  #return frame#
  return(f.frame)
}


#define custom function to draw kaplan-meier curve with ggplot#
qplot_survival <- function(f.frame, f.CI="default", f.shape=3){
  #use different plotting commands depending on whether or not strata are given#
  if("strata" %in% names(f.frame) == FALSE){
    #confidence intervals are drawn if not specified otherwise#
    if(f.CI=="default" | f.CI==TRUE ){
      #create plot with 4 layers (first 3 layers only events, last layer only censored)#
      #hint: censoring data for multiple censoring events at timepoint are overplotted#
      #(unlike in plot.survfit in survival package)#
      ggplot(data=f.frame) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_step(aes(x=time, y=upper), directions="hv", linetype=2) +
        geom_step(aes(x=time, y=lower), direction="hv", linetype=2) +
        geom_point(data=subset(f.frame, n.censor==1), aes(x=time, y=surv), shape=f.shape)
    } else {
      #create plot without confidence intervals#
      ggplot(data=f.frame) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_point(data=subset(f.frame, n.censor==1), aes(x=time, y=surv), shape=f.shape)
    }
  } else {
    if(f.CI=="default" | f.CI==FALSE){
      #without CI#
      ggplot(data=f.frame, aes(group=strata, colour=strata)) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_point(data=subset(f.frame, n.censor==1), aes(x=time, y=surv), shape=f.shape)
    } else {
      #with CI (hint: use alpha for CI)#
      ggplot(data=f.frame, aes(colour=strata, group=strata)) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_step(aes(x=time, y=upper), directions="hv", linetype=2, alpha=0.5) +
        geom_step(aes(x=time, y=lower), direction="hv", linetype=2, alpha=0.5) +
        geom_point(data=subset(f.frame, n.censor==1), aes(x=time, y=surv), shape=f.shape)
    }
  }
}

Plotting the overall survival curve (with 95% CI):

It doesn't give any errors:

# Kaplan-Meier plot, global survival (with CI)
t.survfit <- survfit(t.Surv~1, data=acbi30)
t.survframe <- createSurvivalFrame(t.survfit)
t.survfit
qplot_survival(t.survframe, TRUE, 20)

Plotting stratified survival curves:

This gives the error mentioned above:

# Kaplan-Meier plot, stratified survival
t.survfit2 <- survfit(t.Surv~levo, data=acbi30)
t.survframe2 <- createSurvivalFrame(t.survfit2)
t.survfit2
qplot_survival(t.survframe2, TRUE, 20)

Plotting the results without ggplot2:

The structure of t.survframe2 seems OK to me, without any empty rows, so the problem must be in how qplot_survival reads the data in t.survframe2. A simple plot doesn't return any error:

t.survframe2
plot(t.survfit2)

Where is the problem with my data frame? The functions work well with other datasets, but not with this one...

Thank you in advance,

Mareviv

Session info:

sessionInfo()

R version 2.15.2 (2012-10-26) Platform: i386-w64-mingw32/i386 (32-bit)

locale:

[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] ggplot2_0.9.3    abind_1.4-0      survival_2.36-14 knitr_0.8       

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2      
 [4] evaluate_0.4.2     formatR_0.7        gtable_0.1.2      
 [7] labeling_0.1       MASS_7.3-22        munsell_0.4       
[10] plyr_1.8           proto_0.3-9.2      RColorBrewer_1.0-5
[13] reshape2_1.2.1     scales_0.2.3       stringr_0.6.1     
[16] tools_2.15.2    

2 answers

#1


I did a little cosmetic surgery on your qplot_survival() function. The main problem seemed to be your subset condition in the data = argument of geom_point(): in t.survframe2, for example, a table of n.censor yields only the values 0, 3 and 12, so subset(f.frame, n.censor == 1) returns no rows at all. By changing the subset condition to n.censor > 0, I managed to get a plot in all cases. I also didn't see the point of f.CI = "default", so I set the default to TRUE and modified the if conditions accordingly.

qplot_survival <- function(f.frame, f.CI = TRUE, f.shape = 3)
{
  # use different plotting commands depending on whether
  # or not strata are given#
  if(!("strata" %in% names(f.frame)))
  {
    #confidence intervals are drawn if not specified otherwise#
    if( isTRUE(f.CI) )
    {
      # create plot with 4 layers (first 3 layers only events,
      # last layer only censored)#
      # hint: censoring data for multiple censoring events at
      # timepoint are overplotted#
      # (unlike in plot.survfit in survival package)#
      ggplot(data=f.frame) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_step(aes(x=time, y=upper), direction="hv", linetype=2) +
        geom_step(aes(x=time, y=lower), direction="hv", linetype=2) +
        geom_point(data=subset(f.frame, n.censor > 0),
                   aes(x=time, y=surv), shape=f.shape)
    } else {
      #create plot without confidence intervals#
      ggplot(data=f.frame) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_point(data=subset(f.frame, n.censor > 0),
                   aes(x=time, y=surv), shape=f.shape)
    }
  } else {
    if( !(isTRUE(f.CI)) ){
      #without CI#
      ggplot(data=f.frame, aes(group=strata, colour=strata)) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_point(data=subset(f.frame, n.censor > 0),
                   aes(x=time, y=surv), shape=f.shape)
    } else {
      #with CI (hint: use alpha for CI)#
      ggplot(data=f.frame, aes(x = time, colour=strata, group=strata)) +
        geom_step(aes(y=surv), direction="hv") +
        geom_step(aes(y=upper), direction="hv",
                  linetype=2, alpha=0.5) +
        geom_step(aes(y=lower), direction="hv",
                  linetype=2, alpha=0.5) +
        geom_point(data=subset(f.frame, n.censor > 0),
                   aes(y=surv), shape=f.shape)
    }
  }
}

The following plots all worked for me after making these changes:

qplot_survival(t.survframe2, TRUE, 20)
qplot_survival(t.survframe2, FALSE, 20)
qplot_survival(t.survframe, TRUE, 20)
qplot_survival(t.survframe, FALSE, 20)
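The diagnosis can also be checked without ggplot2, by looking at the n.censor values that survfit itself stores. Below is a minimal sketch that rebuilds the question's data so it runs on its own; fit2 is just an illustrative name standing in for t.survfit2. It shows that every censored subject was followed up to day 29, so survfit records censoring as one batch per curve (12 and 3) and no row ever has n.censor equal to exactly 1.

```r
library(survival)

# Rebuild the question's data so this snippet runs on its own
mort28day <- c(1,0,1,0,0,0,0,1,0,0,0,1,1,0,1,0,0,1,0,1,1,1,1,0,1,1,1,0,0,1)
daysurv   <- c(4,29,24,29,29,29,29,19,29,29,29,3,9,29,15,29,29,11,29,5,13,20,22,29,16,21,9,29,29,15)
levo      <- c(0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0)
acbi30 <- data.frame(mort28day, daysurv, levo)

# Follow-up times of the censored subjects (mort28day == 0): all on day 29
unique(acbi30$daysurv[acbi30$mort28day == 0])

# Stratified fit, equivalent to t.survfit2 in the question
fit2 <- survfit(Surv(daysurv, mort28day) ~ levo, data = acbi30)

# survfit keeps one n.censor value per distinct time within each stratum,
# so the censored subjects appear as two batches instead of single events
table(fit2$n.censor)

# Rows kept by the original condition (n.censor == 1) vs. the fixed one (n.censor > 0)
sum(fit2$n.censor == 1)
sum(fit2$n.censor > 0)
```

The first count is zero, which is exactly the empty data frame that the original geom_point() layer received.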

A couple of comments:

  1. Subsetting inside a function can be dangerous because sometimes, as in this case, no row satisfies the condition and subset() returns a zero-row data frame. I'd consider whether or not the geom_point() layer is really necessary.
  2. In a couple of places, you had directions = "hv" inside a geom_step() call. The argument name is singular (direction) and has been corrected above.
  3. This could probably be done a little more efficiently, but one way to extract the columns of interest from a survfit object, say t.survfit, is something like this:

(Expand comps when strata are present)

comps <- c(2:6, 8, 10);
t.fit <- as.data.frame(do.call(cbind, lapply(comps, function(j) t.survfit[[j]])))
names(t.fit) <- names(t.survfit)[comps]
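Positional indices such as c(2:6, 8, 10) depend on the order in which a given survival version stores the survfit components, and that order shifts when a strata element is present. A sketch that selects the components by name instead, shown on the lung data shipped with the survival package (an assumption for illustration; the column list cols is also an illustrative choice):

```r
library(survival)

# Single-curve fit on the lung data that ships with the survival package
t.survfit <- survfit(Surv(time, status) ~ 1, data = lung)

# Select components by name rather than by position, so the result does not
# depend on the internal ordering of the survfit list
cols <- c("time", "n.risk", "n.event", "n.censor", "surv", "upper", "lower")
t.fit <- as.data.frame(unclass(t.survfit)[cols])

# With strata, the strata element holds the number of rows per stratum,
# so rep() expands the stratum names into one label per row
fit2 <- survfit(Surv(time, status) ~ sex, data = lung)
t.fit2 <- as.data.frame(unclass(fit2)[cols])
t.fit2$strata <- factor(rep(names(fit2$strata), fit2$strata))
head(t.fit2)
```

unclass() is used so that ordinary list indexing applies; indexing the survfit object directly with [ would dispatch to the survfit method, which subsets strata rather than components.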

#2


Here is another version that also handles the case where there are no censoring points in your data (@Dennis's version still fails in that case). It could probably be made more efficient by computing once, up front, how many censoring points there are in the whole data frame and reusing that, rather than repeating the test in each branch as I do here.

# define custom function to draw kaplan-meier curve with ggplot
qplot_survival <- function(f.frame, f.CI="default", f.shape=3){
  # use different plotting commands depending on whether or not strata are given
  if("strata" %in% colnames(f.frame) == FALSE){
    # confidence intervals are drawn if not specified otherwise
    if(f.CI=="default" | f.CI==TRUE ){
      # create plot with 4 layers (first 3 layers only events, last layer only censored)
      # hint: censoring data for multiple censoring events at timepoint are overplotted
      # (unlike in plot.survfit in survival package)
      p <- ggplot(data=f.frame) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_step(aes(x=time, y=upper), direction="hv", linetype=2) +
        geom_step(aes(x=time, y=lower), direction="hv", linetype=2)
      if(nrow(subset(f.frame, n.censor > 0)) > 0){
        p + geom_point(data=subset(f.frame, n.censor > 0), aes(x=time, y=surv), shape=f.shape)
      } else {
        p
      }
    } else {
      # create plot without confidence intervals
      p <- ggplot(data=f.frame) + geom_step(aes(x=time, y=surv), direction="hv")
      if(nrow(subset(f.frame, n.censor > 0)) > 0){
        p + geom_point(data=subset(f.frame, n.censor > 0), aes(x=time, y=surv), shape=f.shape)
      } else {
        p
      }
    }
  } else {
    if(f.CI=="default" | f.CI==FALSE){
      # without CI
      p <- ggplot(data=f.frame, aes(group=strata, colour=strata)) +
        geom_step(aes(x=time, y=surv), direction="hv")
      if(nrow(subset(f.frame, n.censor > 0)) > 0){
        p + geom_point(data=subset(f.frame, n.censor > 0), aes(x=time, y=surv), shape=f.shape)
      } else {
        p
      }
    } else {
      # with CI (hint: use alpha for CI)
      p <- ggplot(data=f.frame, aes(colour=strata, group=strata)) +
        geom_step(aes(x=time, y=surv), direction="hv") +
        geom_step(aes(x=time, y=upper), direction="hv", linetype=2, alpha=0.5) +
        geom_step(aes(x=time, y=lower), direction="hv", linetype=2, alpha=0.5)
      if(nrow(subset(f.frame, n.censor > 0)) > 0){
        p + geom_point(data=subset(f.frame, n.censor > 0), aes(x=time, y=surv), shape=f.shape)
      } else {
        p
      }
    }
  }
}
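The efficiency idea mentioned in this answer (compute the censoring information once and reuse it) could look roughly like the sketch below. The helper add_censor_marks() and the toy data frame are hypothetical, not part of either answer; the toy frame only mimics the time, surv and n.censor columns produced by createSurvivalFrame().

```r
library(ggplot2)

# Hypothetical helper: compute the censored rows once and add the
# censoring marks only when at least one such row exists
add_censor_marks <- function(p, f.frame, f.shape = 3) {
  censored <- f.frame[f.frame$n.censor > 0, , drop = FALSE]
  if (nrow(censored) == 0) {
    return(p)  # nothing to mark, so no (empty) point layer is added
  }
  p + geom_point(data = censored, aes(x = time, y = surv), shape = f.shape)
}

# Toy frame with the same relevant columns as createSurvivalFrame() output
toy <- data.frame(time = c(0, 2, 5, 8), surv = c(1, 0.75, 0.5, 0.5),
                  n.censor = c(0, 0, 0, 2))
base_plot <- ggplot(toy) + geom_step(aes(x = time, y = surv), direction = "hv")

p1 <- add_censor_marks(base_plot, toy)                          # step + points
p0 <- add_censor_marks(base_plot, transform(toy, n.censor = 0)) # step only
```

Each branch of qplot_survival() could then build its step layers into p and finish with add_censor_marks(p, f.frame, f.shape); in the stratified branches the colour and group aesthetics set in ggplot() are inherited by the point layer.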
