Getting the Kaplan-Meier survival risk for each instance

Time: 2023-02-03 19:26:55

Problem

Given n patient records with time and status variables (among others), I would like to obtain each patient's survival risk for the time period they fall into, i.e. 2, 4, 6, 8, or 10 years.

I have divided the survival times into 24 - 47 months (2 years), 48 - 83 months (4 years), 84 - 107 months (6 years), 108 - 119 months (8 years) and 120 months up to the longest available follow-up (10 years).

From an individual perspective, a patient with 30 survival months falls into the two-year period, and, given the other predictive variables, I want to know this patient's survival risk within two years.
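The period assignment described above can be sketched with base R's cut(). This is only an illustration: the vector surv_months and its values are made up, and the intervals are taken as left-closed, so that 48 months falls into the 4-year period.

```r
# Hypothetical survival times in months (illustrative values only)
surv_months <- c(30, 47, 48, 90, 110, 119, 120, 200)

# Left-closed intervals: [24,48), [48,84), [84,108), [108,120), [120,Inf)
period_years <- cut(
  surv_months,
  breaks = c(24, 48, 84, 108, 120, Inf),
  labels = c(2, 4, 6, 8, 10),
  right  = FALSE
)

data.frame(surv_months, period_years)
```

Times below 24 months are not covered by these breaks and would come out as NA, so they would need their own rule.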

My method

I'm retrieving survival risk percentages for my data using the R code described in this thread.

km <- survfit(Surv(time, status)~1, data=mydata)
survest <- stepfun(km$time, c(1, km$surv))

The time variable is the survival time in months, and status takes the values 1 and 0 for alive and dead, respectively.
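One thing worth checking here: with a 0/1 status, Surv() from the survival package treats 1 as the event (death) and 0 as censored. If the data really are coded 1 = alive, the indicator would need to be flipped before fitting. A minimal sketch, with a made-up toy data frame and a hypothetical column name event:

```r
library(survival)

# Hypothetical toy records, coded as in the question: 1 = alive, 0 = dead
mydata <- data.frame(
  time   = c(5, 12, 20, 30, 30, 44, 60, 75),
  status = c(0, 0, 1, 0, 1, 0, 1, 1)
)

# Surv() expects the event indicator: 1 = death observed, 0 = censored
mydata$event <- 1 - mydata$status

km <- survfit(Surv(time, event) ~ 1, data = mydata)

# Step function of the KM curve; its argument is a time in months
survest <- stepfun(km$time, c(1, km$surv))
survest(c(24, 48))
```

Without the recode, the curve would describe time to censoring rather than time to death.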

The code outputs something like this (taken from here):

> survest(0:100)
 [1] 1.0000000 0.9854015 0.9781022 0.9708029 0.9635036 0.9635036 0.9635036
 [8] 0.9416058 0.9124088 0.9124088 0.8978102 0.8905109 0.8759124 0.8613139
 [15] 0.8613139 0.8467153 0.8394161 0.8394161 0.8175182 0.8029197 0.7883212
 [22] 0.7737226 0.7664234 0.7664234 0.7518248 0.7299270 0.7299270 0.7225540
 [29] 0.7225540 0.7151810 0.7004350 0.6856890 0.6856890 0.6783160 0.6783160

My question is: are these the actual survival estimates for my 300,000 individual records, so that I would need to call survest(0:300000)? I tried survest(0:1000), but the result had already converged to a constant value, which does not solve my problem.

1 solution

#1


As mentioned in my comment, I don't think it is possible to get KM-estimates for individual patients. The KM-estimator gives the observed probability of survival at a certain timepoint on a population level. The observed survival probability for an individual, however, is either 0 (death) or 1 (alive), nothing in between.
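To make the population-level point concrete, here is a small sketch on simulated data (all names and parameter values are made up). The KM curve is indexed by time, not by record number, so evaluating it at the 2, 4, 6, 8 and 10 year marks gives one value per horizon that applies to the whole cohort:

```r
library(survival)

set.seed(42)
n <- 500
death_time  <- rexp(n, rate = 1 / 60)   # hypothetical death times in months
censor_time <- runif(n, 0, 150)         # hypothetical censoring times
sim <- data.frame(
  time  = pmin(death_time, censor_time),
  event = as.integer(death_time <= censor_time)  # 1 = death, 0 = censored
)

km <- survfit(Surv(time, event) ~ 1, data = sim)

# Read the curve at the horizons of interest (in months)
horizons <- c(24, 48, 84, 108, 120)
km_at <- summary(km, times = horizons, extend = TRUE)
data.frame(months = km_at$time, survival = km_at$surv)
```

This is also why something like survest(0:1000) flattens out: past the last observed time the step function just repeats its final value.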

Instead of observed survival probabilities, you will have to use some sort of model (e.g. Cox PH, accelerated failure time model, neural network, etc.) to get predicted survival probabilities. These probabilities tell you how likely an individual with that particular combination of variable values is to be alive at a particular timepoint.
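As a rough sketch of what that looks like for the horizons in the question, the following uses coxph() and survfit() from the survival package on simulated data. The predictors x1 and x2, the effect sizes and the censoring scheme are all made up for illustration:

```r
library(survival)

set.seed(1)
n <- 300
x1 <- rbinom(n, 1, 0.5)                 # hypothetical binary predictor
x2 <- rnorm(n)                          # hypothetical continuous predictor
death_time  <- rexp(n, rate = exp(0.7 * x1 + 0.4 * x2) / 80)
censor_time <- runif(n, 0, 200)
d <- data.frame(
  time  = pmin(death_time, censor_time),
  event = as.integer(death_time <= censor_time),  # 1 = death, 0 = censored
  x1, x2
)

fit <- coxph(Surv(time, event) ~ x1 + x2, data = d)

# One predicted survival curve per row of newdata, read at fixed horizons
horizons <- c(24, 48, 84, 108, 120)
sf   <- survfit(fit, newdata = d)
pred <- t(summary(sf, times = horizons, extend = TRUE)$surv)
colnames(pred) <- paste0("S_", horizons, "m")
head(pred)
```

Each row of pred belongs to one individual and each column to one horizon, so unlike the KM curve the values differ between patients with different covariates.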

UPDATE: example code, based on the code the OP provided here

library(pec) ; library(rms)

# Simulate data
set.seed(1)
examp.data <- SimSurv(3000)

# fit a Cox model with predictors X1+X2
coxmodel <- cph(Surv(time,status)~X1+X2, data=examp.data, surv=TRUE) 

# choose the time-points at which to predict: here, the quantiles of the observed times
ttt <- quantile(examp.data$time)
ttt
#          0%          25%          50%          75%         100% 
#6.959458e-03 9.505409e+00 3.077284e+01 7.384565e+01 7.100556e+02 

# Get predicted survival probabilities at selected time-points:
preds <- predictSurvProb(coxmodel, newdata=examp.data, times=ttt)

# Store in original data
examp.data$predict.surv.prob.Q1 <- preds[,1] # pred. surv. prob. at  0.006959458
examp.data$predict.surv.prob.Q2 <- preds[,2] # pred. surv. prob. at  9.505409
examp.data$predict.surv.prob.Q3 <- preds[,3] # pred. surv. prob. at  30.77284
examp.data$predict.surv.prob.Q4 <- preds[,4] # pred. surv. prob. at  73.84565
examp.data$predict.surv.prob.Q5 <- preds[,5] # pred. surv. prob. at  710.0556

Now you have predictions of the survival probabilities at those 5 timepoints for each individual in your data. Of course, you do need to evaluate the predictive performance of your model in terms of discrimination (e.g. with the function cindex in pec-package) and calibration (with calibration plot, see rms-package).
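For the discrimination part, a quick first check could look like the sketch below. Note that it uses concordance() from the survival package on a coxph fit, not cindex from pec, and that the data are simulated with made-up names and effect sizes:

```r
library(survival)

set.seed(2)
n <- 400
x1 <- rbinom(n, 1, 0.5)                 # hypothetical binary predictor
x2 <- rnorm(n)                          # hypothetical continuous predictor
death_time  <- rexp(n, rate = exp(0.7 * x1 + 0.4 * x2) / 80)
censor_time <- runif(n, 0, 200)
d <- data.frame(
  time  = pmin(death_time, censor_time),
  event = as.integer(death_time <= censor_time),  # 1 = death, 0 = censored
  x1, x2
)

fit <- coxph(Surv(time, event) ~ x1 + x2, data = d)

# Concordance: 0.5 is no better than chance, 1 is perfect ranking
cstat <- concordance(fit)
cstat$concordance
```

This is computed on the same data the model was fitted to, so it will tend to be optimistic; a held-out set or resampling would give a fairer picture, and calibration still needs to be checked separately.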
