How to fit a normal cumulative distribution function to data

Time: 2022-02-28 00:32:22

I have generated some data which is effectively a cumulative distribution; the code below gives an example of X and Y from my data:

X<- c(0.09787761, 0.10745590, 0.11815422, 0.15503521, 0.16887488, 0.18361325, 0.22166727,
0.23526786, 0.24198808, 0.25432602, 0.26387961, 0.27364063, 0.34864672, 0.37734113,
0.39230736, 0.40699061, 0.41063824, 0.42497043, 0.44176913, 0.46076456, 0.47229330,
0.53134509, 0.56903577, 0.58308938, 0.58417653, 0.60061901, 0.60483849, 0.61847521,
0.62735245, 0.64337353, 0.65783302, 0.67232004, 0.68884473, 0.78846000, 0.82793293,
0.82963446, 0.84392010, 0.87090024, 0.88384044, 0.89543314, 0.93899033, 0.94781219,
1.12390279, 1.18756693, 1.25057774)

Y<- c(0.0090, 0.0210, 0.0300, 0.0420, 0.0580, 0.0700, 0.0925, 0.1015, 0.1315, 0.1435,
0.1660, 0.1750, 0.2050, 0.2450, 0.2630, 0.2930, 0.3110, 0.3350, 0.3590, 0.3770, 0.3950,
0.4175, 0.4475, 0.4715, 0.4955, 0.5180, 0.5405, 0.5725, 0.6045, 0.6345, 0.6585, 0.6825,
0.7050, 0.7230, 0.7470, 0.7650, 0.7950, 0.8130, 0.8370, 0.8770, 0.8950, 0.9250, 0.9475,
0.9775, 1.0000)

plot(X,Y)

I would like to obtain the median, the mean and some quantile information (for example, the 5% and 95% quantiles) from this data. My idea was to fit a defined distribution to it and then integrate to get the quantiles, mean and median values.
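For reference, once a CDF F has been fitted, the quantiles and the median come from inverting F rather than integrating it; only the mean needs an integral. In standard notation, with Phi the standard normal CDF and Q the quantile function, the relations are:

```latex
Q(p) = F^{-1}(p), \qquad \operatorname{median} = Q(0.5), \qquad
\operatorname{mean} = \int_{-\infty}^{\infty} x \, \mathrm{d}F(x) = \int_{0}^{1} Q(p) \, \mathrm{d}p

% for a normal CDF, F(x) = \Phi\big((x - \mu)/\sigma\big):
Q(p) = \mu + \sigma \, \Phi^{-1}(p), \qquad \operatorname{mean} = \operatorname{median} = \mu, \qquad
Q(0.05) \approx \mu - 1.645\,\sigma, \quad Q(0.95) \approx \mu + 1.645\,\sigma
```

So if a normal CDF fits well, the whole task reduces to estimating mu and sigma.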

The question is how to fit the most appropriate cumulative distribution function to this data (I expect this may well be the Normal Cumulative Distribution Function).

I have seen lots of ways to fit a PDF, but I can't find anything on fitting a CDF.
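As a distribution-free cross-check (a different technique from fitting a parametric CDF), the quantiles can also be read straight off the data by linearly interpolating the inverse of the empirical CDF. This is a minimal sketch: the helper names empirical_quantiles and empirical_mean are hypothetical, the cumulative proportions are assumed to be non-decreasing and to end at 1 (as Y does), and the mean treats each x value as carrying the probability mass by which p jumps there.

```r
# Hypothetical helpers: distribution-free summaries from an empirical CDF.
# x: data values in increasing order; p: cumulative proportions at x
# (assumed non-decreasing and ending at 1).

# Quantiles: interpolate the inverse CDF linearly with approx(), swapping
# the roles of x and p. rule = 2 returns the nearest end point for
# probabilities outside the observed range of p.
empirical_quantiles <- function(x, p, probs = c(0.05, 0.5, 0.95)) {
  approx(p, x, xout = probs, rule = 2, ties = mean)$y
}

# Mean: weight each x by the jump in p at that point (p starts from 0).
empirical_mean <- function(x, p) {
  sum(x * diff(c(0, p)))
}
```

With the data above these would be called as empirical_quantiles(X, Y) and empirical_mean(X, Y). The results can only be as fine-grained as the observed points, so they are most useful for sanity-checking a fitted CDF.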

(I realise this may seem like a basic question to many of you, but it has me struggling!)

Thanks in advance

1 Answer

#1


Perhaps you could use nlm to find the parameters that minimize the squared differences between your observed Y values and the values expected under a normal distribution. Here is an example using your data:

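fn <- function(x) {
  mu <- x[1]
  sigma <- exp(x[2])  # optimise log(sigma) so that sigma is always positive
  # sum of squared differences between the observed Y and the normal CDF at X
  sum((Y - pnorm(X, mu, sigma))^2)
}
# minimise fn starting from mu = 1, log(sigma) = 1
est <- nlm(fn, c(1, 1))$estimate

plot(X, Y)
# overlay the fitted normal CDF; sigma is back-transformed with exp()
curve(pnorm(x, est[1], exp(est[2])), add = TRUE)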
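To get the summaries the question asks for, the fitted parameters can be passed to qnorm(), the inverse of the normal CDF; for a normal distribution the mean and the median both equal mu. A minimal sketch, where normal_summary is a hypothetical helper name and sigma must be supplied on its original scale, i.e. exp(est[2]):

```r
# Hypothetical helper: summaries of a fitted normal distribution.
# mean and median both equal mu; quantiles come from qnorm(), the inverse
# of the normal CDF.
normal_summary <- function(mu, sigma, probs = c(0.05, 0.95)) {
  q <- qnorm(probs, mean = mu, sd = sigma)
  names(q) <- paste0(100 * probs, "%")  # label as "5%", "95%", ...
  c(mean = mu, median = mu, q)
}
```

For the fit above this would be normal_summary(est[1], exp(est[2])).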

Unfortunately, I don't know an easy way with this method to constrain sigma > 0 without applying the exp transformation to the variable. But the fit seems reasonable.
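One way to impose sigma > 0 directly, rather than through the exp() reparameterisation, is a box-constrained optimiser: optim() with method = "L-BFGS-B" accepts lower and upper bounds. This swaps nlm() for optim() and is not the answer's own method; fit_normal_cdf is a hypothetical name, and both the starting values (mean and standard deviation of x) and the lower bound of 1e-8 on sigma are arbitrary choices for this sketch.

```r
# Hypothetical helper: least-squares fit of a normal CDF to (x, y) points,
# with sigma kept positive by a lower bound instead of an exp() transform.
fit_normal_cdf <- function(x, y, start = c(mean(x), sd(x))) {
  # sum of squared differences between observed y and the normal CDF at x
  sse <- function(par) sum((y - pnorm(x, mean = par[1], sd = par[2]))^2)
  fit <- optim(start, sse, method = "L-BFGS-B",
               lower = c(-Inf, 1e-8))  # mu unbounded, sigma >= 1e-8
  if (fit$convergence != 0) {
    warning("optim() did not converge: ", fit$message)
  }
  c(mu = fit$par[1], sigma = fit$par[2])
}
```

With the question's data, p <- fit_normal_cdf(X, Y) followed by curve(pnorm(x, p[["mu"]], p[["sigma"]]), add = TRUE) would overlay the constrained fit on plot(X, Y).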
