How do I estimate the x variable from the y variable in R?

Here is my data:

# A tibble: 8 x 3
    CFU strain diltn
  <dbl> <chr>  <dbl>
1 159   aM12    8748
2 124.  aM12    2916
3  76.5 aM12     972
4  22   aM12     324
5  16.5 aM12     108
6  17   aM12      36
7  22.5 aM12      12
8  17.5 aM12       4

This may seem like a simple question, but I have mainly used R to get basic summaries of data and to graph them (using dplyr and ggplot2).

I can plot the graph:

ggplot(data=data, aes(x=diltn, y=CFU))+
  geom_point()+
  geom_line()+
  scale_x_log10()

I would like to estimate the "diltn" (x variable) at which I would get a "CFU" (y variable) of 77.

I managed this in Excel and graphed it as follows to illustrate what I would like to achieve:

2 solutions

#1

This is actually a much trickier question (in general) than it looks. It's not that it can't be done (there are many options), but it depends greatly on how your data behaves. For example, suppose the y-value of interest is 20 instead of 77. Any value of diltn between 4 and 324 is now a "reasonable" answer.
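
To make the ambiguity concrete, here is a minimal sketch that counts how many segments of the line joining the posted points pass through a target CFU. The values are typed in from the tibble in the question (the second CFU is printed there only as "124.", so 124 is an assumed rounding), and count_crossings is a hypothetical helper, not a function from any package.

```r
# Data typed in from the question, sorted by increasing dilution.
# Assumption: the CFU printed as "124." is taken to be 124.
dat <- data.frame(
  diltn = c(4, 12, 36, 108, 324, 972, 2916, 8748),
  CFU   = c(17.5, 22.5, 17, 16.5, 22, 76.5, 124, 159)
)

# Hypothetical helper: count the segments between consecutive points
# on which the piecewise-linear curve passes through `target`.
# (It assumes no observed value is exactly equal to `target`.)
count_crossings <- function(y, target) {
  s <- sign(y - target)
  sum(diff(s) != 0)
}

crossings_20 <- count_crossings(dat$CFU, 20)
crossings_77 <- count_crossings(dat$CFU, 77)
```

With these values, a target of 20 is crossed on three separate segments, while 77 is crossed only once (between diltn 972 and 2916). That is why 77 can be read off the curve unambiguously and 20 cannot.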

To get around this issue, we use statistical models. If I'm guessing correctly and you're working with a dose-response model (or something similar - e.g. I've used them with standard curves in assays), you might check out drm() in the drc package, which can fit these curves appropriately.

Something like:

library(drc)  # provides drm(), LL.4() and ED()

mod <- drm(CFU ~ diltn, data = data, fct = LL.4())
plot(mod)

The ED function is then used to extract the relevant data. I work with standard curves, and I find the following settings to be useful, but you might need different ones depending on how your data works.

ED(mod, 77, bound = FALSE, type = 'absolute')
# Estimated effective doses
# 
#        Estimate Std. Error
# e:1:77  1103.69     176.31

It's been a while since I read the vignettes, though, so you will probably need to do some reading to make sure you get the correct result.

#2

Based on the scatter plot, we can probably fit a non-linear regression curve to the dataset. Assuming your dataset is called dat, we can use the nls function to fit the model. Note that finding a suitable equation and starting values takes some effort and thought. In this case, the equation is CFU ~ a * diltn/(b + diltn), and the starting values for a and b are 100 and 1000, respectively.

library(tidyverse)

fit <- nls(formula = CFU ~ a * diltn/(b + diltn), 
           start = list(a = 100, b = 1000), data = dat)

summary(fit)

# Formula: CFU ~ a * diltn/(b + diltn)
# 
# Parameters:
#   Estimate Std. Error t value Pr(>|t|)    
# a   187.32      21.25   8.814 0.000118 ***
# b  1514.27     517.50   2.926 0.026420 *  
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 13.17 on 6 degrees of freedom
# 
# Number of iterations to convergence: 4 
# Achieved convergence tolerance: 3.555e-06

To visually inspect the model fit, we can first create a second data frame with diltn from 1 to 9000. We can then use the predict function to predict the CFU based on diltn and the model fit.

# data_frame() is deprecated; tibble() is the current equivalent
dat2 <- tibble(diltn = 1:9000) %>% 
  mutate(Pred = predict(fit, .))

ggplot(data = dat, aes(x = diltn, y = CFU))+
  geom_point() +
  geom_line(data = dat2, aes(x = diltn, y = Pred), color = "red")

The model looks good to me.

Finally, we can filter the Pred values to find the possible values for diltn. In this case, I think 1057 could be a possible answer.

dat2 %>% filter(Pred > 76.9, Pred < 77.1)

# # A tibble: 5 x 2
#   diltn  Pred
#   <int> <dbl>
# 1  1055  76.9
# 2  1056  77.0
# 3  1057  77.0
# 4  1058  77.0
# 5  1059  77.1

Alternatively, since we have fitted a non-linear regression model and know the fitted parameters a and b, we can set CFU = 77 and solve the equation for diltn, which gives diltn = 77 * b/(a - 77). My calculation shows diltn is 1056.914.
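
As a sketch of that calculation, the snippet below plugs in the rounded estimates printed by summary(fit) above (a = 187.32, b = 1514.27) instead of refitting, so the last digits may differ slightly from what coef(fit) would give. It solves the equation algebraically and then cross-checks the result with uniroot() from base R. The search interval of 1 to 9000 is an assumption chosen to match the prediction grid.

```r
# Rounded parameter estimates copied from the summary(fit) output.
a <- 187.32
b <- 1514.27
target <- 77

# Closed form: CFU = a * diltn / (b + diltn)  =>  diltn = CFU * b / (a - CFU)
diltn_closed <- target * b / (a - target)

# Numerical cross-check: find where the fitted curve minus the target is zero.
f <- function(x) a * x / (b + x) - target
diltn_root <- uniroot(f, interval = c(1, 9000), tol = 1e-9)$root

diltn_closed
diltn_root
```

Note that the closed form only makes sense for targets below a, since a is the plateau that the fitted curve approaches but never reaches. The uniroot() route is the one that carries over to models without a simple algebraic inverse.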
