数据分析:方差分析在R语言中的应用

时间:2024-04-29 16:02:14

介绍

方差分析的R语言实现包括以下部分:

  • 数据导入

  • 数据清洗

  • ANOVA计算

  • 结果解析

  • ANOVA评估

参考教程Analysis_of_Variance

knitr::opts_chunk$set(message = FALSE, warning = FALSE)

options(stringsAsFactors = F)
options(future.globals.maxSize = 1000 * 1024^2)

step1: 安装R包

install.packages(c("ggplot2", "ggpubr", "tidyverse"))

step2: 载R包

library(tidyverse) # 数据预处理R包
library(readxl) # 读取xlsx数据R包
library(ggpubr) # 画图R包

step3: 导入数据

  • 随机生成数据
data <- data.frame(D = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4), rep("E", 4), rep("F", 4)),
                   RR = c(80,83,83,85,75,75,79,79,74,73,76,77,67,72,74,74,62,62,67,69,60,61,64,66))
  • 存储数据
write.table(data, file = "data.txt", sep = "\t", quote = F, row.names = F)
xlsx::write.xlsx(data, file = "data.xlsx", row.names = F)
  • txt数据格式
data <- read.table("data.txt", header = T)
  • xlsx数据格式
data <- read_xlsx("data.xlsx", sheet = 1)

step4: 数据清洗

  • 筛选数据:丢弃A组数据
data_drop <- data %>%
  dplyr::filter(D != "A")#%>%
  #dplyr::mutate(Test = "test")

head(data_drop)

  • 数据平均值和其他指标
data %>%
  group_by(D) %>%
  summarise(N=n(),
            Means=mean(RR),
            SS=sum((RR - Means)^2),
            SD=sd(RR),
            SEM=SD/N^.5)

  • 展示数据: boxplot
ggboxplot(data_drop, 
          x = "D", 
          y = "RR", 
          color = "D",
          ylab = "RR", xlab = "D")

step5: 单因素方差分析

  • one-way ANOVAs: 使用aov函数运行单因素方差分析 (公式是:Y是检验变量,X是分组变量);

  • 再使用summary函数获取单因素方差分析的结果。

# Y=RR; X=D
one.way <- aov(RR ~ D, data = data_drop)

summary(one.way)

结果解析:

  • Residuals是模型的残差,可以理解为截距;

  • Df列显示了自变量的*度(变量中的水平数减1)和残差的*度(观察总数减1和自变量中的水平数减1);

  • Sum Sq列显示平方和(即组均值与总体均值之间的总变化)。;

  • Mean Sq列是平方和的平均值,通过将平方和除以每个参数的*度来计算;

  • F value列是F检验的检验统计量。这是每个自变量的均方除以残差的均方。F值越大,自变量引起的变化越有可能是真实的,而不是偶然的;

  • Pr(>F)列是F统计量的p值。这表明,如果组均值之间没有差异的原假设成立,那么从检验中计算出的F值发生的概率大小。

  • 另一种方法:t-test仅仅适合2组比较,因此需要筛选

data_ttest <- data_drop %>%
  dplyr::filter(D %in% c("B", "C")) #%>%
  #dplyr::filter(RR != 77)

# data_test_filter <- filter(data_drop, D %in% c("B"))
# data_test_filter2 <- filter(data_test_filter, RR != 77)

t.test(RR ~ D, data = data_ttest)

step6: 后置检验

  • ANOVA结果仅仅揭示多个组间的差异结果,具体到哪两个组内部差异还需要做后置检验

  • 后置检验通常采用TukeyHD函数

TukeyHSD(one.way)

  • 该结果给出每个两组之间的结果;

  • diff: 两组的均值之差;

  • Lwr, upr: 95%置信区间的下限和上限(默认值) ;

  • P adj: 多次比较调整后的P值。

step7: 检查残差分布是否符合正态分布

  • ANOVA比较的是均值,需要每个分组的残差服从正态部分
plot(one.way, 2)

  • 采用Shapiro-Wilk对残差进行检验
shapiro.test(x = residuals(object = one.way))

结果显示:残差不显著也即是表明残差服从正态分布,可以采用ANOVA分析方法判断RR在D分组的分布水平。

step8: 方差齐性检验

library(car)

leveneTest(RR ~ D, data = data_drop, center = mean) 

bartlett.test(RR ~ D, data = data_drop) 

系统信息

devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       macOS Monterey 12.2.1
 system   x86_64, darwin20
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Asia/Shanghai
 date     2024-03-12
 rstudio  2023.09.0+463 Desert Sunflower (desktop)
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 abind         1.4-5   2016-07-21 [1] CRAN (R 4.3.0)
 backports     1.4.1   2021-12-13 [1] CRAN (R 4.3.0)
 broom         1.0.5   2023-06-09 [1] CRAN (R 4.3.0)
 bslib         0.6.1   2023-11-28 [1] CRAN (R 4.3.0)
 cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
 car           3.1-2   2023-03-30 [1] CRAN (R 4.3.0)
 carData       3.0-5   2022-01-06 [1] CRAN (R 4.3.0)
 cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.3.0)
 cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.0)
 colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.0)
 digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.0)
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
 evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.0)
 fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.0)
 farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
 fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
 fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
 generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.0)
 ggpubr      * 0.6.0   2023-02-10 [1] CRAN (R 4.3.0)
 ggsignif      0.6.4   2022-10-13 [1] CRAN (R 4.3.0)
 glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.0)
 gtable        0.3.4   2023-08-21 [1] CRAN (R 4.3.0)
 hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
 htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.0)
 htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.3.0)
 httpuv        1.6.14  2024-01-26 [1] CRAN (R 4.3.2)
 jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.0)
 jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.0)
 knitr         1.45    2023-10-30 [1] CRAN (R 4.3.0)
 labeling      0.4.3   2023-08-29 [1] CRAN (R 4.3.0)
 later         1.3.2   2023-12-06 [1] CRAN (R 4.3.0)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.0)
 lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.3.0)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
 mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgbuild      1.4.3   2023-12-10 [1] CRAN (R 4.3.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 pkgload       1.3.4   2024-01-16 [1] CRAN (R 4.3.0)
 profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.0)
 promises      1.2.1   2023-08-10 [1] CRAN (R 4.3.0)
 purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp          1.0.12  2024-01-09 [1] CRAN (R 4.3.0)
 readr       * 2.1.5   2024-01-10 [1] CRAN (R 4.3.0)
 readxl      * 1.4.3   2023-07-06 [1] CRAN (R 4.3.0)
 remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.3.0)
 rJava         1.0-6   2021-12-10 [1] CRAN (R 4.3.0)
 rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.0)
 rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.0)
 rstatix       0.7.2   2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
 sass          0.4.8   2023-12-06 [1] CRAN (R 4.3.0)
 scales        1.3.0   2023-11-28 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 shiny         1.8.0   2023-11-17 [1] CRAN (R 4.3.0)
 stringi       1.8.3   2023-12-11 [1] CRAN (R 4.3.0)
 stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.3.0)
 tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidyr       * 1.3.1   2024-01-24 [1] CRAN (R 4.3.2)
 tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
 timechange    0.3.0   2024-01-18 [1] CRAN (R 4.3.0)
 tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
 usethis       2.2.2   2023-07-06 [1] CRAN (R 4.3.0)
 utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.0)
 vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.0)
 withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.0)
 xfun          0.41    2023-11-01 [1] CRAN (R 4.3.0)
 xlsx          0.6.5   2020-11-10 [1] CRAN (R 4.3.0)
 xlsxjars      0.6.1   2014-08-22 [1] CRAN (R 4.3.0)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.0)
 yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

参考材料

  • Analysis_of_Variance

  • R语言统计分析 04 多组间差异的单因素方差分析(ANOVA)