Stacked bar chart with an overplotted ratio line in R

Time: 2022-10-11 23:39:25

I have data with one observation per row:

rm(list = ls(all = TRUE))
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
                   var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE),
                   var2 = sample(c("yes", "no"), 100, replace = TRUE),
                   var3 = sample(c("yes", "no"), 100, replace = TRUE),
                   var4 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE),
                   var5 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE),
                   var6 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE))

I need to make a stacked bar chart with side-by-side bar pairs, one bar for each kind (good vs bad), showing how many of each kind have 0 "yes" vars, how many have 1 "yes" var, and so on, up to "yes" for all 6 vars. Y-axis = count, X-axis = the seven categories (0 "yes" vars, 1 "yes" var, etc.). Each bar should be a color-coded stacked bar showing the contribution of each var to the total height of the bar. NAs are treated as "no". I also need to overplot a line showing the ratio count(good)/count(bad) for each of the seven X-axis categories.

1 solution

#1


Based on your description, here is what I understand you are trying to achieve. It consists of three steps:

  1. Replace all NAs with "no".
  2. Count the "yes" values in each row.
  3. Plot the graph.

Let's address each point in turn.

Let's assume that your data is as follows:

mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE), 
                   var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE), 
                   var2 = sample(c("yes", "no"), 100, replace = TRUE), 
                   var3 = sample(c( "yes", "no"), 100, replace = TRUE), 
                   var4 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE), 
                   var5 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE), 
                   var6 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE))

1

To replace all NAs with "no", simply run:

mydf[is.na(mydf)] <- "no"

Here is.na(mydf) finds every NA cell in the data.frame, and the assignment operator replaces each of those cells with "no".
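To make the replacement step concrete, here is a minimal sketch on a tiny, made-up data frame (the name `toy` and its values are assumptions for illustration, not part of the original data). It shows that `is.na()` on a data frame returns a logical matrix, which can then be used to index the cells to overwrite.

```r
# Hypothetical two-row example; stringsAsFactors = FALSE keeps the columns
# as character vectors regardless of the R version.
toy <- data.frame(
  kind = c("good", "bad"),
  var1 = c("yes", NA),
  var2 = c(NA, "no"),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix with TRUE wherever a cell is NA
na_mask <- is.na(toy)

# Indexing the data frame with that matrix overwrites only the NA cells
toy[na_mask] <- "no"

print(toy)
```

One caveat: if the columns are factors (the default for strings in `data.frame()` before R 4.0), the assignment only works cleanly when "no" is already one of the factor levels, which is the case for the data above.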

2

To count the "yes" values row by row I used the apply function. You can run ?apply to see its arguments, but in a nutshell: the 1st argument is the data.frame, the 2nd argument is the direction (1 for row-wise, 2 for column-wise), and the 3rd argument is the function you wish to apply along that direction.

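# Count the "yes" entries in each row; na.rm = TRUE keeps the count correct
# even if some NAs were not replaced beforehand.
mydf$total.yes <- apply(mydf, 1, function(x) {
  sum(x == "yes", na.rm = TRUE)
})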
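As an alternative to the row-wise `apply` call, the same count can be sketched in vectorized form with `rowSums`. The toy data and the `var_cols` helper below are assumptions made for illustration; restricting the count to the `var*` columns also avoids scanning the `kind` column.

```r
# Hypothetical three-row example with a known number of "yes" answers
toy <- data.frame(
  kind = c("good", "bad", "good"),
  var1 = c("yes", "no", "yes"),
  var2 = c("yes", NA, "no"),
  var3 = c("yes", "no", NA),
  stringsAsFactors = FALSE
)

# Select only the answer columns by name
var_cols <- grep("^var", names(toy), value = TRUE)

# Comparing a data frame with a string gives a logical matrix;
# rowSums() counts the TRUE cells per row, and na.rm = TRUE treats NA as "no".
toy$total.yes <- rowSums(toy[var_cols] == "yes", na.rm = TRUE)

print(toy)
```

With this form the NA cells do not need to be replaced first, because `na.rm = TRUE` simply leaves them out of the count.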

3

Lastly, the plot. An easy way to produce a good-looking plot is to use ggplot2. Install it by typing install.packages("ggplot2"). For bar plots, see the documentation (http://docs.ggplot2.org/0.9.3.1/geom_bar.html); the code looks like the following.

library(ggplot2)

ggplot(mydf, aes(total.yes, fill=kind)) +
  geom_bar(position="dodge")

which will produce the plot below:

[Figure: bar chart of total.yes counts, dodged by kind]

I hope this answers your question. The full code is as follows:

mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE), 
                   var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE), 
                   var2 = sample(c("yes", "no"), 100, replace = TRUE), 
                   var3 = sample(c( "yes", "no"), 100, replace = TRUE), 
                   var4 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE), 
                   var5 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE), 
                   var6 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE))

library(ggplot2)

# Replace all NA values with "no" (the question treats NA as "no").
# With na.rm = TRUE below this is not strictly needed for the count,
# but it keeps the data consistent with the question.
mydf[is.na(mydf)] <- "no"

# For each row, count how many "yes" values there are.
mydf$total.yes <- apply(mydf, 1, function(x) {
  sum(x == "yes", na.rm = TRUE)
})

# See the example here: http://docs.ggplot2.org/0.9.3.1/geom_bar.html
# using your data


ggplot(mydf, aes(total.yes, fill=kind)) +
  geom_bar(position="dodge")

geom_bar actually stacks by default (see the documentation: http://docs.ggplot2.org/0.9.3.1/geom_bar.html). If the bars are stacked, the plot will look something like the following:

ggplot(mydf, aes(total.yes, fill=kind)) +
  geom_bar()

[Figure: bar chart of total.yes counts, stacked by kind]
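The plots above do not yet include the good/bad ratio line that the question asks for. The following is only a sketch of one possible way to overplot it, not a definitive solution: it counts rows per category with `table`, computes count(good)/count(bad), rescales the ratio onto the count axis, and labels it with a secondary axis via `sec_axis`. The seed, the simplified sampling of the six variables, and the names `vars`, `counts`, `ratio_df`, and `scale_factor` are assumptions introduced for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
vars <- paste0("var", 1:6)
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE))
for (v in vars) {
  # assumption: all six variables use the same sampling scheme here
  mydf[[v]] <- sample(c("yes", "no", "yes", "no", NA), 100, replace = TRUE)
}
mydf$total.yes <- rowSums(mydf[vars] == "yes", na.rm = TRUE)

# Counts per (total.yes, kind); fixing the factor levels keeps all seven
# categories even when one of them happens to be empty.
counts <- table(
  total.yes = factor(mydf$total.yes, levels = 0:6),
  kind      = factor(mydf$kind, levels = c("good", "bad"))
)

ratio_df <- data.frame(
  total.yes = 0:6,
  ratio     = counts[, "good"] / counts[, "bad"]
)
# Drop categories where the ratio is undefined (0/0) or infinite (n/0)
ratio_df <- ratio_df[is.finite(ratio_df$ratio), ]

# Rescale the ratio so it can share the count axis; the secondary axis
# undoes the rescaling for its labels.
scale_factor <- max(counts) / max(ratio_df$ratio)

p <- ggplot(mydf, aes(total.yes, fill = kind)) +
  geom_bar(position = "dodge") +
  geom_line(data = ratio_df, aes(total.yes, ratio * scale_factor),
            inherit.aes = FALSE) +
  geom_point(data = ratio_df, aes(total.yes, ratio * scale_factor),
             inherit.aes = FALSE) +
  scale_y_continuous(name = "count",
                     sec.axis = sec_axis(~ . / scale_factor,
                                         name = "good / bad"))
print(p)
```

Categories with no "bad" rows have no finite ratio, so the line simply skips them. Stacking each bar by the contribution of the individual variables would additionally require reshaping the data to long format, which this sketch does not attempt.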
