Why is the Y-axis of my graph not in the correct order?

Time: 2022-10-19 09:35:07

I am trying to plot a graph for a data frame that looks like this:

year week cases
2003    1     0
2003    2     0
2003    3    12
2003    4    23
2003    5    12
2003    6    16
2003    7    20
2003    8    13
2003    9     0
2003   10     0
2003   11    21
2003   12   133
2003   13     9
2003   14    22

The data cover 52 weeks per year, for the years 2003 through 2012.

Here's what running dput(head(df, 20)) gives me:

structure(list(year = c(2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 
2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 
2003L, 2003L, 2003L, 2003L, 2003L), week = 1:20, cases = c(2, 
2, 26, 146, 26, 70, 115, 37, 2, 2, 124, 41, 245, 135, 146, 163, 
26, 26, 92, 92)), .Names = c("year", "week", "cases"), row.names = 1925:1944, class = "data.frame")

I want my Y-axis to be simply the range of the variable 'cases', and the X-axis to run from week 1 through 52. I want to plot every year's data points in a different color.

Here's my ggplot2 code:

ggplot(df, aes(x=week, y=cases, col=year)) + geom_point()

This is the graph it's generating:

[Image: the resulting plot, with the Y-axis values out of numeric order]

Why is this happening? I see no reason why my Y-axis shouldn't just be the range of 'cases' in ascending order.

1 solution

#1



To sum up what was said in the comments:

Your y-axis is indeed sorted, but by the character values (or rather the factor levels, since your variable was imported as a factor) and not by the numeric ones, so the order is 1, 10, 11, ..., 2, 20, ...
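
A minimal sketch of that ordering, using a made-up vector rather than the asker's data (the name cases_chr is only for illustration). It shows how factor levels built from text sort character by character, while the same values sort as expected once they are numbers:

```r
# Hypothetical values, stored as text the way a badly imported column would be.
cases_chr <- c("2", "10", "1", "20", "11")

# Factor levels are sorted as strings, character by character.
lexical_order <- levels(factor(cases_chr))
print(lexical_order)   # "1" "10" "11" "2" "20"

# Converting to numbers first gives the order the asker expected.
numeric_order <- sort(as.numeric(cases_chr))
print(numeric_order)   # 1 2 10 11 20
```

Running class(df$cases) or str(df) on the real data frame should confirm whether the column is a factor.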

There are two problems that need to be solved:
First, you have to understand why the variable wasn't imported as numeric. You probably have a "strange" value somewhere (1,2 for example, i.e. a comma instead of a point as the decimal separator).
Second, you need numeric values to plot your data correctly. For that, you can convert your factor with df$cases <- as.numeric(as.character(df$cases)). Note that the strange value(s) will be converted to NA, which you may not want.
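
Both steps can be sketched on a small made-up factor (the values and variable names here are assumptions, not the asker's data). Going through as.character() matters because calling as.numeric() directly on a factor returns the internal level codes, not the values. The last step assumes the comma really is a decimal separator, which may not be true for your file:

```r
# Hypothetical column: one entry uses a comma as the decimal separator.
cases <- factor(c("0", "12", "1,2", "133"))

# Going through character first recovers the values; unparseable entries become NA.
converted <- suppressWarnings(as.numeric(as.character(cases)))

# List the original strings that failed to convert, to find the "strange" values.
bad_values <- as.character(cases)[is.na(converted)]
print(bad_values)

# If the comma really is a decimal separator, swap it for a point before converting.
repaired <- as.numeric(sub(",", ".", as.character(cases), fixed = TRUE))
print(repaired)
```

Inspecting bad_values before overwriting df$cases makes it easier to decide whether the offending entries should be repaired or left as NA.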

A final note: if you don't want your character variables to be imported as factors, you can pass stringsAsFactors=FALSE in the import step.
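
As a rough illustration, assuming the data come from a CSV file read with read.csv (the question does not say how the data were imported, and the inline text and the "n/a" placeholder below are invented), the import step might look like this. Since R 4.0.0 the default for stringsAsFactors is already FALSE:

```r
# Hypothetical CSV content; "n/a" keeps the cases column from being read as numeric.
csv_text <- "year,week,cases\n2003,1,0\n2003,2,n/a\n2003,3,12"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_text   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(class(as_factor$cases))  # "factor"
print(class(as_text$cases))    # "character"

# Declaring the placeholder as missing lets the column import as numbers directly.
as_number <- read.csv(text = csv_text, na.strings = c("NA", "n/a"))
print(class(as_number$cases))  # "integer"
```

Note that stringsAsFactors=FALSE only avoids the factor; the column is still text until the strange values are handled.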

