Scatter plot in R with heavy overlap and 3000 points

Time: 2022-12-07 00:22:00

I am making a scatter plot in R with ggplot2. I am comparing the fraction of votes Hillary and Bernie received in the primary against education level. There is a lot of overlap and way too many points. I tried to use transparency so I could see the overlap, but it still looks bad.

Code:

# Needs dplyr (for filter) and ggplot2; infolookup and mydata must already exist.
library(dplyr)
library(ggplot2)

demanalyze <- function(infocode, n = 1){
    # Human-readable description of the chosen column, used for title and x label
    infoname <- filter(infolookup, column_name == infocode)$description
    infocolumn <- as.vector(as.matrix(mydata[infocode]))
    ggplot(mydata) +
    aes(x = infocolumn) +
    ggtitle(infoname) +
    xlab(infoname) +
    ylab("Fraction of votes each candidate received") +
    geom_point(aes(y = sanders_vote_fraction, colour = "Bernie Sanders")) +#, color = alpha("blue",0.02), size=I(1)) +
    stat_smooth(aes(y = sanders_vote_fraction), method = "lm", formula = y ~ poly(x, n), size = 1, color = "darkblue", se = FALSE) +
    geom_point(aes(y = clinton_vote_fraction, colour = "Hillary Clinton")) +#, color = alpha("red",0.02), size=I(1)) +
    stat_smooth(aes(y = clinton_vote_fraction), method = "lm", formula = y ~ poly(x, n), size = 1, color = "darkred", se = FALSE) +
    # Nearly transparent points, but a fully opaque legend key
    scale_colour_manual("", 
        values = c("Bernie Sanders" = alpha("blue",0.02), "Hillary Clinton" = alpha("red",0.02))
    ) +
    guides(colour = guide_legend(override.aes = list(alpha = 1)))
}

What could I change to make the overlap areas look less messy?

1 solution

#1


3  

The standard way to plot a large number of points over 2 dimensions is to use 2D density plots:

With a reproducible example:

x1 <- rnorm(1000, mean=10)
x2 <- rnorm(1000, mean=10)
y1 <- rnorm(1000, mean= 5)
y2 <- rnorm(1000, mean = 7)


mydat <- data.frame(xaxis=c(x1, x2), yaxis=c(y1, y2), lab=rep(c("H","B"),each=1000))
head(mydat)

library(ggplot2)
## Dots and density plots (kinda messy, but can play with alpha)
p1 <- ggplot(mydat) + geom_point(aes(x = xaxis, y = yaxis, color = lab), alpha = 0.4) +
  stat_density2d(aes(x = xaxis, y = yaxis, color = lab))
p1

## Just density
p2 <- ggplot(mydat) + stat_density2d(aes(x = xaxis, y = yaxis, color = lab))
p2

There are many parameters to play with, so see the ggplot2 documentation for this plot type for the full details.
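To connect this back to the question, here is a rough sketch of how data shaped like the asker's (one column per candidate) might be stacked into the long format used above, and which parameters could be worth trying first. The data is simulated, not the real primary results. The names `wide`, `long`, `education`, `vote_fraction`, `candidate` and `p3` are made up for illustration; only `sanders_vote_fraction` and `clinton_vote_fraction` come from the question. It assumes a reasonably recent ggplot2 in which `geom_density_2d()` accepts `bins`.

```r
library(ggplot2)

set.seed(1)  # make the simulated data repeatable

# Simulated stand-in for the asker's data: one row per area,
# one vote-fraction column per candidate (values kept inside 0..1).
n <- 1500
wide <- data.frame(education = runif(n, min = 5, max = 60))
wide$sanders_vote_fraction <- plogis(-0.5 + 0.02 * wide$education + rnorm(n, sd = 0.5))
wide$clinton_vote_fraction <- 1 - wide$sanders_vote_fraction  # simplification: two candidates only

# Stack the two candidate columns so that each point carries a label,
# like the `lab` column in the example above.
long <- data.frame(
  education     = rep(wide$education, times = 2),
  vote_fraction = c(wide$sanders_vote_fraction, wide$clinton_vote_fraction),
  candidate     = rep(c("Bernie Sanders", "Hillary Clinton"), each = n)
)

p3 <- ggplot(long, aes(x = education, y = vote_fraction, colour = candidate)) +
  geom_point(alpha = 0.1, size = 0.5) +   # faint points for context
  geom_density_2d(bins = 6) +             # fewer contour levels = less clutter
  scale_colour_manual("", values = c("Bernie Sanders" = "blue",
                                     "Hillary Clinton" = "red")) +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  facet_wrap(~ candidate)                 # one panel per candidate, so the clouds no longer overlap
p3
```

Dropping `facet_wrap()` puts both candidates back in one panel. The point `alpha` and the number of contour `bins` are the two knobs that change the look the most.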
