Spatial data with duplicated and missing points

Time: 2022-05-30 20:18:27

I am analysing data from an egg survey. Data are available from different points in the North Sea, and some stations were recorded twice on different dates. The sea should be covered by 0.5 x 0.5 degree squares. I have two questions for which I haven't found a solution yet:

  1. How do I replace points that share a location but have different dates with their mean value? I know how to remove duplicates, or how to replace them with the max or min, but I couldn't find a way to calculate a mean.

  2. How do I calculate interpolated values for the missing points, based on neighbouring cells? A value should be interpolated if and only if at least two neighbouring cells have recorded values.

I tried setting up a grid, but did not get very far, as I couldn't find a way to tell R when to interpolate and when not to.

Sample data:

egg_data <- structure(list(Latitude = c(54.25, 54.25, 54.25, 54.25, 54.25, 
54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 54.25, 54.25, 54.25, 53.25, 58.25, 57.75, 
57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 56.75, 
56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 
56.75, 56.75, 56.75, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 
56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 
56.25, 56.75, 56.75, 56.75), Longitude = c(6.25, 5.25, 5.25, 
4.25, 4.25, 3.25, 3.25, 2.25, 2.25, 1.25, 1.25, 0.25, 0.25, 0.25, 
0.25, 0.25, 0.25, 0.25, 1.25, 1.25, 2.25, 2.25, 3.25, 3.25, 4.25, 
4.25, 5.25, 5.25, 5.25, 5.25, 4.25, 4.25, 3.25, 3.25, 2.25, 2.25, 
1.25, 1.25, 0.25, 0.25, 0.25, 0.25, 1.25, 1.25, 0.25, 0.25, 0.25, 
0.25, 3.25, 3.25, 3.25, 2.75, 2.25, 1.75, 1.25, 0.75, 0.25, 0.25, 
0.25, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75, 
5.25, 5.75, 6.25, 5.75, 5.25, 4.75, 4.25, 3.75, 3.25, 2.25, 1.75, 
1.25, 0.75, 0.25, 0.25, 0.75, 1.25, 1.75, 1.75, 1.25, 0.75), 
    Eggs = c(9L, 6L, 4L, 20L, 57L, 14L, 35L, 18L, 4L, 1L, 3L, 
    100L, 1L, 201L, 0L, 51L, 52L, 23L, 19L, 4L, 5L, 23L, 11L, 
    18L, 7L, 7L, 14L, 6L, 3L, 4L, 20L, 13L, 19L, 5L, 16L, 23L, 
    28L, 11L, 9L, 12L, 19L, 62L, 6L, 3L, 15L, 110L, 57L, 0L, 
    14L, 3L, 3L, 8L, 94L, 62L, 7L, 19L, 511L, 59L, 283L, 308L, 
    20L, 44L, 61L, 24L, 10L, 10L, 15L, 6L, 8L, 12L, 32L, 2L, 
    5L, 10L, 21L, 4L, 1L, 19L, 3L, 4L, 4L, 17L, 51L, 108L, 1213L, 
    132L, 4L, 0L, 0L, 0L)), .Names = c("Latitude", "Longitude", 
"Eggs"), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", 
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", 
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81", 
"82", "83", "84", "85", "86", "87", "88", "89", "90"))

Thank you very much!!

1 solution

#1


Add a factor for each location

egg_data <- within(egg_data, Location <- paste("(", Latitude, ", ", Longitude, ")", sep = "") )

EDIT: There's no point in being fancy about this, since we want to reverse the process shortly.

egg_data <- within(egg_data, 
  Location <- paste(Latitude, Longitude, sep = ",")
)

Then there are loads of ways of getting the mean.

means_by_location <- with(egg_data, tapply(Eggs, Location, mean))

or

library(plyr)
means_by_location2 <- ddply(egg_data, .(Location), summarise, Mean.eggs = mean(Eggs))

or

means_by_location3 <- aggregate(Eggs ~ Location, egg_data, mean)

or

means_by_location4 <- with(egg_data, by(Eggs, Location, mean))
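These approaches return different kinds of object, which matters for later steps. A minimal sketch comparing the base R ones, using a made-up data frame (`toy` is hypothetical, not the survey data):

```r
# Hypothetical toy data: one location sampled twice, another sampled once.
toy <- data.frame(
  Location = c("54.25,0.25", "54.25,0.25", "55.25,1.25"),
  Eggs     = c(100L, 200L, 4L),
  stringsAsFactors = FALSE
)

# tapply() returns a named array, not a data frame.
m_tapply <- with(toy, tapply(Eggs, Location, mean))

# aggregate() returns a data frame with one row per location.
m_aggregate <- aggregate(Eggs ~ Location, toy, mean)

# by() returns an object of class "by".
m_by <- with(toy, by(Eggs, Location, mean))

# Inspect what each call produced.
class(m_tapply)
class(m_aggregate)
class(m_by)
```

Only the `aggregate()` result here can be extended with new columns using `$<-` in the usual data-frame way.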

EDIT: For the next bit, you want to have the result in a data frame, so use method 2 or 3.

Add the latitude and longitude back in to your new dataset. (Lots of ways of doing this.)

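# as.character() guards against Location being stored as a factor.
lat_long <- strsplit(as.character(means_by_location2$Location), ",")
# strsplit() yields strings, so convert back to numbers for plotting.
means_by_location2$Latitude <- as.numeric(sapply(lat_long, function(x) x[1]))
means_by_location2$Longitude <- as.numeric(sapply(lat_long, function(x) x[2]))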

This is your first question answered.
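As an alternative that avoids pasting and splitting the coordinates altogether, `aggregate()` accepts several grouping variables in its formula. A minimal sketch, using a small made-up data frame (`toy` and `means_by_cell` are hypothetical names) in place of `egg_data`:

```r
# Hypothetical toy data: the first location was sampled twice.
toy <- data.frame(
  Latitude  = c(54.25, 54.25, 55.25, 55.25),
  Longitude = c(0.25, 0.25, 0.25, 1.25),
  Eggs      = c(100L, 200L, 0L, 19L)
)

# Group by both coordinates at once; they stay numeric in the result.
means_by_cell <- aggregate(Eggs ~ Latitude + Longitude, data = toy, FUN = mean)

# Rename the value column to match the name used in the plyr version.
names(means_by_cell)[names(means_by_cell) == "Eggs"] <- "Mean.eggs"

means_by_cell
```

With the real data, replacing `toy` by `egg_data` should give one row per sampled cell.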

For the second question, you need to think a bit more. Take a look at a plot of eggs by location.

library(ggplot2)
(p <- ggplot(means_by_location2, aes(Longitude, Latitude, colour = log10(Mean.eggs + 1))) +
  geom_point() +
  scale_colour_gradient(low = "#FFFFFF", high = "#0000FF", space = "Lab")
)

Are you interpolating north to south, or east to west, or with all neighbouring points? There are lots of different possibilities and they may have different answers. It's a nontrivial task to say which interpolation is best.
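As one possible reading of the requirement, the sketch below fills an empty grid cell with the mean of its recorded neighbours, but only when at least two of the eight surrounding cells have a value. The 8-cell neighbourhood, the plain mean, the single pass (interpolated values are not reused), the 0.5 degree step and the toy data are all assumptions for illustration, not something the question fixes.

```r
# Hypothetical averaged data on a 0.5 x 0.5 degree grid (not the survey data).
cells <- data.frame(
  Latitude  = c(54.25, 54.25, 54.75, 55.25),
  Longitude = c(0.25, 1.25, 0.25, 1.25),
  Mean.eggs = c(10, 20, 30, 40)
)
step <- 0.5  # assumed grid spacing in degrees

# Axes of the full grid spanning the observed range.
lats <- seq(min(cells$Latitude),  max(cells$Latitude),  by = step)
lons <- seq(min(cells$Longitude), max(cells$Longitude), by = step)

# Matrix of observations: rows are latitudes, columns are longitudes, NA = no record.
obs <- matrix(NA_real_, nrow = length(lats), ncol = length(lons),
              dimnames = list(lats, lons))
ri <- round((cells$Latitude  - lats[1]) / step) + 1
ci <- round((cells$Longitude - lons[1]) / step) + 1
obs[cbind(ri, ci)] <- cells$Mean.eggs

# Fill each empty cell from the ORIGINAL observations only.
filled <- obs
for (r in seq_len(nrow(obs))) {
  for (k in seq_len(ncol(obs))) {
    if (!is.na(obs[r, k])) next
    rows <- max(1, r - 1):min(nrow(obs), r + 1)
    cols <- max(1, k - 1):min(ncol(obs), k + 1)
    nb <- obs[rows, cols]
    nb <- nb[!is.na(nb)]  # the centre cell is NA, so it drops out here
    if (length(nb) >= 2) filled[r, k] <- mean(nb)
  }
}

# Back to long format; cells with fewer than two recorded neighbours stay NA.
result <- data.frame(
  Latitude  = lats[as.vector(row(filled))],
  Longitude = lons[as.vector(col(filled))],
  Mean.eggs = as.vector(filled)
)
result
```

Restricting `rows` or `cols` to the centre index would give a purely east-west or north-south interpolation instead, which is the choice discussed above.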
