Conditional max/min within group_by in R

Time: 2021-05-08 20:37:02

I have been searching for an answer to this for a while without much luck so fingers crossed someone can help me!

I am dealing with cyclical data and am trying to find the values of the two peaks and two troughs in each cycle. These do not necessarily equate to the max/min and second max/min values; rather, they are the max/min and second max/min among the values that are larger/smaller than both the preceding and the subsequent value.

This is an example of one cycle:

x <- c(3.049, 3.492, 3.503, 3.429, 3.013, 2.881, 2.29, 1.785, 1.211, 0.890, 0.859, 0.903, 1.165, 1.634, 2.073, 2.477, 3.162, 3.207, 3.177, 2.742, 2.24, 1.827, 1.358, 1.111, 1.063, 1.098, 1.287, 1.596, 2.169, 2.292)

I have thousands of cycles, so I am using group_by in dplyr to group the cycles, and I hoped to apply the conditional max/min logic within each group.

I would appreciate any advice on this.

Thanks in advance.

Edit

I have since used the function below, with just a slight edit to the last line:

  return(data.frame(Data.value=x, Time=y, Date=z,HHT=peak, LLT=trough)) 

where x is my original x above, y is a time variable and z is a date variable. This allowed me to do some extra calculations on the results (I needed the time at which the value was at its min/max as well as the value itself).

So now I have a data frame with everything I need, but only for one date. I still can't get this to run through the whole dataset using the group_by function. I have tried subsetting by date using:

subsets<-split(data, data$datevar, drop=TRUE)

But I still need a way to run the findminmax function (and my few extra lines of code) on each subset. Any ideas?

1 Answer

Consider the following custom function, which you can call inside a dplyr group_by() pipeline. Essentially, the function iterates through the vector of cyclical values and compares each value with the neighbors immediately before and after it. A peak has both neighbors lower than itself, and a trough has both neighbors higher than itself. Note that it returns the first two peaks and the first two troughs it encounters, in order of appearance, and that the first and last values are never counted.

findminmax <- function(x){
  peak <- list(NA, NA)                              # INITIALIZE TEMP LISTS AND ITERATORS
  p <- 1
  trough <- list(NA, NA)
  t <- 1

  for (i in 1:length(x)){
    if (i != 1 & i != length(x)){                   # LEAVES OUT FIRST AND LAST VALUES
      if ((x[i] > x[i-1]) & (x[i] > x[i+1])) {      # COMPARES IF GREATER THAN NEIGHBORS
        peak[p] <- x[i]
        p <- p + 1
      }
      if ((x[i] < x[i-1]) & (x[i] < x[i+1])){       # COMPARES IF LESS THAN NEIGHBORS
        trough[t] <- x[i]
        t <- t + 1
      }
    }
  }
  return(list(peak1=peak[[1]], peak2=peak[[2]], 
              trough1=trough[[1]], trough2=trough[[2]]))
}

result <- findminmax(x)
#$peak1
#[1] 3.503    
#$peak2
#[1] 3.207    
#$trough1
#[1] 0.859    
#$trough2
#[1] 1.063
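
Since the question also asks for the time at which each peak or trough occurs, it can help to keep the positions as well as the values. The following is a minimal vectorised sketch of the same strict neighbor comparison, not the function above: find_extrema is a hypothetical helper name, and it relies only on base R's diff(), sign() and which(). Like the loop, it ignores flat plateaus where neighboring values are equal.

```r
# Hypothetical helper (not part of the original answer): local extrema via
# the sign of successive differences.
find_extrema <- function(x) {
  s <- sign(diff(x))         # +1 rising step, -1 falling step, 0 flat
  turn <- diff(s)            # -2: rise then fall (peak); +2: fall then rise (trough)
  peak_idx   <- which(turn == -2) + 1
  trough_idx <- which(turn ==  2) + 1
  list(peak_idx = peak_idx, peaks = x[peak_idx],
       trough_idx = trough_idx, troughs = x[trough_idx])
}

x <- c(3.049, 3.492, 3.503, 3.429, 3.013, 2.881, 2.29, 1.785, 1.211, 0.890,
       0.859, 0.903, 1.165, 1.634, 2.073, 2.477, 3.162, 3.207, 3.177, 2.742,
       2.24, 1.827, 1.358, 1.111, 1.063, 1.098, 1.287, 1.596, 2.169, 2.292)

ext <- find_extrema(x)
ext$peaks       # 3.503 3.207
ext$peak_idx    # 3 18
ext$troughs     # 0.859 1.063
ext$trough_idx  # 11 25
```

The index vectors can be used to look up a matching time column, e.g. y[ext$peak_idx]. If a cycle has more than two local peaks and the two largest are wanted, sort(ext$peaks, decreasing = TRUE)[1:2] is one option.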

To apply it per group with dplyr's group_by():

library(dplyr)

# originaldf is assumed to hold the columns x (value), y (time) and z (date)
finaldf <- originaldf %>%
  group_by(z) %>%
  summarise(Time = mean(y),
            HHT1 = findminmax(x)$peak1,
            HHT2 = findminmax(x)$peak2,
            LLT1 = findminmax(x)$trough1,
            LLT2 = findminmax(x)$trough2)
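
The split() attempt from the question can also be completed in base R by running a function over each subset and binding the results. The sketch below is only an illustration under stated assumptions: the toy data, the column names x, y, z (taken from the question's edit) and the helper first_two_extrema are all hypothetical, and the helper is a compact stand-in for findminmax that also keeps positions so the matching time can be looked up.

```r
# Compact stand-in for findminmax() (assumption: same strict-neighbor rule),
# returning the positions of the first two peaks and troughs.
first_two_extrema <- function(x) {
  turn <- diff(sign(diff(x)))
  p <- which(turn == -2) + 1    # positions of local peaks
  t <- which(turn ==  2) + 1    # positions of local troughs
  list(p = p[1:2], t = t[1:2])  # padded with NA when fewer than two exist
}

# Toy data: two dates with nine readings each (values invented for illustration)
toy <- data.frame(
  z = rep(as.Date(c("2021-01-01", "2021-01-02")), each = 9),
  y = rep(1:9, times = 2),
  x = c(1, 3, 2, 0, 1, 4, 2, 1, 2,
        5, 6, 4, 3, 5, 7, 6, 2, 3)
)

# One summary row per date: value and time of each peak and trough
per_date <- lapply(split(toy, toy$z), function(d) {
  e <- first_two_extrema(d$x)
  data.frame(z = d$z[1],
             HHT1 = d$x[e$p[1]], HHT1_time = d$y[e$p[1]],
             HHT2 = d$x[e$p[2]], HHT2_time = d$y[e$p[2]],
             LLT1 = d$x[e$t[1]], LLT1_time = d$y[e$t[1]],
             LLT2 = d$x[e$t[2]], LLT2_time = d$y[e$t[2]])
})
finaldf2 <- do.call(rbind, per_date)
finaldf2
```

Compared with calling the function once per output column inside summarise(), this evaluates the extrema once per date and makes it easy to add further per-subset calculations inside the anonymous function.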
