How to use an if-statement to separate entire rows into a new data frame

Time: 2022-12-27 20:27:45

I have a data frame df that looks something like this:

Date         Company         MarketCap
2000-01-31   Company one     1000
2000-02-28   Company one     2000
2000-03-31   Company one     3000
2000-01-31   Company two     2500
2000-02-28   Company two     3000
2000-03-31   Company two     3500
2000-01-31   Company three   1500
2000-02-28   Company three   1800
2000-03-31   Company three   1100

I need an if-statement that does the following:

if (df$MarketCap >= median(df$MarketCap)) {
  BigCap <- df[<all the rows that have a market cap >= median(df$MarketCap)>, ]
}

Put in words: for each row of df, I want to check whether its market cap is greater than or equal to the median of df$MarketCap. All rows whose market cap is greater than or equal to that median should make up a new data frame, BigCap.

The new data frame BigCap should thus be like this:

BigCap:

Date         Company         MarketCap
2000-02-28   Company one     2000
2000-03-31   Company one     3000
2000-01-31   Company two     2500
2000-02-28   Company two     3000
2000-03-31   Company two     3500

I feel like this should be easy to achieve using an if-statement, but I haven't had any success so far (not by looking at similar questions on SO either). I appreciate all the help I can get.

Note: my real df is a lot larger than the example provided here; it has 360 dates and over 2000 companies.

2 solutions

#1


2  

I like CPak's answer but if you need the separate data.frames, this works:

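# Example data: three dates (60, 30 and 0 days ago) for each of three companies
df <- data.frame(date = rep(Sys.Date() - c(60, 30, 0), 3), comp = rep(1:3, each = 3),
                 cap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100))

# For each date, compute that date's median cap and create two data frames,
# named smallCap<month> and bigCap<month> after the abbreviated month ("%b")
for (i in unique(as.character(df$date))) {
  med <- median(df$cap[df$date == i])
  assign(paste0("smallCap", format(as.Date(i), "%b")),
         df[df$date == i & df$cap < med, ])
  assign(paste0("bigCap", format(as.Date(i), "%b")),
         df[df$date == i & df$cap >= med, ])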
}
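
With many dates, one object per date can get unwieldy, and two dates whose "%b" month abbreviation is the same would overwrite each other under the naming above. A minimal sketch of a list-based variant, assuming the question's column names (Date, Company, MarketCap); the names by_date, bigCapList and smallCapList are made up for illustration:

```r
# Example data mirroring the question
df <- data.frame(
  Date = as.Date(rep(c("2000-01-31", "2000-02-28", "2000-03-31"), 3)),
  Company = rep(c("Company one", "Company two", "Company three"), each = 3),
  MarketCap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# One data frame per date, stored in a list named by date
by_date <- split(df, df$Date)

# Within each date, keep the rows at/above (or below) that date's median
bigCapList <- lapply(by_date, function(d) d[d$MarketCap >= median(d$MarketCap), ])
smallCapList <- lapply(by_date, function(d) d[d$MarketCap < median(d$MarketCap), ])

# Look up a single date by name
bigCapList[["2000-01-31"]]
```

Each element is still an ordinary data.frame, so anything that worked on bigCapOct should work on bigCapList[["2000-01-31"]] as well.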

EDIT: in comments, OP asked for a data frame for a specific month.

For a given month in a specific year, say Oct 2017:

# first calculate median
med <- median(df$cap[format(df$date, "%Y-%m") == "2017-10"])
# subset df
BigCapOct <- df[format(df$date, "%Y-%m") == "2017-10" & df$cap >= med, ]

For the month of October across all years:

med <- median(df$cap[format(df$date, "%m") == "10"])
BigCapOct <- df[format(df$date, "%m") == "10" & df$cap >= med, ]
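
If several months are needed, the two snippets above could be folded into one helper. This is only a sketch: big_cap_for_month is a hypothetical function name, the example data are invented, and the column names date and cap follow this answer's df.

```r
# Hypothetical helper: rows of `df` in the given month (optionally restricted
# to one year) whose cap is >= the median cap of that same subset
big_cap_for_month <- function(df, month, year = NULL) {
  in_month <- format(df$date, "%m") == sprintf("%02d", month)
  if (!is.null(year)) {
    in_month <- in_month & format(df$date, "%Y") == as.character(year)
  }
  med <- median(df$cap[in_month])
  df[in_month & df$cap >= med, ]
}

# Invented example data
df <- data.frame(
  date = as.Date(c("2016-10-31", "2016-10-31", "2017-10-31", "2017-10-31", "2017-11-30")),
  comp = c(1, 2, 1, 2, 1),
  cap = c(1000, 2000, 3000, 4000, 5000)
)

BigCapOct2017 <- big_cap_for_month(df, 10, 2017)  # October 2017 only
BigCapOct <- big_cap_for_month(df, 10)            # October across all years
```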

#2


2  

I created SmallCap and LargeCap, which are lists of data.frames containing the observations that are < median(MarketCap) and >= median(MarketCap), respectively. Each entry of a list corresponds to a separate Date.

library(dplyr)
SmallCap <- df %>%
  group_by(Date) %>%
  filter(MarketCap < median(MarketCap)) %>%
  split(.$Date)

# $`1`
# # A tibble: 1 x 3
# # Groups:   Date [1]
#         Date     Company MarketCap
#       <fctr>      <fctr>     <int>
# 1 2000-01-31 Company_one      1000

# $`2`
# # A tibble: 1 x 3
# # Groups:   Date [1]
#         Date       Company MarketCap
#       <fctr>        <fctr>     <int>
# 1 2000-02-28 Company_three      1800

# $`3`
# # A tibble: 1 x 3
# # Groups:   Date [1]
#         Date       Company MarketCap
#       <fctr>        <fctr>     <int>
# 1 2000-03-31 Company_three      1100

LargeCap <- df %>%
  group_by(Date) %>%
  filter(MarketCap >= median(MarketCap)) %>%
  split(.$Date)

# $`2000-01-31`
# # A tibble: 2 x 3
# # Groups:   Date [1]
#         Date       Company MarketCap
#       <fctr>        <fctr>     <int>
# 1 2000-01-31   Company_two      2500
# 2 2000-01-31 Company_three      1500

# $`2000-02-28`
# # A tibble: 2 x 3
# # Groups:   Date [1]
#         Date     Company MarketCap
#       <fctr>      <fctr>     <int>
# 1 2000-02-28 Company_one      2000
# 2 2000-02-28 Company_two      3000

# $`2000-03-31`
# # A tibble: 2 x 3
# # Groups:   Date [1]
#         Date     Company MarketCap
#       <fctr>      <fctr>     <int>
# 1 2000-03-31 Company_one      3000
# 2 2000-03-31 Company_two      3500
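
Note that the lists above use the median within each Date, whereas the BigCap shown in the question corresponds to the median of the whole MarketCap column. A base R sketch of both, assuming the question's column names; date_median and LargeCapAll are illustrative names. No if-statement is needed in either case, because the comparison is vectorized and the resulting logical vector can index the rows directly.

```r
# Example data mirroring the question
df <- data.frame(
  Date = as.Date(rep(c("2000-01-31", "2000-02-28", "2000-03-31"), 3)),
  Company = rep(c("Company one", "Company two", "Company three"), each = 3),
  MarketCap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# Overall median: keep rows at or above the median of the whole column
BigCap <- df[df$MarketCap >= median(df$MarketCap), ]

# Per-date median without dplyr: ave() returns, for every row,
# the median MarketCap of that row's Date
date_median <- ave(df$MarketCap, df$Date, FUN = median)
LargeCapAll <- df[df$MarketCap >= date_median, ]
```

LargeCapAll is a single data frame; split(LargeCapAll, LargeCapAll$Date) would turn it into one data frame per date.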
