How to retrieve a specific value from a list of data frames based on a given condition?

Time: 2021-06-08 20:19:26

I have a list of data frames (sample below) in which each data frame lists the hospitals of one US state.

  • outcome_split is a list that holds one data frame per state.

  • I have added a rank column to the data frame for the state AL, which ranks all the hospitals in that state; similarly (using a for-loop) I will add a rank column to every data frame in the list.

  • I am trying to create a function that, given an outcome (heart attack, heart failure etc.) and a rank (a number), returns the name of the hospital and the US state matching the rank entered.
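
The goal described in the list above could be sketched in base R roughly as follows. This is only a sketch under stated assumptions: `toy_split` is made-up data standing in for `outcome_split`, `rank_hospital` is a hypothetical helper name, a lower rate is assumed to mean a better rank, ties are assumed to be broken by hospital name, and "Not Available" values are left out of the ranking.

```r
# Made-up stand-in for outcome_split; the values are illustrative only.
toy_split <- list(
  AK = data.frame(`hospital name` = c("A HOSPITAL", "B HOSPITAL", "C HOSPITAL"),
                  state = "AK",
                  `heart attack` = c("15.5", "Not Available", "13.4"),
                  check.names = FALSE, stringsAsFactors = FALSE),
  AL = data.frame(`hospital name` = c("D HOSPITAL", "E HOSPITAL"),
                  state = "AL",
                  `heart attack` = c("14.2", "13.3"),
                  check.names = FALSE, stringsAsFactors = FALSE)
)

# Hypothetical helper: for every state, return the hospital holding rank `num`
# for the given outcome column.
rank_hospital <- function(split_list, outcome, num) {
  per_state <- lapply(split_list, function(df) {
    # "Not Available" becomes NA when converted to numeric
    value <- suppressWarnings(as.numeric(df[[outcome]]))
    # Sort by rate, then by name; na.last = NA drops the NA rows
    ord <- order(value, df[["hospital name"]], na.last = NA)
    data.frame(hospital = df[["hospital name"]][ord][num],  # NA if num is too large
               state = df[["state"]][1],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, per_state)
}

best <- rank_hospital(toy_split, "heart attack", 1)
print(best)
```

Using lapply() instead of a for-loop keeps the per-state logic in one place and returns one row per state, which do.call(rbind, ...) then stacks into a single data frame.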

As mentioned above, the second element has the rank variable, so I tried to call that element and match the specified rank. I am a beginner and I think I am confusing '==' with '='.

> outcome_split[[2]][, "hospital name"]["rank"==2]
character(0)
> outcome_split[[2]][, "hospital name"]["rank"=7]
[1] "BIBB MEDICAL CENTER"

I want to return the name of the hospital matching the specified rank, but I am not sure how to do this. As said earlier, I am confused about '==' and '=': '==' returns character(0), whereas '=' returns a hospital name from the second element, but that result is based on the row position rather than the rank variable. The hospital mentioned sits in row 7, but it is not ranked 7.

> outcome_split[[2]][, c("hospital name","rank")]
                                       hospital name rank
1                        ANDALUSIA REGIONAL HOSPITAL   52
2                          ATHENS-LIMESTONE HOSPITAL    9
3                          ATMORE COMMUNITY HOSPITAL   53
4                        BAPTIST MEDICAL CENTER EAST    2
5                       BAPTIST MEDICAL CENTER SOUTH   46
6                   BAPTIST MEDICAL CENTER-PRINCETON    8
7                                BIBB MEDICAL CENTER   54
8                       BIRMINGHAM VA MEDICAL CENTER   26
9                           *WOOD MEDICAL CENTER   30
10                    BRYAN W WHITFIELD MEM HOSP INC   55
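
The two results in the question can be reproduced on a small vector, which may make the difference between '==' and '=' easier to see. The sketch below uses a hypothetical four-row data frame `al` (names and ranks copied from the first rows of the table above) rather than the full data.

```r
# Hypothetical four-row stand-in for outcome_split[[2]]
al <- data.frame(
  `hospital name` = c("ANDALUSIA REGIONAL HOSPITAL", "ATHENS-LIMESTONE HOSPITAL",
                      "ATMORE COMMUNITY HOSPITAL", "BAPTIST MEDICAL CENTER EAST"),
  rank = c(52L, 9L, 53L, 2L),
  check.names = FALSE, stringsAsFactors = FALSE
)
hosp <- al[, "hospital name"]

# "rank" == 2 compares the literal string "rank" with 2, which is FALSE,
# so the index selects nothing.
by_string_compare <- hosp["rank" == 2]

# "rank" = 3 is treated as a named argument to `[`; the name is ignored,
# so this is plain positional indexing with 3.
by_named_arg <- hosp["rank" = 3]

# Comparing the rank column itself gives a logical vector, one value per row,
# which is what actually picks the row with rank 2.
by_rank <- hosp[al$rank == 2]

print(by_string_compare)
print(by_named_arg)
print(by_rank)
```

In short, the quoted "rank" never refers to the column; the column has to be reached through the data frame, for example with al$rank.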

Sample data:

outcome_split <- structure(list(AK = structure(list(`hospital name` = c("PROVIDENCE ALASKA MEDICAL CENTER", 
"MAT-SU REGIONAL MEDICAL CENTER", "BARTLETT REGIONAL HOSPITAL", 
"FAIRBANKS MEMORIAL HOSPITAL", "ALASKA REGIONAL HOSPITAL", "YUKON KUSKOKWIM DELTA REG HOSPITAL", 
"CENTRAL PENINSULA GENERAL HOSPITAL", "ALASKA NATIVE MEDICAL CENTER", 
"MT EDGECUMBE HOSPITAL", "PROVIDENCE VALDEZ MEDICAL CENTER", 
"PROVIDENCE SEWARD HOSPITAL", "SITKA COMMUNITY HOSPITAL", "PROVIDENCE KODIAK ISLAND MEDICAL CTR", 
"CORDOVA COMMUNITY MEDICAL CENTER", "NORTON SOUND REGIONAL HOSPITAL", 
"PEACEHEALTH KETCHIKAN MEDICAL CENTER", "SOUTH PENINSULA HOSPITAL"
), state = c("AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", 
"AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK"), `heart attack` = c("13.4", 
"17.7", "Not Available", "15.5", "14.5", "Not Available", "Not Available", 
"15.7", "Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available"), `heart failure` = c("12.4", "11.4", "11.6", 
"15.6", "13.4", "11.2", "11.6", "11.6", "Not Available", "Not Available", 
"Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available", "11.4", "10.8"), pneumonia = c("10.5", "12.1", 
"11.6", "13.4", "12.5", "9.7", "13.8", "15.5", "14.2", "Not Available", 
"Not Available", "11.5", "12.0", "Not Available", "11.6", "11.3", 
"12.2")), .Names = c("hospital name", "state", "heart attack", 
"heart failure", "pneumonia"), row.names = 99:115, class = "data.frame"), 
    AL = structure(list(`hospital name` = c("ANDALUSIA REGIONAL HOSPITAL", 
    "ATHENS-LIMESTONE HOSPITAL", "ATMORE COMMUNITY HOSPITAL", 
    "BAPTIST MEDICAL CENTER EAST", "BAPTIST MEDICAL CENTER SOUTH", 
    "BAPTIST MEDICAL CENTER-PRINCETON", "BIBB MEDICAL CENTER", 
    "BIRMINGHAM VA MEDICAL CENTER", "*WOOD MEDICAL CENTER", 
    "BRYAN W WHITFIELD MEM HOSP INC", "BULLOCK COUNTY HOSPITAL", 
    "CALLAHAN EYE FOUNDATION HOSPITAL", "CHEROKEE MEDICAL CENTER", 
    "CHILTON MEDICAL CENTER", "CITIZENS BAPTIST MEDICAL CENTER", 
    "CLAY COUNTY HOSPITAL", "COMMUNITY HOSPITAL INC", "COOPER GREEN MERCY HOSPITAL", 
    "COOSA VALLEY MEDICAL CENTER", "CRENSHAW COMMUNITY HOSPITAL", 
    "CRESTWOOD MEDICAL CENTER", "CULLMAN REGIONAL MEDICAL CENTER", 
    "D C H REGIONAL MEDICAL CENTER", "D W MCMILLAN MEMORIAL HOSPITAL", 
    "DALE MEDICAL CENTER", "DECATUR GENERAL HOSPITAL", "DEKALB REGIONAL MEDICAL CENTER", 
    "EAST ALABAMA MEDICAL CENTER AND SNF", "ELBA GENERAL HOSPITAL", 
    "ELIZA COFFEE MEMORIAL HOSPITAL", "ELMORE COMMUNITY HOSPITAL", 
    "EVERGREEN MEDICAL CENTER", "FAYETTE MEDICAL CENTER", "FLORALA MEMORIAL HOSPITAL", 
    "FLOWERS HOSPITAL", "GADSDEN REGIONAL MEDICAL CENTER", "GEORGE H. LANIER MEMORIAL HOSPITAL", 
    "GEORGIANA HOSPITAL", "GREENE COUNTY HOSPITAL", "GROVE HILL MEMORIAL HOSPITAL", 
    "HALE COUNTY HOSPITAL", "HELEN KELLER MEMORIAL HOSPITAL", 
    "HIGHLANDS MEDICAL CENTER", "HILL HOSPITAL OF SUMTER COUNTY", 
    "HUNTSVILLE HOSPITAL", "INFIRMARY WEST", "J PAUL JONES HOSPITAL", 
    "JACK HUGHSTON MEMORIAL HOSPITAL", "JACKSON HOSPITAL & CLINIC INC", 
    "JACKSON MEDICAL CENTER", "JACKSONVILLE MEDICAL CENTER", 
    "L V STABLER MEMORIAL HOSPITAL", "LAKE MARTIN COMMUNITY HOSPITAL", 
    "LAKELAND COMMUNITY HOSPITAL", "LAWRENCE MEDICAL CENTER", 
    "MARION REGIONAL MEDICAL CENTER", "MARSHALL MEDICAL CENTER NORTH", 
    "MARSHALL MEDICAL CENTER SOUTH", "MEDICAL CENTER BARBOUR", 
    "MEDICAL CENTER ENTERPRISE", "MEDICAL WEST, AN AFFILIATE OF UAB HEALTH SYSTEM", 
    "MIZELL MEMORIAL HOSPITAL", "MOBILE INFIRMARY", "MONROE COUNTY HOSPITAL", 
    "NORTH BALDWIN INFIRMARY", "NORTHEAST ALABAMA REGIONAL MED CENTER", 
    "NORTHWEST MEDICAL CENTER", "PARKWAY MEDICAL CENTER", "PICKENS COUNTY MEDICAL CENTER", 
    "PRATTVILLE BAPTIST HOSPITAL", "PROVIDENCE HOSPITAL", "RED BAY HOSPITAL", 
    "RIVERVIEW REGIONAL MEDICAL CENTER", "RUSSELL HOSPITAL", 
    "RUSSELLVILLE HOSPITAL", "SHELBY BAPTIST MEDICAL CENTER", 
    "SHOALS HOSPITAL", "SOUTH BALDWIN REGIONAL MEDICAL CENTER", 
    "SOUTHEAST ALABAMA MEDICAL CENTER", "SPRINGHILL MEDICAL CENTER", 
    "ST VINCENT'S BIRMINGHAM", "ST VINCENT'S EAST", "ST VINCENT'S ST CLAIR", 
    "ST VINCENTS BLOUNT", "STRINGFELLOW MEMORIAL HOSPITAL", "THOMAS HOSPITAL", 
    "TRINITY MEDICAL CENTER", "TROY REGIONAL MEDICAL CENTER", 
    "TUSCALOOSA VA MEDICAL CENTER", "UNIV OF S AL CHILDREN'S & WOMEN'S HOS", 
    "UNIV OF SOUTH ALABAMA MEDICAL CENTER", "UNIVERSITY OF ALABAMA HOSPITAL", 
    "VA CENTRAL ALABAMA HEALTHCARE SYSTEM - MONTGOMERY", "VAUGHAN REG MED CENTER PARKWAY CAMPUS", 
    "WALKER BAPTIST MEDICAL CENTER", "WASHINGTON COUNTY HOSPITAL", 
    "WEDOWEE HOSPITAL", "WIREGRASS MEDICAL CENTER"), state = c("AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL"), `heart attack` = c("Not Available", 
    "15.0", "Not Available", "14.2", "17.8", "14.9", "Not Available", 
    "16.1", "16.5", "Not Available", "Not Available", "Not Available", 
    "Not Available", "Not Available", "17.3", "16.7", "17.1", 
    "Not Available", "15.2", "Not Available", "13.3", "17.1", 
    "15.8", "15.7", "17.3", "16.8", "18.0", "16.3", "Not Available", 
    "18.1", "Not Available", "Not Available", "16.7", "Not Available", 
    "15.2", "16.7", "15.4", "14.5", "Not Available", "Not Available", 
    "Not Available", "19.6", "15.0", "Not Available", "15.2", 
    "Not Available", "Not Available", "Not Available", "17.5", 
    "Not Available", "Not Available", "Not Available", "Not Available", 
    "Not Available", "15.6", "Not Available", "Not Available", 
    "18.5", "Not Available", "16.6", "15.3", "Not Available", 
    "19.3", "Not Available", "Not Available", "15.6", "Not Available", 
    "15.8", "Not Available", "14.6", "15.2", "Not Available", 
    "16.9", "17.1", "Not Available", "15.9", "Not Available", 
    "15.8", "14.3", "16.0", "16.2", "17.7", "Not Available", 
    "Not Available", "16.4", "14.7", "16.8", "Not Available", 
    "Not Available", "Not Available", "Not Available", "15.0", 
    "Not Available", "14.7", "17.0", "Not Available", "Not Available", 
    "Not Available"), `heart failure` = c("10.1", "11.7", "10.8", 
    "9.6", "11.8", "11.4", "14.0", "10.4", "13.5", "11.7", "12.3", 
    "Not Available", "12.1", "11.5", "14.9", "12.6", "12.3", 
    "Not Available", "11.7", "13.8", "13.8", "12.1", "11.2", 
    "14.8", "11.8", "10.9", "16.6", "12.9", "Not Available", 
    "11.3", "11.3", "9.1", "11.7", "10.4", "12.0", "10.7", "8.8", 
    "10.8", "11.2", "10.4", "10.7", "12.6", "13.4", "Not Available", 
    "12.4", "12.5", "Not Available", "10.8", "10.2", "12.3", 
    "16.4", "11.1", "10.9", "13.6", "9.9", "11.5", "12.5", "15.2", 
    "13.5", "12.9", "11.4", "13.6", "10.7", "13.0", "11.5", "11.2", 
    "11.8", "10.5", "12.6", "14.8", "13.5", "12.6", "10.8", "11.6", 
    "14.8", "13.6", "13.6", "15.1", "11.4", "10.4", "10.6", "10.9", 
    "10.8", "13.0", "12.0", "12.8", "12.9", "11.2", "Not Available", 
    "Not Available", "12.5", "12.5", "12.2", "12.0", "10.8", 
    "Not Available", "10.4", "10.6"), pneumonia = c("11.1", "12.1", 
    "13.0", "10.2", "14.3", "11.6", "13.6", "11.0", "13.0", "9.1", 
    "12.1", "Not Available", "14.7", "11.2", "12.1", "11.8", 
    "11.6", "Not Available", "11.4", "15.8", "10.4", "12.1", 
    "11.3", "12.6", "9.9", "11.9", "15.8", "12.1", "12.0", "13.4", 
    "11.2", "12.0", "12.9", "12.1", "11.3", "14.6", "10.3", "11.3", 
    "11.5", "12.1", "11.5", "15.0", "12.9", "Not Available", 
    "14.1", "13.1", "11.4", "10.9", "14.7", "9.3", "19.2", "13.0", 
    "10.8", "10.7", "9.8", "10.0", "8.7", "13.9", "15.0", "12.9", 
    "12.1", "14.9", "12.5", "15.6", "14.6", "13.2", "13.1", "11.9", 
    "12.4", "14.2", "10.6", "11.6", "12.7", "14.9", "11.5", "10.7", 
    "12.8", "9.8", "10.9", "13.8", "12.6", "16.2", "11.4", "15.3", 
    "12.0", "13.1", "13.9", "11.1", "Not Available", "Not Available", 
    "Not Available", "12.7", "11.3", "14.0", "11.9", "Not Available", 
    "13.9", "12.3"), rank = c(52L, 9L, 53L, 2L, 46L, 8L, 54L, 
    26L, 30L, 55L, 56L, 57L, 58L, 59L, 42L, 32L, 39L, 60L, 12L, 
    61L, 1L, 40L, 21L, 20L, 43L, 35L, 47L, 28L, 62L, 48L, 63L, 
    64L, 33L, 65L, 13L, 34L, 17L, 4L, 66L, 67L, 68L, 51L, 10L, 
    69L, 14L, 70L, 71L, 72L, 44L, 73L, 74L, 75L, 76L, 77L, 18L, 
    78L, 79L, 49L, 80L, 31L, 16L, 81L, 50L, 82L, 83L, 19L, 84L, 
    22L, 85L, 5L, 15L, 86L, 37L, 41L, 87L, 24L, 88L, 23L, 3L, 
    25L, 27L, 45L, 89L, 90L, 29L, 6L, 36L, 91L, 92L, 93L, 94L, 
    11L, 95L, 7L, 38L, 96L, 97L, 98L)), class = "data.frame", .Names = c("hospital name", 
    "state", "heart attack", "heart failure", "pneumonia", "rank"
    ), row.names = c(NA, -98L))), .Names = c("AK", "AL"))

2 Solutions

#1


Your rank column is not in order, see below where I arrange by rank.

Selecting is a one-liner with dplyr (or with data.table):

library(dplyr)

outcome_split[[2]] %>% filter(rank == 2) %>% select('hospital name')

                hospital name
1 BAPTIST MEDICAL CENTER EAST

outcome_split[[2]] %>% filter(rank == 7) %>% select('hospital name')
                      hospital name
1 VAUGHAN REG MED CENTER PARKWAY CAMPUS

# Here's the hospital order when we arrange by 'rank':
outcome_split[[2]] %>% arrange(rank) %>% select('hospital name', 'rank') %>% head(7)
                          hospital name rank
1              CRESTWOOD MEDICAL CENTER    1
2           BAPTIST MEDICAL CENTER EAST    2
3      SOUTHEAST ALABAMA MEDICAL CENTER    3
4                    GEORGIANA HOSPITAL    4
5           PRATTVILLE BAPTIST HOSPITAL    5
6                       THOMAS HOSPITAL    6
7 VAUGHAN REG MED CENTER PARKWAY CAMPUS    7

# ... and here was your original order
outcome_split[[2]] %>% select('hospital name', 'rank') %>% head(7)
                     hospital name rank
1      ANDALUSIA REGIONAL HOSPITAL   52
2        ATHENS-LIMESTONE HOSPITAL    9
3        ATMORE COMMUNITY HOSPITAL   53
4      BAPTIST MEDICAL CENTER EAST    2
5     BAPTIST MEDICAL CENTER SOUTH   46
6 BAPTIST MEDICAL CENTER-PRINCETON    8
7              BIBB MEDICAL CENTER   54

By the way, to avoid trouble, use underscores instead of spaces in column names; then you don't need quotes around hospital_name etc.

names(outcome_split[[2]]) <- gsub(' ', '_', names(outcome_split[[2]])) renames them to "hospital_name" "state" "heart_attack" "heart_failure" "pneumonia" "rank"

Or you can use make.names(), which replaces every character other than letters, digits, underscore and dot with a dot. Use gsub() if you want finer control.
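
As a rough illustration of the two renaming options, here they are applied to the column names from the question. `cols` and `toy_split` are hypothetical names used only for this sketch, and the toy list holds made-up rows.

```r
cols <- c("hospital name", "state", "heart attack", "heart failure",
          "pneumonia", "rank")

# gsub() swaps each space for an underscore
underscored <- gsub(" ", "_", cols, fixed = TRUE)

# make.names() turns each invalid character (here the space) into a dot
dotted <- make.names(cols)

print(underscored)
print(dotted)

# Renaming every data frame in a list in one go (toy list, made-up rows)
toy_split <- list(
  AK = data.frame(`hospital name` = "A HOSPITAL", state = "AK",
                  check.names = FALSE, stringsAsFactors = FALSE),
  AL = data.frame(`hospital name` = "D HOSPITAL", state = "AL",
                  check.names = FALSE, stringsAsFactors = FALSE)
)
renamed <- lapply(toy_split, function(df) {
  names(df) <- gsub(" ", "_", names(df), fixed = TRUE)
  df
})
print(names(renamed$AL))
```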

And you can collapse the list of dfs into one large df:

outcome_split[[1]]$rank <- NA
do.call(function(...) rbind(..., make.row.names = FALSE), outcome_split)

does that. Now your dplyr selection is simply %>% filter(state == 'AL', rank == 2) %>% select('hospital name')
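
A base-R sketch of the same idea, for the case where only some data frames have the rank column yet. It assumes a made-up list `toy_split` in place of `outcome_split`; the loop that fills missing columns with NA generalizes the single `$rank <- NA` assignment above and is an illustration, not the answerer's code.

```r
# Made-up stand-in: AK has no rank column yet, AL does
toy_split <- list(
  AK = data.frame(`hospital name` = c("A HOSPITAL", "B HOSPITAL"),
                  state = "AK",
                  check.names = FALSE, stringsAsFactors = FALSE),
  AL = data.frame(`hospital name` = c("D HOSPITAL", "E HOSPITAL"),
                  state = "AL", rank = c(2L, 1L),
                  check.names = FALSE, stringsAsFactors = FALSE)
)

# Give every data frame the same columns, filling the missing ones with NA
all_cols <- unique(unlist(lapply(toy_split, names)))
filled <- lapply(toy_split, function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
  df[all_cols]
})

# Stack the data frames; make.row.names = FALSE gives plain 1..n row names
combined <- do.call(rbind, c(filled, make.row.names = FALSE))

# %in% is used instead of == so rows with rank NA are simply not selected
picked <- combined[combined$state == "AL" & combined$rank %in% 2, "hospital name"]
print(picked)
```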

#2


If you want to select rank 2 and 7 from your second list element try:

outcome_split[[2]][outcome_split[[2]]$rank == 2, c("hospital name", "rank")]

                hospital name rank
4 BAPTIST MEDICAL CENTER EAST    2

outcome_split[[2]][outcome_split[[2]]$rank == 7, c("hospital name", "rank")]

                           hospital name rank
94 VAUGHAN REG MED CENTER PARKWAY CAMPUS    7
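
If several ranks are wanted at once, the two calls above could be folded into one with match(). This is a small sketch on a hypothetical three-row data frame `al`, with names and ranks taken from the output shown in this thread.

```r
# Hypothetical three-row stand-in for outcome_split[[2]]
al <- data.frame(
  `hospital name` = c("BAPTIST MEDICAL CENTER EAST", "BIBB MEDICAL CENTER",
                      "VAUGHAN REG MED CENTER PARKWAY CAMPUS"),
  rank = c(2L, 54L, 7L),
  check.names = FALSE, stringsAsFactors = FALSE
)

wanted <- c(2, 7)

# match() returns the row position of each wanted rank, in the order asked for
hits <- al[match(wanted, al$rank), c("hospital name", "rank")]
print(hits)
```

match() keeps the order of `wanted` and yields NA for a rank that does not exist, whereas `al$rank %in% wanted` would return the rows in their original order.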

I recommend collapsing your list to a data.frame as this will make filtering much easier. Try searching for dplyr::bind_rows or do.call("rbind")
