Reshaping a data.frame from wide to long format

Time: 2021-06-16 04:26:40

I am having some trouble converting my data.frame from a wide table to a long table. At the moment it looks like this:

Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246

Now I would like to transform this data.frame into a long data.frame. Something like this:

Code Country        Year    Value
AFG  Afghanistan    1950    20,249
AFG  Afghanistan    1951    21,352
AFG  Afghanistan    1952    22,532
AFG  Afghanistan    1953    23,557
AFG  Afghanistan    1954    24,555
ALB  Albania        1950    8,097
ALB  Albania        1951    8,986
ALB  Albania        1952    10,058
ALB  Albania        1953    11,123
ALB  Albania        1954    12,246

I have already looked at and tried the melt() and reshape() functions, as some people suggested in answers to similar questions. However, so far I only get messy results.

If possible, I would like to do it with the reshape() function, since it looks a little nicer to handle.

5 Answers

#1


55  

reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape, assuming your data frame is called d:

reshape(d, direction = "long", varying = list(names(d)[3:7]), v.names = "Value",
        idvar = c("Code","Country"), timevar = "Year", times = 1950:1954)
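
For reference, here is a self-contained sketch of the same call in base R. The sample data frame is rebuilt from the question; the read.table options, the re-sorting, and the comma clean-up are additions for illustration and are not part of the original answer.

```r
# Sample data copied from the question. check.names = FALSE keeps the
# year columns named "1950" ... "1954" instead of "X1950" ... "X1954".
d <- read.table(text = "Code Country     1950   1951   1952   1953   1954
AFG  Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB  Albania     8,097  8,986  10,058 11,123 12,246",
                header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Same reshape() call as above: columns 3:7 are stacked into "Value",
# and "Year" records which column each value came from.
long <- reshape(d,
                direction = "long",
                varying   = list(names(d)[3:7]),
                v.names   = "Value",
                idvar     = c("Code", "Country"),
                timevar   = "Year",
                times     = 1950:1954)

# Illustrative clean-up (an assumption about what you want, not required):
# sort by country then year, and replace the generated row names.
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL

# The values contain thousands separators, so strip them before converting.
long$Value <- as.numeric(gsub(",", "", long$Value))

print(long)
```

The sort step is only there to match the row order shown in the question; reshape() itself returns the rows grouped by year.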

#2


71  

Three alternative solutions:

1: With reshape2

library(reshape2)
long <- melt(wide, id.vars = c("Code", "Country"))

giving:

   Code     Country variable  value
1   AFG Afghanistan     1950 20,249
2   ALB     Albania     1950  8,097
3   AFG Afghanistan     1951 21,352
4   ALB     Albania     1951  8,986
5   AFG Afghanistan     1952 22,532
6   ALB     Albania     1952 10,058
7   AFG Afghanistan     1953 23,557
8   ALB     Albania     1953 11,123
9   AFG Afghanistan     1954 24,555
10  ALB     Albania     1954 12,246

Some alternative notations that give the same result:

# you can also define the id-variables by column number
melt(wide, id.vars = 1:2)

# as an alternative you can also specify the measure-variables
# all other variables will then be used as id-variables
melt(wide, measure.vars = 3:7)
melt(wide, measure.vars = as.character(1950:1954))

2: With data.table

data.table provides the same melt function as the reshape2 package, as an extended and improved implementation. melt from data.table also has more parameters than melt from reshape2. For example, you can also specify the name of the variable column:

library(data.table)
long <- melt(setDT(wide), id.vars=c("Code","Country"), variable.name="year")

Some alternative notations:

melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")

3: With tidyr

library(tidyr)
long <- wide %>% gather(year, value, -c(Code, Country))

Some alternative notations:

wide %>% gather(year, value, -Code, -Country)
wide %>% gather(year, value, -1:-2)
wide %>% gather(year, value, -(1:2))
wide %>% gather(year, value, -1, -2)
wide %>% gather(year, value, 3:7)
wide %>% gather(year, value, `1950`:`1954`)

If you want to exclude NA values, you can add na.rm = TRUE to the melt as well as the gather functions.

Another problem with the data is that the values will be read by R as character values (because of the commas in the numbers). You can repair that with gsub and as.numeric:

long$value <- as.numeric(gsub(",", "", long$value))

Or directly with data.table or dplyr:

# data.table
long <- melt(setDT(wide),
             id.vars = c("Code","Country"),
             variable.name = "year")[, value := as.numeric(gsub(",", "", value))]

# tidyr and dplyr
long <- wide %>% gather(year, value, -c(Code,Country)) %>%
  mutate(value = as.numeric(gsub(",", "", value)))
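
To see why the gsub step is needed, here is a minimal base R sketch on a hypothetical character vector (the name raw and its three values are just for illustration). Calling as.numeric directly on strings with thousands separators yields NA, while removing the commas first gives the expected numbers.

```r
# Hypothetical values in the same format as the year columns
raw <- c("20,249", "8,097", "10,058")

# Direct conversion fails: "," is not valid inside a number,
# so every element becomes NA (with a coercion warning).
direct <- suppressWarnings(as.numeric(raw))

# Removing the commas first makes the conversion work.
# fixed = TRUE treats "," as a literal string rather than a regex.
cleaned <- as.numeric(gsub(",", "", raw, fixed = TRUE))

print(direct)
print(cleaned)
```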

Data:

wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)

#3


27  

Using the reshape package:

# data
x <- read.table(textConnection(
"Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246"), header=TRUE)

library(reshape)

x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))

#4


4  

Since this question is tagged with r-faq, I felt it would be useful to share another alternative from base R: stack.

Note, however, that stack does not work with factors--it only works if is.vector is TRUE, and from the documentation for is.vector, we find that:

is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.

I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.

Here's the stack approach:

cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
##    Code     Country values  ind
## 1   AFG Afghanistan 20,249 1950
## 2   ALB     Albania  8,097 1950
## 3   AFG Afghanistan 21,352 1951
## 4   ALB     Albania  8,986 1951
## 5   AFG Afghanistan 22,532 1952
## 6   ALB     Albania 10,058 1952
## 7   AFG Afghanistan 23,557 1953
## 8   ALB     Albania 11,123 1953
## 9   AFG Afghanistan 24,555 1954
## 10  ALB     Albania 12,246 1954
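
The stack output uses the generic column names values and ind, and ind comes back as a factor. A possible follow-up in base R, sketched here with the sample data rebuilt inline, renames the columns and converts them to numbers. The target names Year and Value are taken from the question; the rest is an illustrative assumption.

```r
# Sample data as in the answer above
wide <- read.table(text = "Code Country     1950   1951   1952   1953   1954
AFG  Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB  Albania     8,097  8,986  10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE)

# stack() needs plain vectors, so coerce the year columns to character first
stacked <- stack(lapply(wide[-c(1, 2)], as.character))

# The two id columns are recycled to match the 10 stacked rows
long <- cbind(wide[1:2], stacked)

# Rename the generic stack() columns to the names used in the question
names(long)[names(long) == "values"] <- "Value"
names(long)[names(long) == "ind"]    <- "Year"

# ind is a factor: go through as.character to get the real year,
# not the internal level code
long$Year  <- as.integer(as.character(long$Year))
long$Value <- as.numeric(gsub(",", "", long$Value))

# Put the columns in the order asked for in the question
long <- long[, c("Code", "Country", "Year", "Value")]
print(long)
```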

#5


3  

Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here), or by including the years you want explicitly.

Note that, to handle the commas (and the X's that are added if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse, so they can be loaded together with library(tidyverse).

wide %>%
  gather(Year, Value, -Code, -Country) %>%
  mutate(Year = parse_number(Year)
         , Value = parse_number(Value))

Returns:

   Code     Country Year Value
1   AFG Afghanistan 1950 20249
2   ALB     Albania 1950  8097
3   AFG Afghanistan 1951 21352
4   ALB     Albania 1951  8986
5   AFG Afghanistan 1952 22532
6   ALB     Albania 1952 10058
7   AFG Afghanistan 1953 23557
8   ALB     Albania 1953 11123
9   AFG Afghanistan 1954 24555
10  ALB     Albania 1954 12246
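
As a side note, recent tidyr releases document gather() as superseded by pivot_longer(). The sketch below shows what the same reshaping might look like with that function, assuming a tidyr version that provides pivot_longer is installed. The conversions are done in base R here only to keep the example to a single package; they are not part of the answer above.

```r
library(tidyr)

# Sample data from the question; check.names = FALSE keeps "1950" etc. as-is
wide <- read.table(text = "Code Country     1950   1951   1952   1953   1954
AFG  Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB  Albania     8,097  8,986  10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Every column except Code and Country is pivoted:
# the old column names go to "Year", the cell contents to "Value".
long <- pivot_longer(wide,
                     cols      = -c(Code, Country),
                     names_to  = "Year",
                     values_to = "Value")

# Column names arrive as character, and the values still contain commas
long$Year  <- as.integer(long$Year)
long$Value <- as.numeric(gsub(",", "", long$Value))

print(long)
```

Unlike gather(), pivot_longer() keeps the rows of each country together, which happens to match the layout requested in the question.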
