使用CASE和GROUP BY进行数据透视的动态替代方法

时间:2022-07-01 22:56:38

I have a table that looks like this:

我有一个看起来像这样的表:

id    feh    bar
1     10     A
2     20     A
3      3     B
4      4     B
5      5     C
6      6     D
7      7     D
8      8     D

And I want it to look like this:

我希望它看起来像这样:

bar  val1   val2   val3
A     10     20 
B      3      4 
C      5        
D      6      7     8

I have this query that does this:

我有这样的查询:

SELECT bar, 
   MAX(CASE WHEN abc."row" = 1 THEN feh ELSE NULL END) AS "val1",
   MAX(CASE WHEN abc."row" = 2 THEN feh ELSE NULL END) AS "val2",
   MAX(CASE WHEN abc."row" = 3 THEN feh ELSE NULL END) AS "val3"
FROM
(
  SELECT bar, feh, row_number() OVER (partition by bar) as row
  FROM "Foo"
 ) abc
GROUP BY bar

This is a very make-shifty approach and gets unwieldy if there are a lot of new columns to be created. I was wondering if the CASE statements can be made better to make this query more dynamic? Also, I'd love to see other approaches to doing this.

这是一种非常狡猾的方法,如果要创建许多新列,它会变得难以处理。我想知道CASE语句是否可以更好地使这个查询更具动态性?此外,我很乐意看到其他方法来做到这一点。

5 个解决方案

#1


46  

If you have not installed the additional module tablefunc, run this command once per database:

如果尚未安装附加模块tablefunc,请对每个数据库运行此命令:

CREATE EXTENSION tablefunc;

Answer to question

A very basic crosstab solution for your case:

适合您案例的非常基本的交叉表解决方案:

SELECT * FROM crosstab(
  'SELECT bar, 1 AS cat, feh
   FROM   tbl_org
   ORDER  BY bar, feh')
 AS ct (bar text, val1 int, val2 int, val3 int);  -- more columns?

The special difficulty here is, that there is no category (cat) in the base table. For the basic 1-parameter form we can just provide a dummy column with a dummy value serving as category. The value is ignored anyway.

这里的特殊困难是,基表中没有类别(cat)。对于基本的1-参数形式,我们可以提供一个虚拟列,其虚拟值用作类别。无论如何都会忽略该值。

This is one of the rare cases where the second parameter for the crosstab() function is not needed, because all NULL values only appear in dangling columns to the right by definition of this problem. And the order can be determined by the value.

这是少见的情况之一,其中不需要交叉表()函数的第二个参数,因为所有NULL值仅出现在此问题的定义中右侧的悬空列中。订单可以由价值决定。

If we had an actual category column with names determining the order of values in the result, we'd need the 2-parameter form of crosstab(). Here I synthesize a category column with the help of the window function row_number(), to base crosstab() on:

如果我们有一个实际的类别列,其名称确定结果中值的顺序,我们需要使用交叉表()的2参数形式。在这里,我借助窗口函数row_number()合成一个类别列,以基于交叉表():

SELECT * FROM crosstab(
   $$
   SELECT bar, val, feh
   FROM  (
      SELECT *, 'val' || row_number() OVER (PARTITION BY bar ORDER BY feh) AS val
      FROM tbl_org
      ) x
   ORDER BY 1, 2
   $$
 , $$VALUES ('val1'), ('val2'), ('val3')$$         -- more columns?
) AS ct (bar text, val1 int, val2 int, val3 int);  -- more columns?

The rest is pretty much run-of-the-mill. Find more explanation and links in these closely related answers.

其余的几乎都是普通的。在这些密切相关的答案中找到更多解释和链接。

Basics:
Read this first if you are not familiar with the crosstab() function!

基础知识:如果您不熟悉交叉表()函数,请先阅读此内容!

Advanced:

高级:

Proper test setup

That's how you should provide a test case to begin with:

这就是你应该如何提供一个测试用例:

CREATE TEMP TABLE tbl_org (id int, feh int, bar text);
INSERT INTO tbl_org (id, feh, bar) VALUES
   (1, 10, 'A')
 , (2, 20, 'A')
 , (3,  3, 'B')
 , (4,  4, 'B')
 , (5,  5, 'C')
 , (6,  6, 'D')
 , (7,  7, 'D')
 , (8,  8, 'D');

Dynamic crosstab?

Not very dynamic, yet, as @Clodoaldo commented. Dynamic return types are hard to achieve with plpgsql. But there are ways around it - with some limitations.

然而,正如@Clodoaldo评论的那样,还不是很有活力。使用plpgsql很难实现动态返回类型。但是有很多方法 - 有一些限制。

So not to further complicate the rest, I demonstrate with a simpler test case:

所以不要让其余部分进一步复杂化,我用一个更简单的测试用例来证明:

CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
   ('A', 'val1', 10)
 , ('A', 'val2', 20)
 , ('B', 'val1', 3)
 , ('B', 'val2', 4)
 , ('C', 'val1', 5)
 , ('D', 'val3', 8)
 , ('D', 'val1', 6)
 , ('D', 'val2', 7);

Call:

呼叫:

SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2')
AS ct (row_name text, val1 int, val2 int, val3 int);

Returns:

返回:

 row_name | val1 | val2 | val3
----------+------+------+------
 A        | 10   | 20   |
 B        |  3   |  4   |
 C        |  5   |      |
 D        |  6   |  7   |  8

Built-in feature of tablefunc module

The tablefunc module provides a simple infrastructure for generic crosstab() calls without providing a column definition list. A number of functions written in C (typically very fast):

tablefunc模块为通用的crosstab()调用提供了一个简单的基础结构,而不提供列定义列表。用C编写的许多函数(通常非常快):

crosstabN()

crosstab1() - crosstab4() are pre-defined. One minor point: they require and return all text. So we need to cast our integer values. But it simplifies the call:

crosstab1() - crosstab4()是预定义的。一个小问题:他们需要并返回所有文本。所以我们需要转换整数值。但它简化了通话:

SELECT * FROM crosstab4('SELECT row_name, attrib, val::text  -- cast!
                         FROM tbl ORDER BY 1,2')

Result:

结果:

 row_name | category_1 | category_2 | category_3 | category_4
----------+------------+------------+------------+------------
 A        | 10         | 20         |            |
 B        | 3          | 4          |            |
 C        | 5          |            |            |
 D        | 6          | 7          | 8          |

Custom crosstab() function

For more columns or other data types, we create our own composite type and function (once).
Type:

对于更多列或其他数据类型,我们创建自己的复合类型和函数(一次)。类型:

CREATE TYPE tablefunc_crosstab_int_5 AS (
  row_name text, val1 int, val2 int, val3 int, val4 int, val5 int);

Function:

功能:

CREATE OR REPLACE FUNCTION crosstab_int_5(text)
  RETURNS SETOF tablefunc_crosstab_int_5
AS '$libdir/tablefunc', 'crosstab' LANGUAGE c STABLE STRICT;

Call:

呼叫:

SELECT * FROM crosstab_int_5('SELECT row_name, attrib, val   -- no cast!
                              FROM tbl ORDER BY 1,2');

Result:

结果:

 row_name | val1 | val2 | val3 | val4 | val5
----------+------+------+------+------+------
 A        |   10 |   20 |      |      |
 B        |    3 |    4 |      |      |
 C        |    5 |      |      |      |
 D        |    6 |    7 |    8 |      |

One polymorphic, dynamic function for all

This goes beyond what's covered by the tablefunc module.
To make the return type dynamic I use a polymorphic type with a technique detailed in this related answer:

这超出了tablefunc模块所涵盖的范围。为了使返回类型动态,我使用多态类型,并在此相关答案中详细说明了该技术:

1-parameter form:

1参数形式:

CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _rowtype anyelement)
  RETURNS SETOF anyelement AS
$func$
BEGIN
   RETURN QUERY EXECUTE 
   (SELECT format('SELECT * FROM crosstab(%L) t(%s)'
                , _qry
                , string_agg(quote_ident(attname) || ' ' || atttypid::regtype
                           , ', ' ORDER BY attnum))
    FROM   pg_attribute
    WHERE  attrelid = pg_typeof(_rowtype)::text::regclass
    AND    attnum > 0
    AND    NOT attisdropped);
END
$func$  LANGUAGE plpgsql;

Overload with this variant for the 2-parameter form:

使用此变量为2参数形式重载:

CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _cat_qry text, _rowtype anyelement)
  RETURNS SETOF anyelement AS
$func$
BEGIN
   RETURN QUERY EXECUTE 
   (SELECT format('SELECT * FROM crosstab(%L, %L) t(%s)'
                , _qry, _cat_qry
                , string_agg(quote_ident(attname) || ' ' || atttypid::regtype
                           , ', ' ORDER BY attnum))
    FROM   pg_attribute
    WHERE  attrelid = pg_typeof(_rowtype)::text::regclass
    AND    attnum > 0
    AND    NOT attisdropped);
END
$func$  LANGUAGE plpgsql;

pg_typeof(_rowtype)::text::regclass: There is a row type defined for every user-defined composite type, so that attributes (columns) are listed in the system catalog pg_attribute. The fast lane to get it: cast the registered type (regtype) to text and cast this text to regclass.

pg_typeof(_rowtype):: text :: regclass:为每个用户定义的复合类型定义了一个行类型,因此属性(列)列在系统目录pg_attribute中。获取它的快车道:将注册类型(regtype)转换为文本并将此文本转换为regclass。

Create composite types once:

You need to define once every return type you are going to use:

您需要为要使用的每种返回类型定义一次:

CREATE TYPE tablefunc_crosstab_int_3 AS (
    row_name text, val1 int, val2 int, val3 int);

CREATE TYPE tablefunc_crosstab_int_4 AS (
    row_name text, val1 int, val2 int, val3 int, val4 int);

...

For ad-hoc calls, you can also just create a temporary table to the same (temporary) effect:

对于临时调用,您还可以创建一个临时表到相同(临时)效果:

CREATE TEMP TABLE temp_xtype7 AS (
    row_name text, x1 int, x2 int, x3 int, x4 int, x5 int, x6 int, x7 int);

Or use the type of an existing table, view or materialized view if available.

或者使用现有表格,视图或物化视图的类型(如果可用)。

Call

Using above row types:

使用上面的行类型:

1-parameter form (no missing values):

1参数形式(无缺失值):

SELECT * FROM crosstab_n(
   'SELECT row_name, attrib, val FROM tbl ORDER BY 1,2'
 , NULL::tablefunc_crosstab_int_3);

2-parameter form (some values can be missing):

2参数形式(某些值可能丢失):

SELECT * FROM crosstab_n(
   'SELECT row_name, attrib, val FROM tbl ORDER BY 1'
 , $$VALUES ('val1'), ('val2'), ('val3')$$
 , NULL::tablefunc_crosstab_int_3);

This one function works for all return types, while the crosstabN() framework provided by the tablefunc module needs a separate function for each.
If you have named your types in sequence like demonstrated above, you only have to replace the bold number. To find the maximum number of categories in the base table:

这一个函数适用于所有返回类型,而tablefunc模块提供的crosstabN()框架需要为每个类型提供单独的函数。如果您按顺序命名了类型,如上所示,您只需要替换粗体数字。要查找基表中的最大类别数:

SELECT max(count(*)) OVER () FROM tbl  -- returns 3
GROUP  BY row_name
LIMIT  1;

That's about as dynamic as this gets if you want individual columns. Arrays like demonstrated by @Clocoaldo or a simple text representation or the result wrapped in a document type like json or hstore can work for any number of categories dynamically.

如果您想要单独的列,这就像动态一样。 @Clocoaldo演示的数组或简单的文本表示形式或包含在json或hstore等文档类型中的结果可以动态地适用于任意数量的类别。

Disclaimer:
It's always potentially dangerous when user input is converted to code. Make sure this cannot be used for SQL injection. Don't accept input from untrusted users (directly).

免责声明:当用户输入转换为代码时,它总是有潜在危险。确保这不能用于SQL注入。不接受来自不受信任的用户的输入(直接)。

Call for original question:

SELECT * FROM crosstab_n('SELECT bar, 1, feh FROM tbl_org ORDER BY 1,2'
                       , NULL::tablefunc_crosstab_int_3);

#2


10  

Although this is an old question, I would like to add another solution made possible by recent improvements in PostgreSQL. This solution achieves the same goal of returning a structured result from a dynamic data set without using the crosstab function at all. In other words, this is a good example of re-examining unintentional and implicit assumptions that prevent us from discovering new solutions to old problems. ;)

虽然这是一个老问题,但我希望通过PostgreSQL的最新改进添加另一个解决方案。该解决方案实现了相同的目标,即在不使用交叉表功能的情况下从动态数据集返回结构化结果。换句话说,这是重新审视无意和隐含假设的一个很好的例子,这些假设阻止我们发现旧问题的新解决方案。 ;)

To illustrate, you asked for a method to transpose data with the following structure:

为了说明,您要求使用以下结构转置数据的方法:

id    feh    bar
1     10     A
2     20     A
3      3     B
4      4     B
5      5     C
6      6     D
7      7     D
8      8     D

into this format:

进入这种格式:

bar  val1   val2   val3
A     10     20 
B      3      4 
C      5        
D      6      7     8

The conventional solution is a clever (and incredibly knowledgeable) approach to creating dynamic crosstab queries that is explained in exquisite detail in Erwin Brandstetter's answer.

传统的解决方案是创建动态交叉表查询的一种聪明(且难以置信的知识)方法,在Erwin Brandstetter的答案中详细解释了这一点。

However, if your particular use case is flexible enough to accept a slightly different result format, then another solution is possible that handles dynamic pivots beautifully. This technique, which I learned of here

但是,如果您的特定用例足够灵活,可以接受稍微不同的结果格式,那么另一种解决方案可以很好地处理动态枢轴。这种技术,我在这里学到了

uses PostgreSQL's new jsonb_object_agg function to construct pivoted data on the fly in the form of a JSON object.

使用PostgreSQL的新jsonb_object_agg函数以JSON对象的形式即时构建数据透视数据。

I will use Mr. Brandstetter's "simpler test case" to illustrate:

我将使用Brandstetter先生的“更简单的测试案例”来说明:

CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
   ('A', 'val1', 10)
 , ('A', 'val2', 20)
 , ('B', 'val1', 3)
 , ('B', 'val2', 4)
 , ('C', 'val1', 5)
 , ('D', 'val3', 8)
 , ('D', 'val1', 6)
 , ('D', 'val2', 7);

Using the jsonb_object_agg function, we can create the required pivoted result set with this pithy beauty:

使用jsonb_object_agg函数,我们可以使用这种精湛的美感创建所需的旋转结果集:

SELECT
  row_name AS bar,
  json_object_agg(attrib, val) AS data
FROM tbl
GROUP BY row_name
ORDER BY row_name;

Which outputs:

哪个输出:

 bar |                  data                  
-----+----------------------------------------
 A   | { "val1" : 10, "val2" : 20 }
 B   | { "val1" : 3, "val2" : 4 }
 C   | { "val1" : 5 }
 D   | { "val3" : 8, "val1" : 6, "val2" : 7 }

As you can see, this function works by creating key/value pairs in the JSON object from the attrib and value columns in the sample data, all grouped by row_name.

如您所见,此函数的工作原理是通过样本数据中的attrib和value列在JSON对象中创建键/值对,所有这些对按row_name分组。

Although this result set obviously looks different, I believe it will actually satisfy many (if not most) real world use cases, especially those where the data requires a dynamically-generated pivot, or where resulting data is consumed by a parent application (e.g., needs to be re-formatted for transmission in a http response).

虽然这个结果集显然看起来不同,但我相信它实际上会满足许多(如果不是大多数)现实世界的用例,特别是那些数据需要动态生成的数据包,或者父应用程序使用结果数据的情况(例如,需要重新格式化以便在http响应中传输)。

Benefits of this approach:

这种方法的好处:

  • Cleaner syntax. I think everyone would agree that the syntax for this approach is far cleaner and easier to understand than even the most basic crosstab examples.

    更清晰的语法。我想每个人都会同意这种方法的语法比最基本的交叉表示例更清晰,更容易理解。

  • Completely dynamic. No information about the underlying data need be specified beforehand. Neither the column names nor their data types need be known ahead of time.

    完全动态。不需要事先指定有关基础数据的信息。无需提前知道列名称及其数据类型。

  • Handles large numbers of columns. Since the pivoted data is saved as a single jsonb column, you will not run up against PostgreSQL's column limit (≤1,600 columns, I believe). There is still a limit, but I believe it is the same as for text fields: 1 GB per JSON object created (please correct me if I am wrong). That's a lot of key/value pairs!

    处理大量列。由于透视数据保存为单个jsonb列,因此您不会遇到PostgreSQL的列限制(我相信≤1,600列)。仍有一个限制,但我认为它与文本字段相同:每个JSON对象创建1 GB(如果我错了,请纠正我)。这是很多关键/价值对!

  • Simplified data handling. I believe that the creation of JSON data in the DB will simplify (and likely speed up) the data conversion process in parent applications. (You will note that the integer data in our sample test case was correctly stored as such in the resulting JSON objects. PostgreSQL handles this by automatically converting its intrinsic data types to JSON in accordance with the JSON specification.) This will effectively eliminate the need to manually cast data passed to parent applications: it can all be delegated to the application's native JSON parser.

    简化数据处理。我相信在数据库中创建JSON数据将简化(并可能加速)父应用程序中的数据转换过程。 (您将注意到我们的示例测试用例中的整数数据已正确存储在生成的JSON对象中.PostgreSQL通过根据JSON规范自动将其内部数据类型转换为JSON来处理此问题。)这将有效地消除需求手动转换传递给父应用程序的数据:它们都可以委托给应用程序的本机JSON解析器。

Differences (and possible drawbacks):

差异(和可能的缺点):

  • It looks different. There's no denying that the results of this approach look different. The JSON object is not as pretty as the crosstab result set; however, the differences are purely cosmetic. The same information is produced--and in a format that is probably more friendly for consumption by parent applications.

    它看起来不同。不可否认,这种方法的结果看起来不同。 JSON对象不如交叉表结果集漂亮;然而,差异纯粹是装饰性的。生成相同的信息 - 并且格式可能更适合父应用程序使用。

  • Missing keys. Missing values in the crosstab approach are filled in with nulls, while the JSON objects are simply missing the applicable keys. You will have to decide for your self if this is an acceptable trade off for your use case. It seems to me that any attempt to address this problem in PostgreSQL will greatly complicate the process and likely involve some introspection in the form of additional queries.

    缺少钥匙。交叉表方法中缺少的值用空值填充,而JSON对象只是缺少适用的键。如果这是您的用例可接受的权衡,您将不得不自己决定。在我看来,在PostgreSQL中解决这个问题的任何尝试都会使这个过程变得非常复杂,并且可能会以其他查询的形式进行一些内省。

  • Key order is not preserved. I don't know if this can be addressed in PostgreSQL, but this issue is mostly cosmetic also, since any parent applications are either unlikely to rely on key order, or have the ability to determine proper key order by other means. The worst case will probably only require an addition query of the database.

    密钥订单不会保留。我不知道这是否可以在PostgreSQL中解决,但这个问题主要是装饰性的,因为任何父应用程序要么不太可能依赖于密钥顺序,要么能够通过其他方式确定正确的密钥顺序。最坏的情况可能只需要对数据库进行额外查询。

Conclusion

结论

I am very curious to hear the opinions of others (especially @ErwinBrandstetter's) on this approach, especially as it pertains to performance. When I discovered this approach on Andrew Bender's blog, it was like getting hit in the side of the head. What a beautiful way to take a fresh approach to a difficult problem in PostrgeSQL. It solved my use case perfectly, and I believe it will likewise serve many others as well.

我很好奇听到其他人(特别是@ ErwinBrandstetter)对这种方法的看法,特别是因为它与性能有关。当我在Andrew Bender的博客上发现这种方法时,就好像被击中头部一样。对PostrgeSQL中的难题采取新方法是多么美妙的方式。它完美地解决了我的用例,我相信它也会同样服务于其他许多用户。

#3


4  

In your case I guess an array is good. SQL Fiddle

在你的情况下,我猜一个阵列是好的。 SQL小提琴

select
    bar,
    feh || array_fill(null::int, array[c - array_length(feh, 1)]) feh
from
    (
        select bar, array_agg(feh) feh
        from foo
        group by bar
    ) s
    cross join (
        select count(*)::int c
        from foo
        group by bar
        order by c desc limit 1
    ) c(c)
;
 bar |      feh      
-----+---------------
 A   | {10,20,NULL}
 B   | {3,4,NULL}
 C   | {5,NULL,NULL}
 D   | {6,7,8}

#4


4  

This is to complete @Damian good answer. I have already suggested the JSON approach in other answers before the 9.6's handy json_object_agg function. It just takes more work with the previous tool set.

这是完成@Damian的好答案。在9.6的方便的json_object_agg函数之前,我已经在其他答案中提出了JSON方法。使用以前的工具集需要更多的工作。

Two of the cited possible drawbacks are really not. The random key order is trivially corrected if necessary. The missing keys, if relevant, takes an almost trivial amount of code to be addressed:

引用的两个可能的缺点实际上并非如此。如有必要,可以轻松纠正随机密钥顺序。丢失的密钥(如果相关)需要处理几乎微不足道的代码:

select
    row_name as bar,
    json_object_agg(attrib, val order by attrib) as data
from
    tbl
    right join
    (
        (select distinct row_name from tbl) a
        cross join
        (select distinct attrib from tbl) b
    ) c using (row_name, attrib)
group by row_name
order by row_name
;
 bar |                     data                     
-----+----------------------------------------------
 a   | { "val1" : 10, "val2" : 20, "val3" : null }
 b   | { "val1" : 3, "val2" : 4, "val3" : null }
 c   | { "val1" : 5, "val2" : null, "val3" : null }
 d   | { "val1" : 6, "val2" : 7, "val3" : 8 }

For a final query consumer which understands JSON there are no drawbacks. The only one is that it can not be consumed as a table source.

对于理解JSON的最终查询使用者而言,没有任何缺点。唯一的一个是它不能作为表源使用。

#5


1  

I'm sorry about returning in the past, but the solution "Dynamic Crosstab" returns erroneous result table. Thus, the valN values are erroneously "aligned to the left" and they don't correspond to the column names. When the input table has "holes" in the values, e.g. "C" has val1 and val3 but not val2. This produces an error: val3 value will be ranged in the column val2 (i.e. the next free column) in the final table.

我很遗憾回到过去,但解决方案“Dynamic Crosstab”会返回错误的结果表。因此,valN值错误地“向左对齐”,并且它们不对应于列名。当输入表中的值中有“空洞”时,例如“C”具有val1和val3但不具有val2。这会产生错误:val3值将在最终表中的列val2(即下一个空闲列)中进行调整。

CREATE TEMP TABLE tbl (row_name text, attrib text, val int); 
INSERT INTO tbl (row_name, attrib, val) VALUES ('C', 'val1', 5) ('C', 'val3', 7);

SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl 
ORDER BY 1,2') AS ct (row_name text, val1 int, val2 int, val3 int);

row_name|val1|val2|val3
 C      |   5|  7 |

In order to return correct cells with "holes" in the right column, the crosstab query requires a 2nd SELECT in the crosstab, something like this "crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2', 'select distinct row_name from tbl order by 1')"

为了在右列中返回带有“孔”的正确单元格,交叉表查询需要交叉表中的第二个SELECT,类似于“交叉表('SELECT row_name,attrib,val FROM tbl ORDER BY 1,2','select不同的row_name从tbl顺序乘1')“

#1


46  

If you have not installed the additional module tablefunc, run this command once per database:

如果尚未安装附加模块tablefunc,请对每个数据库运行此命令:

CREATE EXTENSION tablefunc;

Answer to question

A very basic crosstab solution for your case:

适合您案例的非常基本的交叉表解决方案:

SELECT * FROM crosstab(
  'SELECT bar, 1 AS cat, feh
   FROM   tbl_org
   ORDER  BY bar, feh')
 AS ct (bar text, val1 int, val2 int, val3 int);  -- more columns?

The special difficulty here is, that there is no category (cat) in the base table. For the basic 1-parameter form we can just provide a dummy column with a dummy value serving as category. The value is ignored anyway.

这里的特殊困难是,基表中没有类别(cat)。对于基本的1-参数形式,我们可以提供一个虚拟列,其虚拟值用作类别。无论如何都会忽略该值。

This is one of the rare cases where the second parameter for the crosstab() function is not needed, because all NULL values only appear in dangling columns to the right by definition of this problem. And the order can be determined by the value.

这是少见的情况之一,其中不需要交叉表()函数的第二个参数,因为所有NULL值仅出现在此问题的定义中右侧的悬空列中。订单可以由价值决定。

If we had an actual category column with names determining the order of values in the result, we'd need the 2-parameter form of crosstab(). Here I synthesize a category column with the help of the window function row_number(), to base crosstab() on:

如果我们有一个实际的类别列,其名称确定结果中值的顺序,我们需要使用交叉表()的2参数形式。在这里,我借助窗口函数row_number()合成一个类别列,以基于交叉表():

SELECT * FROM crosstab(
   $$
   SELECT bar, val, feh
   FROM  (
      SELECT *, 'val' || row_number() OVER (PARTITION BY bar ORDER BY feh) AS val
      FROM tbl_org
      ) x
   ORDER BY 1, 2
   $$
 , $$VALUES ('val1'), ('val2'), ('val3')$$         -- more columns?
) AS ct (bar text, val1 int, val2 int, val3 int);  -- more columns?

The rest is pretty much run-of-the-mill. Find more explanation and links in these closely related answers.

其余的几乎都是普通的。在这些密切相关的答案中找到更多解释和链接。

Basics:
Read this first if you are not familiar with the crosstab() function!

基础知识:如果您不熟悉交叉表()函数,请先阅读此内容!

Advanced:

高级:

Proper test setup

That's how you should provide a test case to begin with:

这就是你应该如何提供一个测试用例:

CREATE TEMP TABLE tbl_org (id int, feh int, bar text);
INSERT INTO tbl_org (id, feh, bar) VALUES
   (1, 10, 'A')
 , (2, 20, 'A')
 , (3,  3, 'B')
 , (4,  4, 'B')
 , (5,  5, 'C')
 , (6,  6, 'D')
 , (7,  7, 'D')
 , (8,  8, 'D');

Dynamic crosstab?

Not very dynamic, yet, as @Clodoaldo commented. Dynamic return types are hard to achieve with plpgsql. But there are ways around it - with some limitations.

然而,正如@Clodoaldo评论的那样,还不是很有活力。使用plpgsql很难实现动态返回类型。但是有很多方法 - 有一些限制。

So not to further complicate the rest, I demonstrate with a simpler test case:

所以不要让其余部分进一步复杂化,我用一个更简单的测试用例来证明:

CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
   ('A', 'val1', 10)
 , ('A', 'val2', 20)
 , ('B', 'val1', 3)
 , ('B', 'val2', 4)
 , ('C', 'val1', 5)
 , ('D', 'val3', 8)
 , ('D', 'val1', 6)
 , ('D', 'val2', 7);

Call:

呼叫:

SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2')
AS ct (row_name text, val1 int, val2 int, val3 int);

Returns:

返回:

 row_name | val1 | val2 | val3
----------+------+------+------
 A        | 10   | 20   |
 B        |  3   |  4   |
 C        |  5   |      |
 D        |  6   |  7   |  8

Built-in feature of tablefunc module

The tablefunc module provides a simple infrastructure for generic crosstab() calls without providing a column definition list. A number of functions written in C (typically very fast):

tablefunc模块为通用的crosstab()调用提供了一个简单的基础结构,而不提供列定义列表。用C编写的许多函数(通常非常快):

crosstabN()

crosstab1() - crosstab4() are pre-defined. One minor point: they require and return all text. So we need to cast our integer values. But it simplifies the call:

crosstab1() - crosstab4()是预定义的。一个小问题:他们需要并返回所有文本。所以我们需要转换整数值。但它简化了通话:

SELECT * FROM crosstab4('SELECT row_name, attrib, val::text  -- cast!
                         FROM tbl ORDER BY 1,2')

Result:

结果:

 row_name | category_1 | category_2 | category_3 | category_4
----------+------------+------------+------------+------------
 A        | 10         | 20         |            |
 B        | 3          | 4          |            |
 C        | 5          |            |            |
 D        | 6          | 7          | 8          |

Custom crosstab() function

For more columns or other data types, we create our own composite type and function (once).
Type:

对于更多列或其他数据类型,我们创建自己的复合类型和函数(一次)。类型:

CREATE TYPE tablefunc_crosstab_int_5 AS (
  row_name text, val1 int, val2 int, val3 int, val4 int, val5 int);

Function:

功能:

CREATE OR REPLACE FUNCTION crosstab_int_5(text)
  RETURNS SETOF tablefunc_crosstab_int_5
AS '$libdir/tablefunc', 'crosstab' LANGUAGE c STABLE STRICT;

Call:

呼叫:

SELECT * FROM crosstab_int_5('SELECT row_name, attrib, val   -- no cast!
                              FROM tbl ORDER BY 1,2');

Result:

结果:

 row_name | val1 | val2 | val3 | val4 | val5
----------+------+------+------+------+------
 A        |   10 |   20 |      |      |
 B        |    3 |    4 |      |      |
 C        |    5 |      |      |      |
 D        |    6 |    7 |    8 |      |

One polymorphic, dynamic function for all

This goes beyond what's covered by the tablefunc module.
To make the return type dynamic I use a polymorphic type with a technique detailed in this related answer:

这超出了tablefunc模块所涵盖的范围。为了使返回类型动态,我使用多态类型,并在此相关答案中详细说明了该技术:

1-parameter form:

1参数形式:

CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _rowtype anyelement)
  RETURNS SETOF anyelement AS
$func$
BEGIN
   RETURN QUERY EXECUTE 
   (SELECT format('SELECT * FROM crosstab(%L) t(%s)'
                , _qry
                , string_agg(quote_ident(attname) || ' ' || atttypid::regtype
                           , ', ' ORDER BY attnum))
    FROM   pg_attribute
    WHERE  attrelid = pg_typeof(_rowtype)::text::regclass
    AND    attnum > 0
    AND    NOT attisdropped);
END
$func$  LANGUAGE plpgsql;

Overload with this variant for the 2-parameter form:

使用此变量为2参数形式重载:

CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _cat_qry text, _rowtype anyelement)
  RETURNS SETOF anyelement AS
$func$
BEGIN
   RETURN QUERY EXECUTE 
   (SELECT format('SELECT * FROM crosstab(%L, %L) t(%s)'
                , _qry, _cat_qry
                , string_agg(quote_ident(attname) || ' ' || atttypid::regtype
                           , ', ' ORDER BY attnum))
    FROM   pg_attribute
    WHERE  attrelid = pg_typeof(_rowtype)::text::regclass
    AND    attnum > 0
    AND    NOT attisdropped);
END
$func$  LANGUAGE plpgsql;

pg_typeof(_rowtype)::text::regclass: There is a row type defined for every user-defined composite type, so that attributes (columns) are listed in the system catalog pg_attribute. The fast lane to get it: cast the registered type (regtype) to text and cast this text to regclass.

pg_typeof(_rowtype):: text :: regclass:为每个用户定义的复合类型定义了一个行类型,因此属性(列)列在系统目录pg_attribute中。获取它的快车道:将注册类型(regtype)转换为文本并将此文本转换为regclass。

Create composite types once:

You need to define once every return type you are going to use:

您需要为要使用的每种返回类型定义一次:

CREATE TYPE tablefunc_crosstab_int_3 AS (
    row_name text, val1 int, val2 int, val3 int);

CREATE TYPE tablefunc_crosstab_int_4 AS (
    row_name text, val1 int, val2 int, val3 int, val4 int);

...

For ad-hoc calls, you can also just create a temporary table to the same (temporary) effect:

对于临时调用,您还可以创建一个临时表到相同(临时)效果:

CREATE TEMP TABLE temp_xtype7 AS (
    row_name text, x1 int, x2 int, x3 int, x4 int, x5 int, x6 int, x7 int);

Or use the type of an existing table, view or materialized view if available.

或者使用现有表格,视图或物化视图的类型(如果可用)。

Call

Using above row types:

使用上面的行类型:

1-parameter form (no missing values):

1参数形式(无缺失值):

SELECT * FROM crosstab_n(
   'SELECT row_name, attrib, val FROM tbl ORDER BY 1,2'
 , NULL::tablefunc_crosstab_int_3);

2-parameter form (some values can be missing):

2参数形式(某些值可能丢失):

SELECT * FROM crosstab_n(
   'SELECT row_name, attrib, val FROM tbl ORDER BY 1'
 , $$VALUES ('val1'), ('val2'), ('val3')$$
 , NULL::tablefunc_crosstab_int_3);

This one function works for all return types, while the crosstabN() framework provided by the tablefunc module needs a separate function for each.
If you have named your types in sequence like demonstrated above, you only have to replace the bold number. To find the maximum number of categories in the base table:

这一个函数适用于所有返回类型,而tablefunc模块提供的crosstabN()框架需要为每个类型提供单独的函数。如果您按顺序命名了类型,如上所示,您只需要替换粗体数字。要查找基表中的最大类别数:

SELECT max(count(*)) OVER () FROM tbl  -- returns 3
GROUP  BY row_name
LIMIT  1;

That's about as dynamic as this gets if you want individual columns. Arrays like demonstrated by @Clocoaldo or a simple text representation or the result wrapped in a document type like json or hstore can work for any number of categories dynamically.

如果您想要单独的列,这就像动态一样。 @Clocoaldo演示的数组或简单的文本表示形式或包含在json或hstore等文档类型中的结果可以动态地适用于任意数量的类别。

Disclaimer:
It's always potentially dangerous when user input is converted to code. Make sure this cannot be used for SQL injection. Don't accept input from untrusted users (directly).

免责声明:当用户输入转换为代码时,它总是有潜在危险。确保这不能用于SQL注入。不接受来自不受信任的用户的输入(直接)。

Call for original question:

SELECT * FROM crosstab_n('SELECT bar, 1, feh FROM tbl_org ORDER BY 1,2'
                       , NULL::tablefunc_crosstab_int_3);

#2


10  

Although this is an old question, I would like to add another solution made possible by recent improvements in PostgreSQL. This solution achieves the same goal of returning a structured result from a dynamic data set without using the crosstab function at all. In other words, this is a good example of re-examining unintentional and implicit assumptions that prevent us from discovering new solutions to old problems. ;)

虽然这是一个老问题,但我希望通过PostgreSQL的最新改进添加另一个解决方案。该解决方案实现了相同的目标,即在不使用交叉表功能的情况下从动态数据集返回结构化结果。换句话说,这是重新审视无意和隐含假设的一个很好的例子,这些假设阻止我们发现旧问题的新解决方案。 ;)

To illustrate, you asked for a method to transpose data with the following structure:

为了说明,您要求使用以下结构转置数据的方法:

id    feh    bar
1     10     A
2     20     A
3      3     B
4      4     B
5      5     C
6      6     D
7      7     D
8      8     D

into this format:

进入这种格式:

bar  val1   val2   val3
A     10     20 
B      3      4 
C      5        
D      6      7     8

The conventional solution is a clever (and incredibly knowledgeable) approach to creating dynamic crosstab queries that is explained in exquisite detail in Erwin Brandstetter's answer.

传统的解决方案是创建动态交叉表查询的一种聪明(且难以置信的知识)方法,在Erwin Brandstetter的答案中详细解释了这一点。

However, if your particular use case is flexible enough to accept a slightly different result format, then another solution is possible that handles dynamic pivots beautifully. This technique, which I learned of here

但是,如果您的特定用例足够灵活,可以接受稍微不同的结果格式,那么另一种解决方案可以很好地处理动态枢轴。这种技术,我在这里学到了

uses PostgreSQL's new jsonb_object_agg function to construct pivoted data on the fly in the form of a JSON object.

使用PostgreSQL的新jsonb_object_agg函数以JSON对象的形式即时构建数据透视数据。

I will use Mr. Brandstetter's "simpler test case" to illustrate:

我将使用Brandstetter先生的“更简单的测试案例”来说明:

CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
   ('A', 'val1', 10)
 , ('A', 'val2', 20)
 , ('B', 'val1', 3)
 , ('B', 'val2', 4)
 , ('C', 'val1', 5)
 , ('D', 'val3', 8)
 , ('D', 'val1', 6)
 , ('D', 'val2', 7);

Using the jsonb_object_agg function, we can create the required pivoted result set with this pithy beauty:

使用jsonb_object_agg函数,我们可以使用这种精湛的美感创建所需的旋转结果集:

SELECT
  row_name AS bar,
  json_object_agg(attrib, val) AS data
FROM tbl
GROUP BY row_name
ORDER BY row_name;

Which outputs:

哪个输出:

 bar |                  data                  
-----+----------------------------------------
 A   | { "val1" : 10, "val2" : 20 }
 B   | { "val1" : 3, "val2" : 4 }
 C   | { "val1" : 5 }
 D   | { "val3" : 8, "val1" : 6, "val2" : 7 }

As you can see, this function works by creating key/value pairs in the JSON object from the attrib and value columns in the sample data, all grouped by row_name.

如您所见,此函数的工作原理是通过样本数据中的attrib和value列在JSON对象中创建键/值对,所有这些对按row_name分组。

Although this result set obviously looks different, I believe it will actually satisfy many (if not most) real world use cases, especially those where the data requires a dynamically-generated pivot, or where resulting data is consumed by a parent application (e.g., needs to be re-formatted for transmission in a http response).

虽然这个结果集显然看起来不同,但我相信它实际上会满足许多(如果不是大多数)现实世界的用例,特别是那些数据需要动态生成的数据包,或者父应用程序使用结果数据的情况(例如,需要重新格式化以便在http响应中传输)。

Benefits of this approach:

这种方法的好处:

  • Cleaner syntax. I think everyone would agree that the syntax for this approach is far cleaner and easier to understand than even the most basic crosstab examples.

    更清晰的语法。我想每个人都会同意这种方法的语法比最基本的交叉表示例更清晰,更容易理解。

  • Completely dynamic. No information about the underlying data need be specified beforehand. Neither the column names nor their data types need be known ahead of time.

    完全动态。不需要事先指定有关基础数据的信息。无需提前知道列名称及其数据类型。

  • Handles large numbers of columns. Since the pivoted data is saved as a single jsonb column, you will not run up against PostgreSQL's column limit (≤1,600 columns, I believe). There is still a limit, but I believe it is the same as for text fields: 1 GB per JSON object created (please correct me if I am wrong). That's a lot of key/value pairs!

    处理大量列。由于透视数据保存为单个jsonb列,因此您不会遇到PostgreSQL的列限制(我相信≤1,600列)。仍有一个限制,但我认为它与文本字段相同:每个JSON对象创建1 GB(如果我错了,请纠正我)。这是很多关键/价值对!

  • Simplified data handling. I believe that the creation of JSON data in the DB will simplify (and likely speed up) the data conversion process in parent applications. (You will note that the integer data in our sample test case was correctly stored as such in the resulting JSON objects. PostgreSQL handles this by automatically converting its intrinsic data types to JSON in accordance with the JSON specification.) This will effectively eliminate the need to manually cast data passed to parent applications: it can all be delegated to the application's native JSON parser.

    简化数据处理。我相信在数据库中创建JSON数据将简化(并可能加速)父应用程序中的数据转换过程。 (您将注意到我们的示例测试用例中的整数数据已正确存储在生成的JSON对象中.PostgreSQL通过根据JSON规范自动将其内部数据类型转换为JSON来处理此问题。)这将有效地消除需求手动转换传递给父应用程序的数据:它们都可以委托给应用程序的本机JSON解析器。

Differences (and possible drawbacks):

差异(和可能的缺点):

  • It looks different. There's no denying that the results of this approach look different. The JSON object is not as pretty as the crosstab result set; however, the differences are purely cosmetic. The same information is produced--and in a format that is probably more friendly for consumption by parent applications.

    它看起来不同。不可否认,这种方法的结果看起来不同。 JSON对象不如交叉表结果集漂亮;然而,差异纯粹是装饰性的。生成相同的信息 - 并且格式可能更适合父应用程序使用。

  • Missing keys. Missing values in the crosstab approach are filled in with nulls, while the JSON objects are simply missing the applicable keys. You will have to decide for your self if this is an acceptable trade off for your use case. It seems to me that any attempt to address this problem in PostgreSQL will greatly complicate the process and likely involve some introspection in the form of additional queries.

    缺少钥匙。交叉表方法中缺少的值用空值填充,而JSON对象只是缺少适用的键。如果这是您的用例可接受的权衡,您将不得不自己决定。在我看来,在PostgreSQL中解决这个问题的任何尝试都会使这个过程变得非常复杂,并且可能会以其他查询的形式进行一些内省。

  • Key order is not preserved. I don't know if this can be addressed in PostgreSQL, but this issue is mostly cosmetic also, since any parent applications are either unlikely to rely on key order, or have the ability to determine proper key order by other means. The worst case will probably only require an addition query of the database.

    密钥订单不会保留。我不知道这是否可以在PostgreSQL中解决,但这个问题主要是装饰性的,因为任何父应用程序要么不太可能依赖于密钥顺序,要么能够通过其他方式确定正确的密钥顺序。最坏的情况可能只需要对数据库进行额外查询。

Conclusion

结论

I am very curious to hear the opinions of others (especially @ErwinBrandstetter's) on this approach, especially as it pertains to performance. When I discovered this approach on Andrew Bender's blog, it was like getting hit in the side of the head. What a beautiful way to take a fresh approach to a difficult problem in PostrgeSQL. It solved my use case perfectly, and I believe it will likewise serve many others as well.

我很好奇听到其他人(特别是@ ErwinBrandstetter)对这种方法的看法,特别是因为它与性能有关。当我在Andrew Bender的博客上发现这种方法时,就好像被击中头部一样。对PostrgeSQL中的难题采取新方法是多么美妙的方式。它完美地解决了我的用例,我相信它也会同样服务于其他许多用户。

#3


4  

In your case I guess an array is good. SQL Fiddle

在你的情况下,我猜一个阵列是好的。 SQL小提琴

select
    bar,
    feh || array_fill(null::int, array[c - array_length(feh, 1)]) feh
from
    (
        select bar, array_agg(feh) feh
        from foo
        group by bar
    ) s
    cross join (
        select count(*)::int c
        from foo
        group by bar
        order by c desc limit 1
    ) c(c)
;
 bar |      feh      
-----+---------------
 A   | {10,20,NULL}
 B   | {3,4,NULL}
 C   | {5,NULL,NULL}
 D   | {6,7,8}

#4


4  

This is to complete @Damian good answer. I have already suggested the JSON approach in other answers before the 9.6's handy json_object_agg function. It just takes more work with the previous tool set.

这是完成@Damian的好答案。在9.6的方便的json_object_agg函数之前,我已经在其他答案中提出了JSON方法。使用以前的工具集需要更多的工作。

Two of the cited possible drawbacks are really not. The random key order is trivially corrected if necessary. The missing keys, if relevant, takes an almost trivial amount of code to be addressed:

引用的两个可能的缺点实际上并非如此。如有必要,可以轻松纠正随机密钥顺序。丢失的密钥(如果相关)需要处理几乎微不足道的代码:

select
    row_name as bar,
    json_object_agg(attrib, val order by attrib) as data
from
    tbl
    right join
    (
        (select distinct row_name from tbl) a
        cross join
        (select distinct attrib from tbl) b
    ) c using (row_name, attrib)
group by row_name
order by row_name
;
 bar |                     data                     
-----+----------------------------------------------
 a   | { "val1" : 10, "val2" : 20, "val3" : null }
 b   | { "val1" : 3, "val2" : 4, "val3" : null }
 c   | { "val1" : 5, "val2" : null, "val3" : null }
 d   | { "val1" : 6, "val2" : 7, "val3" : 8 }

For a final query consumer which understands JSON there are no drawbacks. The only one is that it can not be consumed as a table source.

对于理解JSON的最终查询使用者而言,没有任何缺点。唯一的一个是它不能作为表源使用。

#5


1  

I'm sorry about returning in the past, but the solution "Dynamic Crosstab" returns erroneous result table. Thus, the valN values are erroneously "aligned to the left" and they don't correspond to the column names. When the input table has "holes" in the values, e.g. "C" has val1 and val3 but not val2. This produces an error: val3 value will be ranged in the column val2 (i.e. the next free column) in the final table.

我很遗憾回到过去,但解决方案“Dynamic Crosstab”会返回错误的结果表。因此,valN值错误地“向左对齐”,并且它们不对应于列名。当输入表中的值中有“空洞”时,例如“C”具有val1和val3但不具有val2。这会产生错误:val3值将在最终表中的列val2(即下一个空闲列)中进行调整。

CREATE TEMP TABLE tbl (row_name text, attrib text, val int); 
INSERT INTO tbl (row_name, attrib, val) VALUES ('C', 'val1', 5) ('C', 'val3', 7);

SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl 
ORDER BY 1,2') AS ct (row_name text, val1 int, val2 int, val3 int);

row_name|val1|val2|val3
 C      |   5|  7 |

In order to return correct cells with "holes" in the right column, the crosstab query requires a 2nd SELECT in the crosstab, something like this "crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2', 'select distinct row_name from tbl order by 1')"

为了在右列中返回带有“孔”的正确单元格,交叉表查询需要交叉表中的第二个SELECT,类似于“交叉表('SELECT row_name,attrib,val FROM tbl ORDER BY 1,2','select不同的row_name从tbl顺序乘1')“