I have a table with N columns. Let's call them c1, c2, c3, c4, ... cN. Across multiple rows, I want to get a single row with COUNT(DISTINCT cX) for each X in [1, N].

c1 | c2 | ... | cn
0  | 4  | ... | 1

Is there a way I can do this (in a stored procedure) without writing every column name into the query manually?

Why?

We've had a problem where bugs in application servers cause good column values to be overwritten with garbage inserted later. To solve this, I'm storing the information in a log-structured form, where each row represents a logical UPDATE query. Then, when given a signal that the record is complete, I can determine if any values were (erroneously) overwritten.

An example of a single correct record in multiple rows: there is at most one value for each column.

| id | initialize_time | start_time | end_time |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 12:00am         | NULL       | 2:00pm   |

Reconciled row:
| 1  | 12:00am         | 1:00pm     | 2:00pm   |
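
For a record like this one, where every column holds at most one distinct non-null value, the reconciled row can be sketched with a plain aggregate. The table name `record_log_ok` and the `text` column types are assumptions made only for illustration. Note that `max()` ignores NULLs, but it would silently pick the largest value if a column had been overwritten, so it is only meaningful after the record has passed the distinct-count check.

```sql
-- Hypothetical log table mirroring the example rows above.
CREATE TEMP TABLE record_log_ok (
   id              int
 , initialize_time text
 , start_time      text
 , end_time        text
);

INSERT INTO record_log_ok VALUES
   (1, '12:00am', NULL    , NULL)
 , (1, '12:00am', '1:00pm', NULL)
 , (1, '12:00am', NULL    , '2:00pm');

-- Collapse the log rows of each record into one row.
-- max() skips NULLs, so each column yields its single non-null value.
SELECT id
     , max(initialize_time) AS initialize_time
     , max(start_time)      AS start_time
     , max(end_time)        AS end_time
FROM   record_log_ok
GROUP  BY id;
```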

An example of an irreconcilable record that I want to detect:

| id | initialize_time | start_time | end_time |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 9:00am          | 1:00pm     | 2:00pm   |   -- New initialize time => irreconcilable!
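
Written out by hand for these three columns, the check could look like the sketch below. The table name `record_log_bad` and the `text` column types are assumptions for illustration. `count(DISTINCT col)` ignores NULLs, so any result above 1 means a column received two different values.

```sql
-- Hypothetical log table mirroring the irreconcilable example above.
CREATE TEMP TABLE record_log_bad (
   id              int
 , initialize_time text
 , start_time      text
 , end_time        text
);

INSERT INTO record_log_bad VALUES
   (1, '12:00am', NULL    , NULL)
 , (1, '12:00am', '1:00pm', NULL)
 , (1, '9:00am' , '1:00pm', '2:00pm');

-- Return only records where some column has more than one distinct value.
SELECT id
     , count(DISTINCT initialize_time) AS initialize_time_values
     , count(DISTINCT start_time)      AS start_time_values
     , count(DISTINCT end_time)        AS end_time_values
FROM   record_log_bad
GROUP  BY id
HAVING count(DISTINCT initialize_time) > 1
    OR count(DISTINCT start_time)      > 1
    OR count(DISTINCT end_time)        > 1;
```

For the rows above this returns id 1 with 2 distinct values for initialize_time and 1 each for start_time and end_time. The question is how to avoid spelling out every column like this.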

1 solution

#1

You need dynamic SQL for that, which means you have to create a function or run a DO command. Since you cannot return values directly from the latter, a plpgsql function it is:

CREATE OR REPLACE FUNCTION f_count_all(_tbl text
                           , OUT columns text[], OUT counts bigint[])
  RETURNS record LANGUAGE plpgsql AS
$func$
BEGIN

-- Build one SELECT from the system catalog: an array of column names and
-- an array of count(col), i.e. the number of non-null values per column.
EXECUTE (
    SELECT 'SELECT
     ARRAY[' || string_agg(quote_literal(attname), ', ') || '],
     ARRAY[' || string_agg('count(' || quote_ident(attname) || ')', ', ') || ']
    FROM ' || _tbl
    FROM   pg_attribute
    WHERE  attrelid = _tbl::regclass
    AND    attnum  >= 1           -- exclude tableoid & friends (neg. attnum)
    AND    attisdropped IS FALSE  -- exclude deleted columns
    GROUP  BY attrelid
    )
INTO columns, counts;

END
$func$;

Call:

SELECT * FROM f_count_all('myschema.mytable');

Returns:

columns    | counts
-----------+----------
{c1,c2,c3} | {17,1,0}
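
The function above counts non-null values per column with count(col). Since the question asks for distinct values, a variant along the following lines may be closer to what is needed. This is only a sketch: the name `f_count_distinct_all`, the `regclass` parameter type, and the use of `format()` are choices made here, not part of the answer above.

```sql
-- Hypothetical variant: number of distinct non-null values per column.
CREATE OR REPLACE FUNCTION f_count_distinct_all(_tbl regclass
                           , OUT columns text[], OUT counts bigint[])
  RETURNS record LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE (
      -- %s with a regclass value yields a properly quoted, schema-qualified
      -- table name; %I quotes each column name as an identifier.
      SELECT format('SELECT ARRAY[%s]::text[], ARRAY[%s]::bigint[] FROM %s'
                  , string_agg(quote_literal(attname::text), ', ' ORDER BY attnum)
                  , string_agg(format('count(DISTINCT %I)', attname), ', ' ORDER BY attnum)
                  , _tbl)
      FROM   pg_attribute
      WHERE  attrelid = _tbl
      AND    attnum >= 1        -- exclude system columns
      AND    NOT attisdropped   -- exclude dropped columns
      )
   INTO columns, counts;
END
$func$;

-- Usage, with the same placeholder table name as in the call above:
-- SELECT * FROM f_count_distinct_all('myschema.mytable');
```

Two caveats: count(DISTINCT ...) raises an error for column types without an equality operator (json, for example), and the counts span every row of the table. To check one logical record at a time, the generated query would also need a filter or a GROUP BY on id.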

More explanation and links about dynamic SQL and EXECUTE in this related question - or a couple more here on SO, try this search.

Very similar to this question:
postgresql - count (no null values) of each column in a table

You could even try and return a polymorphic record type to get single columns dynamically, but that's rather complex and advanced. Probably too much effort for your case. More in this related answer.
