Google BigQuery: window function for a cumulative sum across the columns of a row

Date: 2021-04-03 23:00:00

I am looking to calculate a cumulative sum across columns in Google BigQuery.

Assume there are five columns (NAME, A, B, C, D) with two rows of integers, for example:

 NAME | A | B | C | D
----------------------
 Bob  | 1 | 2 | 3 | 4
 Carl | 5 | 6 | 7 | 8

I am looking for a window function or UDF that calculates the cumulative sum across the columns of each row, to generate this output:

 NAME | A | B  | C  | D
-------------------------
 Bob  | 1 | 3  | 6  | 10
 Carl | 5 | 11 | 18 | 26

Any thoughts or suggestions are greatly appreciated!

2 answers

#1


1  

I think there are a number of reasonable workarounds for your requirements, mostly in the area of designing your table better. It all depends on how you load your data and, most importantly, how you then consume it.

Still, staying with the requirements as presented: the query below (written with BigQuery legacy SQL functions) does not produce exactly the output you ask for in your question, but it might be useful as an example:

SELECT name, GROUP_CONCAT(STRING(cum)) AS all FROM (
  SELECT name, 
    SUM(INTEGER(num)) 
    OVER(PARTITION BY name 
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum
  FROM (
    SELECT name, SPLIT(all) AS num FROM (
      SELECT name, 
         CONCAT(STRING(a),',',STRING(b),',',STRING(c),',',STRING(d)) AS all 
      FROM yourtable
    )
  )
)
GROUP BY name

Output is:

name    all  
Bob     1,3,6,10     
Carl    5,11,18,26   

Depending on how you then consume this data, it can still work for you. Note that you now avoid writing something like col1 + col2 + .. + col89 + col90, but you still need to mention each column explicitly, just once.

If you have the "luxury" of implementing your requirements outside of the GBQ UI, in some client, you can use the BigQuery API to programmatically acquire the table schema, build your logic/query on the fly, and then execute it. Take a look at the APIs below to start with:
To get table schema - https://cloud.google.com/bigquery/docs/reference/v2/tables/get
To issue query job - https://cloud.google.com/bigquery/docs/reference/v2/jobs/insert
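
The query-building step described above could be sketched on the client side roughly as follows. This is a minimal Python sketch under stated assumptions, not a definitive implementation: it assumes the column names have already been read from the table schema (for example through the tables.get call linked above) and only assembles the query text. The helper name build_cumulative_query, the cum_ alias prefix, and the table name yourtable are hypothetical, and no BigQuery API call is made here.

```python
def build_cumulative_query(table, key_column, value_columns):
    """Assemble a SELECT with one running-sum expression per value column.

    Hypothetical helper: the column names are assumed to come from the
    table schema; this function only builds the query string.
    """
    select_items = [key_column]
    running = []
    for column in value_columns:
        running.append(column)
        # Third column, for example, becomes "a + b + c AS cum_c".
        select_items.append(f"{' + '.join(running)} AS cum_{column}")
    return f"SELECT {', '.join(select_items)}\nFROM {table}"


if __name__ == "__main__":
    # Column list hard-coded here; in practice it would come from the schema.
    print(build_cumulative_query("yourtable", "name", ["a", "b", "c", "d"]))
```

The generated text would then be submitted as a query job. Each column name appears in the column list only once, which is the point of generating the query instead of typing it.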


#2


0  

There's no need for a UDF:


SELECT name, a, a+b, a+b+c, a+b+c+d
FROM tab
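
As a quick sanity check of the arithmetic, the same running sums can be reproduced outside BigQuery. The Python sketch below applies itertools.accumulate to the two sample rows from the question; the rows dictionary is simply the sample data retyped, and the function name cumulative_rows is a hypothetical name used for illustration only.

```python
from itertools import accumulate

# Sample data from the question: NAME -> values of columns A, B, C, D.
rows = {"Bob": [1, 2, 3, 4], "Carl": [5, 6, 7, 8]}


def cumulative_rows(data):
    """Return the running sum across the columns of each row."""
    return {name: list(accumulate(values)) for name, values in data.items()}


if __name__ == "__main__":
    for name, sums in cumulative_rows(rows).items():
        print(name, sums)
```

For the sample data this gives 1, 3, 6, 10 for Bob and 5, 11, 18, 26 for Carl, which agrees with the output listed under answer #1.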
