不在谷歌BigQuery标准sql工作

时间:2022-06-01 19:15:57

I am using Google BigQuery,i am trying to find the 'userid's from 'table2' excluding the ones that are stored in 'table1' 2 or more times. this is the code :

我正在使用Google BigQuery,我试图从'table2'中找到'userid',不包括那些存储在'table1'中的2次或更多次。这是代码:

#standardSQL
WITH t100 AS (
SELECT count_table.userid 
From(
SELECT userid,COUNT(`project.dataset.table1`.userid) as notification_count 
FROM `project.dataset.table1`
 GROUP BY userid) as count_table 
where notification_count >= 2 
)

SELECT userid FROM `project.dataset.table2` WHERE userid NOT IN (SELECT userid  FROM t100)

the problem is that this is returning the 'userid's from 'table1' that are stored 2 or more times, i have tried adding WHERE userid IS NOT NULL to the SELECT userid FROM t100, yet it made no difference. and just so that everything is clearer, this : SELECT userid FROM t100, is not empty and the results returned for some reason still show in the result of the first code above.

问题是,这是从'table1'返回存储2次或更多次的'userid',我已经尝试将WHERE userid IS NOT NULL添加到SELECT用户ID FROM T100,但它没有任何区别。为了让一切都更清晰,这个:SELECT userid FROM t100,不是空的,由于某种原因返回的结果仍然显示在上面第一个代码的结果中。

3 个解决方案

#1


2  

i have tried adding WHERE userid IS NOT NULL to the SELECT userid FROM t100, yet it made no difference

我已经尝试将WHERE userid IS NOT NULL添加到SELECT用户ID FROM T100,但它没有任何区别

This of course had no affect because when you do COUNT(userid) as notification_count it always returns 0 for userid NULL thus was filtered out by HAVING notification_count >= 2
If you would use COUNT(1) instead - that's where you would potentially get null userids in output of t100. So userid is NULL is definitelly not an issue here

这当然没有任何影响,因为当您将COUNT(userid)作为notification_count时,它总是为userid NULL返回0因此被HAVING过滤掉了通知_count> = 2如果您将使用COUNT(1) - 那么您可能会获得null输出t100的用户ID。所以userid是NULL定义不是问题

As others pointed - your query should work - so if you continue getting the problem - you need to dig more in this issue and provide us with more details on it

正如其他人指出的那样 - 你的查询应该有效 - 所以如果你继续遇到问题 - 你需要在这个问题上挖掘更多,并向我们提供更多细节

Meantime, try below as yet another version of your (otherwise looking good) query

同时,请尝试以下另一个版本的(看起来不错)查询

#standardSQL
WITH t100 AS (
  SELECT userid
  FROM `project.dataset.table1`
  GROUP BY userid
  HAVING COUNT(userid) >= 2 
)
SELECT userid
FROM `project.dataset.table2` AS t2
LEFT join t100 ON t100.userid = t2.userid
WHERE t100.userid IS NULL

#2


1  

Not sure why this isn't working, but out of general principle, I never use (not) in in combination with a select statement. Rather, I would left outer join the subquery and filter on null values therein:

不知道为什么这不起作用,但出于一般原则,我从不使用(不)与select语句结合使用。相反,我会留下外连接子查询并过滤其中的空值:

#standardSQL

with t100 as (
select
  count_table.userid

from(
select
  userid
  ,count(`project.dataset.table1`.userid) as notification_count 

from `project.dataset.table1`

group by
  userid
) as count_table 

where notification_count >= 2 
)

select
  t2.userid as userid

from `project.dataset.table2` t2
left outer join t100
  on t100.userid = t2.userid

where t100.userid is null

#3


0  

It's due to null handling. There was a similar post on our issue tracker about NOT IN versus NOT EXISTS. The documentation for IN states:

这是由于空处理。在我们的问题跟踪器上有类似的帖子关于NOT IN与NOT EXISTS。 IN的文档说明:

IN with a NULL in the IN-list can only return TRUE or NULL, never FALSE

IN列表中的NULL只能返回TRUE或NULL,永远不会为FALSE

To achieve the semantics that you want, you should use an anti semijoin (NOT EXISTS). For example,

要实现所需的语义,您应该使用反半连接(NOT EXISTS)。例如,

#standardSQL
WITH t100 AS (
  SELECT
    userid,
    COUNT(userid) as notification_count 
  FROM `project.dataset.table1`
  GROUP BY userid
  HAVING notification_count >= 2 
)
SELECT userid
FROM `project.dataset.table2` AS t2
WHERE NOT EXISTS (SELECT 1 FROM t100 WHERE userid = t2.userid);

#1


2  

i have tried adding WHERE userid IS NOT NULL to the SELECT userid FROM t100, yet it made no difference

我已经尝试将WHERE userid IS NOT NULL添加到SELECT用户ID FROM T100,但它没有任何区别

This of course had no affect because when you do COUNT(userid) as notification_count it always returns 0 for userid NULL thus was filtered out by HAVING notification_count >= 2
If you would use COUNT(1) instead - that's where you would potentially get null userids in output of t100. So userid is NULL is definitelly not an issue here

这当然没有任何影响,因为当您将COUNT(userid)作为notification_count时,它总是为userid NULL返回0因此被HAVING过滤掉了通知_count> = 2如果您将使用COUNT(1) - 那么您可能会获得null输出t100的用户ID。所以userid是NULL定义不是问题

As others pointed - your query should work - so if you continue getting the problem - you need to dig more in this issue and provide us with more details on it

正如其他人指出的那样 - 你的查询应该有效 - 所以如果你继续遇到问题 - 你需要在这个问题上挖掘更多,并向我们提供更多细节

Meantime, try below as yet another version of your (otherwise looking good) query

同时,请尝试以下另一个版本的(看起来不错)查询

#standardSQL
WITH t100 AS (
  SELECT userid
  FROM `project.dataset.table1`
  GROUP BY userid
  HAVING COUNT(userid) >= 2 
)
SELECT userid
FROM `project.dataset.table2` AS t2
LEFT join t100 ON t100.userid = t2.userid
WHERE t100.userid IS NULL

#2


1  

Not sure why this isn't working, but out of general principle, I never use (not) in in combination with a select statement. Rather, I would left outer join the subquery and filter on null values therein:

不知道为什么这不起作用,但出于一般原则,我从不使用(不)与select语句结合使用。相反,我会留下外连接子查询并过滤其中的空值:

#standardSQL

with t100 as (
select
  count_table.userid

from(
select
  userid
  ,count(`project.dataset.table1`.userid) as notification_count 

from `project.dataset.table1`

group by
  userid
) as count_table 

where notification_count >= 2 
)

select
  t2.userid as userid

from `project.dataset.table2` t2
left outer join t100
  on t100.userid = t2.userid

where t100.userid is null

#3


0  

It's due to null handling. There was a similar post on our issue tracker about NOT IN versus NOT EXISTS. The documentation for IN states:

这是由于空处理。在我们的问题跟踪器上有类似的帖子关于NOT IN与NOT EXISTS。 IN的文档说明:

IN with a NULL in the IN-list can only return TRUE or NULL, never FALSE

IN列表中的NULL只能返回TRUE或NULL,永远不会为FALSE

To achieve the semantics that you want, you should use an anti semijoin (NOT EXISTS). For example,

要实现所需的语义,您应该使用反半连接(NOT EXISTS)。例如,

#standardSQL
WITH t100 AS (
  SELECT
    userid,
    COUNT(userid) as notification_count 
  FROM `project.dataset.table1`
  GROUP BY userid
  HAVING notification_count >= 2 
)
SELECT userid
FROM `project.dataset.table2` AS t2
WHERE NOT EXISTS (SELECT 1 FROM t100 WHERE userid = t2.userid);