How do I search for rows containing specific words and then return a count for each word?

Time: 2022-11-16 19:19:05

I have 150,000 rows of data which I'm attempting to query in Google BigQuery.

The column Text contains text of various lengths, which I want to search for particular keywords.

I've gotten as far as the query below which returns all rows containing a particular keyword (e.g. facebook):

SELECT Text FROM Data.Set_1
WHERE Text CONTAINS 'facebook'

Questions:

1) How do I improve the query so that it returns a total count of all occurrences of the keyword 'facebook' across 'Text' in a new column?

2) How do I scale this up to multiple keywords (facebook, cnn, bbc, twitter) and return a total count for each keyword present in the data (e.g. facebook 42, cnn 54, bbc 88, twitter 49)?

2 Solutions

#1


For BigQuery Legacy SQL:

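-- `keywords` is a one-column table holding the search terms.
SELECT
  keyword,
  COUNT(1) AS rows,  -- number of rows containing the keyword
  -- Removing every copy of the keyword shortens Text by
  -- (copies * LENGTH(keyword)), which yields the occurrence count.
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences
FROM YourTable
CROSS JOIN keywords
WHERE Text CONTAINS keyword
GROUP BY keyword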
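
The occurrence count in the query above comes from a length difference: deleting every copy of the keyword shortens the text by (number of copies × keyword length). As a rough illustration only, the same arithmetic can be checked outside BigQuery in Python. The function name `count_occurrences` is made up for this sketch and is not part of the answer.

```python
def count_occurrences(text: str, keyword: str) -> int:
    """Count non-overlapping, case-sensitive occurrences of keyword in text."""
    if not keyword:
        raise ValueError("keyword must be non-empty")
    # Each removed copy shortens the text by len(keyword), so the total
    # shrinkage divided by len(keyword) is the number of copies removed.
    shrinkage = len(text) - len(text.replace(keyword, ""))
    return shrinkage // len(keyword)


print(count_occurrences("facebookfacebookcnnbbccnn", "facebook"))  # 2
print(count_occurrences("facebookfacebookcnnbbccnn", "cnn"))       # 2
print(count_occurrences("Facebook", "facebook"))                   # 0
```

As the last call shows, the match is case-sensitive, and it is a plain substring match, so a keyword is also counted inside longer words. If that matters for your data, one option is to lowercase both sides first (LOWER in BigQuery).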

Example to play with

SELECT 
  keyword, 
  COUNT(1) AS rows, 
  SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences
FROM (
  SELECT Text FROM
    (SELECT 'facebookfacebookcnnbbccnn' AS Text),
    (SELECT 'facebook' AS Text), 
    (SELECT 'cnn' AS Text)
) AS words 
CROSS JOIN (
  SELECT keyword FROM 
    (SELECT 'facebook' AS keyword),
    (SELECT 'cnn' AS keyword), 
    (SELECT 'bbc' AS keyword)
) AS keywords
WHERE Text CONTAINS keyword
GROUP BY keyword

For BigQuery Standard SQL (see Enabling Standard SQL)

SELECT 
  keyword, 
  COUNT(1) AS `rows`, 
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences
FROM YourTable 
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword

Example to play with

WITH keywords AS (
  SELECT 'facebook' AS keyword UNION ALL
  SELECT 'cnn' AS keyword UNION ALL
  SELECT 'bbc' AS keyword 
),
words AS (
  SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
  SELECT 'facebook' AS Text UNION ALL
  SELECT 'cnn' AS Text 
)
SELECT 
  keyword, 
  COUNT(1) AS `rows`, 
  SUM(DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword))) AS occurrences
FROM words 
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword
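
To sanity-check what the example should return, here is a small Python sketch of the same join, filter, and aggregation over the sample data. It is only an approximation of the query: the function name `summarize` is hypothetical, and it uses Python's `str.count` (which also counts non-overlapping matches) in place of the length-difference expression.

```python
from collections import defaultdict
from itertools import product

keywords = ["facebook", "cnn", "bbc"]
texts = ["facebookfacebookcnnbbccnn", "facebook", "cnn"]


def summarize(texts, keywords):
    """Return {keyword: (matching_rows, total_occurrences)}."""
    rows = defaultdict(int)
    occurrences = defaultdict(int)
    # product() pairs every text with every keyword, like the join
    # before its ON condition is applied.
    for text, keyword in product(texts, keywords):
        if keyword in text:  # plays the role of STRPOS(Text, keyword) > 0
            rows[keyword] += 1
            occurrences[keyword] += text.count(keyword)
    # Keywords that match no row are omitted, as with an inner join.
    return {k: (rows[k], occurrences[k]) for k in rows}


result = summarize(texts, keywords)
for keyword, (n_rows, n_occurrences) in sorted(result.items()):
    print(keyword, n_rows, n_occurrences)
# bbc 1 1
# cnn 2 3
# facebook 2 3
```

The first number for each keyword is the count of rows that contain it, and the second is the total number of occurrences across those rows.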

#2


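You can use a derived table to hold all the words you are looking for, and then use aggregation to count the matches. Note that this counts the rows that contain each keyword, not the total number of occurrences within those rows: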

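-- Standard SQL; STRPOS(...) > 0 is the substring test (CONTAINS is Legacy SQL only).
SELECT w.keyword, COUNT(s.Text) AS matching_rows
FROM (SELECT 'facebook' AS keyword UNION ALL
      SELECT 'cnn'
     ) w LEFT JOIN
     Data.Set_1 s
     ON STRPOS(s.Text, w.keyword) > 0
GROUP BY w.keyword;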

Do note: this is not particularly efficient. Every row is tested against every keyword, so the running time should grow roughly linearly with the number of keywords.
