Selecting different customDimensions on GA's BigQuery integration

Time: 2021-07-30 18:04:41

Sorry for the specific question but I feel like I've hit a dead end because my knowledge of SQL doesn't go that far.

The data that comes out from the BigQuery implementation of GoogleAnalytics raw data looks like this:

|- visitId
|- date
|- (....)
+- hits
   |- time
   +- customDimensions
      |- index
      |- value
   +- customMetrics
      |- index
      |- value

I know that some hits always send certain data to GA. Specifically, I want customDimensions.index = 43, customDimensions.index = 24 and customMetrics.index = 14. To be specific: dimension 43 is the object being viewed or sold, dimension 24 tells me whether it is being viewed, and metric 14 has the value 1 when it has just been sold. My final result should look like this:

customDimensions.value (where index = 43)    count(where customDimensions.index = 24 and customDimensions.value = 'ficha')    count(where customMetrics.index = 14 and customMetrics.value = 1)

The result is grouped by customDimensions.value (where index = 43). I know that every hit sent with customMetrics.index = 14 also has customDimensions.index = 43 and, in the same way, every hit with customDimensions.index = 24 also has customDimensions.index = 43. I actually managed to write a SQL query that does what I want but, at what cost? It's big, it's slow, it's ugly. What I'm currently doing is:

  • Create three tables, each holding visitId, hits.time and the value where index = 14, 24 or 43
  • Left join 43 with 24 ON 43.visitId==24.visitId AND 43.hits.time==24.hits.time as result
  • Left join result with 14 ON 14.visitId==result.visitId AND 14.hits.time==result.hits.time

I'm not interested in visitId or hits.time; they are just a way to relate values from the same hit (and to know which product was bought when customMetrics.index = 14 and value = 1).

This is my code:

SELECT Tviviendasvisitas.viviendaId AS ViviendaID,
       SUM(Tviviendasvisitas.NumeroVisitas) AS NumeroVisitas,
       SUM(Ttransacciones.Transactions) AS Transactions
FROM (
  SELECT Tviviendas.visitId AS visitId,
         Tviviendas.hits.time AS visitTime,
         Tviviendas.ViviendaID AS viviendaId,
         Tvisitas.visitas AS NumeroVisitas
  FROM (
    SELECT visitId, hits.time, hits.customDimensions.value AS ViviendaID
    FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30'))))
    WHERE hits.customDimensions.index = 43
    GROUP EACH BY visitId, hits.time, ViviendaID) AS Tviviendas
  LEFT JOIN EACH (
    SELECT visitId, hits.time, COUNT(*) AS visitas
    FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30'))))
    WHERE hits.customDimensions.index = 24 AND hits.customDimensions.value == 'ficha'
    GROUP EACH BY visitId, hits.time) AS Tvisitas
  ON Tvisitas.visitId == Tviviendas.visitId AND Tvisitas.time == Tviviendas.time) AS Tviviendasvisitas
LEFT JOIN EACH (
  SELECT visitId, hits.time AS transactionTime, SUM(hits.customMetrics.value) AS Transactions
  FROM (TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30')))
  WHERE hits.customMetrics.index = 14 AND hits.customMetrics.value = 1
  GROUP BY visitId, transactionTime) AS Ttransacciones
ON Tviviendasvisitas.visitId == Ttransacciones.visitId AND Tviviendasvisitas.visitTime == Ttransacciones.transactionTime
GROUP BY ViviendaID

Running this query takes far too long for me to build a proper dashboard with the results.

So help me God if that's my final result. I feel like there should be a WAY more elegant solution to this problem but I can't seem to find it on my own.

Help?

2 Answers

#1


3  

You should be able to structure this query without the joins by using BigQuery's scoped aggregation (the WITHIN clause). Here is a small example, which may not be exactly the logic you want, but should illustrate some of the possibilities:

SELECT  visitId, hits.time,
        SOME(hits.customDimensions.index = 43) WITHIN RECORD AS has43,
        SUM(IF(hits.customDimensions.index = 24 AND hits.customDimensions.value = 'ficha', 1, 0)) WITHIN RECORD AS numFichas,
        SUM(IF(hits.customMetrics.index = 14, hits.customMetrics.value, 0)) WITHIN RECORD AS totalValues
FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30'))))
HAVING has43

The example shows three WITHIN RECORD aggregations, meaning they will be computed over the repeated fields of a single record. SOME() takes a boolean expression and returns true if any field within the record satisfies that expression. So has43 will be true for visits that have one or more hits with customDimensions.index = 43. The HAVING clause filters out records where that is false.

The SUM(IF(...)) expressions compute the total number of customDimensions with index = 24 and value = 'ficha' and the total values associated with the customMetrics with index = 14.
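
To make the scoping concrete, here is a plain-Python model of what the three WITHIN RECORD aggregations compute for each record. It is only an illustration of the semantics described above, not BigQuery code: the within_record helper and the sample sessions are made up, the nested dicts merely mimic the schema shown in the question, and the hits.time column of the query is left out.

```python
# Toy model of WITHIN RECORD: every aggregation runs over all repeated
# fields inside ONE record (one session), never across records.
# The helper name and the sample data are hypothetical.

def within_record(session):
    hits = session["hits"]
    # Flatten the repeated fields of this single record.
    dims = [d for hit in hits for d in hit.get("customDimensions", [])]
    mets = [m for hit in hits for m in hit.get("customMetrics", [])]
    return {
        "visitId": session["visitId"],
        # SOME(index = 43): true if any custom dimension in the record matches
        "has43": any(d["index"] == 43 for d in dims),
        # SUM(IF(index = 24 AND value = 'ficha', 1, 0))
        "numFichas": sum(
            1 for d in dims if d["index"] == 24 and d["value"] == "ficha"
        ),
        # SUM(IF(index = 14, value, 0))
        "totalValues": sum(m["value"] for m in mets if m["index"] == 14),
    }


sessions = [
    {"visitId": 1, "hits": [
        {"time": 0,
         "customDimensions": [{"index": 43, "value": "house-A"},
                              {"index": 24, "value": "ficha"}],
         "customMetrics": []},
        {"time": 5,
         "customDimensions": [{"index": 43, "value": "house-B"},
                              {"index": 24, "value": "ficha"}],
         "customMetrics": [{"index": 14, "value": 1}]},
    ]},
    {"visitId": 2, "hits": [
        {"time": 0,
         "customDimensions": [{"index": 7, "value": "x"}],
         "customMetrics": []},
    ]},
]

rows = [within_record(s) for s in sessions]
# HAVING has43: drop records without any index-43 dimension
rows = [r for r in rows if r["has43"]]
print(rows)
```

In this model visit 1 contains two different products, yet numFichas is a single total for the whole visit. Per-record totals like these would still need a per-hit step before they can be split by product.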

#2


-1  

If you just want to get the value of a hit-level customDimension into its own column, here is a neat trick:

SELECT fullVisitorId, visitId, hits.hitNumber,
MAX(IF(hits.customDimensions.index=43, 
       hits.customDimensions.value, 
       NULL)) WITHIN hits AS product,
FROM [tableID.ga_sessions_20150305]
LIMIT 100
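
The per-hit trick above can be modeled in plain Python to show how it might feed the per-product counts asked for in the question: pick dimension 43 for each hit, then count 'ficha' views and sales for that product. This is a sketch of the logic only, not BigQuery code. The per_product_counts helper and the sample data are hypothetical, each hit is counted at most once per column, and the sketch relies on the question's statement that hits carrying dimension 24 or metric 14 also carry dimension 43.

```python
from collections import defaultdict

# Hypothetical helper: models "MAX(IF(index = 43, value, NULL)) WITHIN hits"
# followed by a GROUP BY on the resulting product column.
def per_product_counts(sessions):
    totals = defaultdict(lambda: [0, 0])  # product -> [ficha views, sales]
    for session in sessions:
        for hit in session["hits"]:
            dims = hit.get("customDimensions", [])
            mets = hit.get("customMetrics", [])
            # Per-hit pivot: the value of dimension 43, or None if absent.
            products = [d["value"] for d in dims if d["index"] == 43]
            product = max(products) if products else None
            if product is None:
                continue  # this hit carries no product
            is_ficha = any(
                d["index"] == 24 and d["value"] == "ficha" for d in dims
            )
            is_sale = any(
                m["index"] == 14 and m["value"] == 1 for m in mets
            )
            totals[product][0] += int(is_ficha)
            totals[product][1] += int(is_sale)
    return {product: tuple(counts) for product, counts in totals.items()}


sessions = [
    {"visitId": 1, "hits": [
        {"time": 0,
         "customDimensions": [{"index": 43, "value": "house-A"},
                              {"index": 24, "value": "ficha"}],
         "customMetrics": []},
        {"time": 5,
         "customDimensions": [{"index": 43, "value": "house-B"},
                              {"index": 24, "value": "ficha"}],
         "customMetrics": [{"index": 14, "value": 1}]},
    ]},
    {"visitId": 2, "hits": [
        {"time": 0,
         "customDimensions": [{"index": 43, "value": "house-A"},
                              {"index": 24, "value": "ficha"}],
         "customMetrics": []},
        {"time": 3,
         "customDimensions": [{"index": 7, "value": "x"}],
         "customMetrics": []},
    ]},
]

result = per_product_counts(sessions)
print(result)
```

Because the values are matched inside a single hit, no join on visitId and hits.time is needed in this model; whether the equivalent query is fast enough on real data would have to be measured.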
