Ways to improve the performance of the following query (updating a table)

This question is related to one I posted a while ago, which can be found here: Update values of database with values that are already in DB.

I have the following situation: a table that stores data from different sensors (8 sensors in total). Each row of the table has the following structure:

SensorID --- TimestampMS --- RawData --- Data

So, for example, for a temperature sensor called TEMPSensor1 I have the following:

TEMPSensor1 --- 1000 --- 200 --- 2
TEMPSensor1 --- 2000 --- 220 --- 2.2

And so on for each of the 8 sensors. I have some problems reading the data, and some rows contain data that "is not correct". Specifically, whenever the RawData field is 65535, that row should be updated. What I would like to do is overwrite that "corrupted" row with the next value in time. So, if we have this:

TEMPSensor1 --- 1000 --- 200 --- 2
TEMPSensor1 --- 2000 --- 220 --- 2.2
TEMPSensor1 --- 3000 --- 65535 --- 655.35
TEMPSensor1 --- 4000 --- 240 --- 2.4

After doing the Update, the content of the table should be changed to:

TEMPSensor1 --- 1000 --- 200 --- 2
TEMPSensor1 --- 2000 --- 220 --- 2.2
TEMPSensor1 --- 3000 --- 240 --- 2.4
TEMPSensor1 --- 4000 --- 240 --- 2.4
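
Before looking at any SQL, it helps to pin down the intended result. The following is a minimal plain-Python sketch of the rule described above. It assumes each row is a (SensorID, TimestampMS, RawData, Data) tuple. It also assumes that a corrupted reading with no later good reading is left alone, because the question does not say what should happen in that case. The function name backfill_next_good is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

CORRUPT = 65535  # RawData value that marks a bad reading (from the question)


def backfill_next_good(rows):
    """Return rows (sorted by sensor, then time) where every corrupted
    reading takes RawData/Data from the next good reading of the same sensor.

    rows: iterable of (sensor_id, timestamp_ms, raw_data, data) tuples.
    Assumption: a corrupted reading with no later good reading is kept as-is.
    """
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _sensor, group in groupby(ordered, key=itemgetter(0)):
        fixed = []
        next_good = None  # (raw_data, data) of the closest later good reading
        # Walk newest to oldest, so next_good is always the nearest later value.
        for sid, ts, raw, data in reversed(list(group)):
            if raw != CORRUPT:
                next_good = (raw, data)
            elif next_good is not None:
                raw, data = next_good
            fixed.append((sid, ts, raw, data))
        result.extend(reversed(fixed))
    return result


rows = [
    ("TEMPSensor1", 1000, 200, 2.0),
    ("TEMPSensor1", 2000, 220, 2.2),
    ("TEMPSensor1", 3000, 65535, 655.35),
    ("TEMPSensor1", 4000, 240, 2.4),
]
for row in backfill_next_good(rows):
    print(row)
```

The scan runs from newest to oldest, so the "next good value" is always at hand. A run of consecutive corrupted readings is therefore handled in a single pass.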

I've ended up doing the following:

UPDATE externalsensor es1
INNER JOIN externalsensor es2
  ON es1.sensorid = es2.sensorid
  AND (es2.timestampms - es1.timestampms) > 60000
  AND (es2.timestampms - es1.timestampms) < 120000
  AND es2.rawdata <> 65535
SET es1.rawdata = es2.rawdata, es1.data = es2.data
WHERE es1.rawdata = 65535;

This works because I know that two consecutive readings from a sensor are between 60000 and 120000 ms apart. However, it won't work if there are two consecutive "corrupted" readings, because the reading right after the first bad one is itself bad. Can anyone suggest a more efficient way to do this using JOINs rather than subquery SELECTs? My idea would be a JOIN that returns all later values for that sensor (after its TimestampMS) and then takes just the first one, but I don't know how to limit that JOIN result.
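
One JOIN-only way to "take just the first" later row is an exclusion (anti-)join. This technique does not come from the question or from the answers here. It works in three steps:

1. Pair each bad row with every later good row.
2. LEFT JOIN a third copy of the table, looking for a good row strictly between the two.
3. Keep only the pairs where no such row exists.

The sketch below runs this against an in-memory SQLite table through Python's sqlite3 module. For comparison it also runs the question's fixed-window join, with the candidate es2 row filtered on RawData <> 65535. The table layout and the sample rows (readings 90 s apart, two consecutive bad ones) are assumptions.

```python
import sqlite3

# Hypothetical sample: two consecutive corrupted readings for S1.
ROWS = [
    ("S1", 0, 200, 2.0),
    ("S1", 90000, 65535, 655.35),
    ("S1", 180000, 65535, 655.35),
    ("S1", 270000, 240, 2.4),
]

# Exclusion join: keep (bad, good) only if no other good row lies between them.
NEXT_GOOD_SQL = """
SELECT bad.SensorID, bad.TimestampMS, good.RawData, good.Data
FROM externalsensor AS bad
JOIN externalsensor AS good
  ON good.SensorID = bad.SensorID
 AND good.TimestampMS > bad.TimestampMS
 AND good.RawData <> 65535
LEFT JOIN externalsensor AS closer
  ON closer.SensorID = bad.SensorID
 AND closer.TimestampMS > bad.TimestampMS
 AND closer.TimestampMS < good.TimestampMS
 AND closer.RawData <> 65535
WHERE bad.RawData = 65535
  AND closer.SensorID IS NULL
ORDER BY bad.SensorID, bad.TimestampMS
"""

# The question's fixed-window join, written as a SELECT for comparison.
WINDOW_SQL = """
SELECT es1.SensorID, es1.TimestampMS, es2.RawData, es2.Data
FROM externalsensor AS es1
JOIN externalsensor AS es2
  ON es1.SensorID = es2.SensorID
 AND es2.TimestampMS - es1.TimestampMS > 60000
 AND es2.TimestampMS - es1.TimestampMS < 120000
 AND es2.RawData <> 65535
WHERE es1.RawData = 65535
ORDER BY es1.SensorID, es1.TimestampMS
"""


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE externalsensor ("
        " SensorID TEXT, TimestampMS INTEGER, RawData INTEGER, Data REAL,"
        " PRIMARY KEY (SensorID, TimestampMS))"
    )
    conn.executemany("INSERT INTO externalsensor VALUES (?, ?, ?, ?)", rows)
    return conn


conn = make_db(ROWS)
print(conn.execute(WINDOW_SQL).fetchall())     # [('S1', 180000, 240, 2.4)]
print(conn.execute(NEXT_GOOD_SQL).fetchall())
# [('S1', 90000, 240, 2.4), ('S1', 180000, 240, 2.4)]
```

The window join finds nothing for the first of the two bad readings, while the exclusion join fills both. The same FROM/JOIN/WHERE could presumably be placed in a MySQL multi-table UPDATE with SET bad.RawData = good.RawData, bad.Data = good.Data. That variant and its performance are untested here. An index on (SensorID, TimestampMS) would likely matter in either case.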

Thanks in advance.

3 solutions

#1

A completely different approach, using a procedure that runs through the whole table (in descending time order for every sensor):

DELIMITER $$
CREATE PROCEDURE updateMyTable()
BEGIN
  -- Last good RawData / Data seen so far, and the sensor of the previous row
  SET @dummy := -9999 ;
  SET @dummy2 := -9999 ;
  SET @sensor := -999 ;

  UPDATE myTable m
    JOIN
    ( -- Each sensor's rows from newest to oldest; @dummy/@dummy2 carry the
      -- most recent good value back over corrupted (65535) readings
      SELECT n.SensorID
           , n.TimestampMS
           -- problem: corrupted row that is not the sensor's newest row
           , @d := (n.RawData = 65535) AND (@sensor = n.SensorID) AS problem
           , @dummy  := IF(@d, @dummy, n.RawData) as goodRawData
           , @dummy2 := IF(@d, @dummy2, n.Data) as goodData
           , @sensor := n.SensorID AS previous
      FROM myTable n
      ORDER BY n.SensorID
             , n.TimeStampMS DESC
    ) AS upd
    ON m.SensorID = upd.SensorID
      AND m.TimeStampMS = upd.TimeStampMS
  SET m.RawData = upd.goodRawData
    , m.Data = upd.goodData
  WHERE upd.problem
  ;
END$$
DELIMITER ;
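
The procedure relies on MySQL evaluating the user-variable assignments left to right, row by row, in the derived table's ORDER BY order. The MySQL manual states that the evaluation order of expressions involving user variables is undefined, and MySQL 8.0 deprecates setting user variables inside expressions. It is worth testing on your server version. The single descending pass the procedure intends can be sketched in Python as below. carry_back_good_values is a hypothetical name, and its local variables mirror @sensor, @dummy and @dummy2.

```python
CORRUPT = 65535


def carry_back_good_values(rows):
    """Mirror the procedure's single pass over (SensorID, TimestampMS DESC).

    rows: iterable of (sensor_id, timestamp_ms, raw_data, data) tuples.
    Returns {(sensor_id, timestamp_ms): (raw_data, data)} for the rows the
    procedure's WHERE upd.problem would update.
    """
    updates = {}
    prev_sensor = None           # role of @sensor
    good_raw = good_data = None  # roles of @dummy / @dummy2
    for sensor_id, ts, raw, data in sorted(rows, key=lambda r: (r[0], -r[1])):
        # @d: corrupted AND not the first (newest) row of this sensor
        problem = raw == CORRUPT and prev_sensor == sensor_id
        if problem:
            updates[(sensor_id, ts)] = (good_raw, good_data)
        else:
            good_raw, good_data = raw, data
        prev_sensor = sensor_id
    return updates


rows = [
    ("TEMPSensor1", 1000, 200, 2.0),
    ("TEMPSensor1", 2000, 220, 2.2),
    ("TEMPSensor1", 3000, 65535, 655.35),
    ("TEMPSensor1", 4000, 240, 2.4),
]
print(carry_back_good_values(rows))  # {('TEMPSensor1', 3000): (240, 2.4)}
```

The @sensor check stops a sensor's newest corrupted reading from borrowing a value from the previous sensor. That reading, and any corrupted readings directly before it, keep 65535.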

#2

Here's a solution without correlated subqueries, but with a triangular join (not sure which is worse):

UPDATE externalsensor bad

  INNER JOIN (
    SELECT
      es1.SensorID,
      es1.TimestampMS,
      MIN(es2.TimestampMS) AS NextGoodTimestamp
    FROM externalsensor es1
      INNER JOIN externalsensor es2
        ON es1.SensorID = es2.SensorID AND
           es1.TimestampMS < es2.TimestampMS
    WHERE es1.RawData = 65535
      AND es2.RawData <> 65535
    GROUP BY
      es1.SensorID,
      es1.TimestampMS
  ) link ON bad.SensorID = link.SensorID AND
            bad.TimestampMS = link.TimestampMS

  INNER JOIN externalsensor good
    ON link.SensorID = good.SensorID AND
       link.NextGoodTimestamp = good.TimestampMS

SET
  bad.RawData = good.RawData,
  bad.Data = good.Data;

It is assumed that the timestamps are unique within a single sensor group.
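
To check answer #2's logic outside MySQL, the sketch below runs its "link" derived table and the join back to the good row as a plain SELECT in SQLite, through Python's sqlite3 module. A plain SELECT is needed because SQLite does not accept MySQL's multi-table UPDATE ... JOIN syntax. The sketch then writes the pairs with executemany. The table definition and the function name fix_corrupted are assumptions.

The "triangular" part is the es1 JOIN es2 step. Before MIN() collapses them, it produces one intermediate row per (bad reading, later good reading) pair of a sensor.

```python
import sqlite3

# Answer #2's "link" derived table plus the join back to the good row.
FIX_SQL = """
SELECT good.RawData, good.Data, link.SensorID, link.TimestampMS
FROM (
    SELECT es1.SensorID, es1.TimestampMS,
           MIN(es2.TimestampMS) AS NextGoodTimestamp
    FROM externalsensor es1
    JOIN externalsensor es2
      ON es1.SensorID = es2.SensorID
     AND es1.TimestampMS < es2.TimestampMS
    WHERE es1.RawData = 65535
      AND es2.RawData <> 65535
    GROUP BY es1.SensorID, es1.TimestampMS
) AS link
JOIN externalsensor good
  ON link.SensorID = good.SensorID
 AND link.NextGoodTimestamp = good.TimestampMS
"""


def fix_corrupted(conn):
    # Compute every (bad row -> next good row) pair first, then write them,
    # so the updates cannot influence the pairs being computed.
    pairs = conn.execute(FIX_SQL).fetchall()
    conn.executemany(
        "UPDATE externalsensor SET RawData = ?, Data = ? "
        "WHERE SensorID = ? AND TimestampMS = ?",
        pairs,
    )
    return len(pairs)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE externalsensor (SensorID TEXT, TimestampMS INTEGER,"
    " RawData INTEGER, Data REAL, PRIMARY KEY (SensorID, TimestampMS))"
)
conn.executemany("INSERT INTO externalsensor VALUES (?, ?, ?, ?)", [
    ("TEMPSensor1", 1000, 200, 2.0),
    ("TEMPSensor1", 2000, 220, 2.2),
    ("TEMPSensor1", 3000, 65535, 655.35),
    ("TEMPSensor1", 4000, 240, 2.4),
])
print(fix_corrupted(conn))  # 1
print(conn.execute("SELECT * FROM externalsensor ORDER BY TimestampMS").fetchall())
```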

#3

Since you don't want to use Dems's solution from the previous question, here's a "solution" using JOINs:

UPDATE myTable m
  JOIN myTable n
    ON m.SensorID = n.SensorID
      AND n.RawData <> 65535
      AND m.TimestampMS < n.TimestampMS
  JOIN myTable q
    ON n.SensorID = q.SensorID
      AND q.RawData <> 65535
      AND n.TimestampMS <= q.TimestampMS
SET
  m.RawData = n.RawData,
  m.Data = n.Data
WHERE
   m.RawData = 65535
;

EDIT

My query above is wrong, dead wrong. It appears to work in my test db, but the logic is flawed. I'll explain below.

Why the above query works fine but is dead wrong:

First, why it's wrong.

Because it does not return one row for every (SensorID, bad timestamp) combination, but many rows. Let m (m.TimestampMS) be the bad timestamp we want to fix. The join returns every combination of that bad timestamp with later good timestamps n and q such that n.TimestampMS <= q.TimestampMS. It would only be a correct query if it picked the MINIMUM of those n timestamps.
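
The fan-out becomes visible when the FROM/JOIN part of the first query is turned into a counting SELECT. The sketch below does this in SQLite through Python's sqlite3 module, with a hypothetical sample of one corrupted reading followed by three good ones. The single bad row is joined to six (n, q) combinations covering three distinct candidate n rows, and MIN(n.TimestampMS) picks the nearest one. The data and the table layout are assumptions.

```python
import sqlite3

ROWS = [
    ("S1", 1000, 200, 2.0),
    ("S1", 2000, 65535, 655.35),
    ("S1", 3000, 220, 2.2),
    ("S1", 4000, 230, 2.3),
    ("S1", 5000, 240, 2.4),
]

# Answer #3's joins, counting how many rows each bad row m is matched with.
FANOUT_SQL = """
SELECT m.SensorID, m.TimestampMS,
       COUNT(*) AS combinations,
       COUNT(DISTINCT n.TimestampMS) AS candidate_n_rows,
       MIN(n.TimestampMS) AS nearest_good
FROM externalsensor m
JOIN externalsensor n
  ON m.SensorID = n.SensorID
 AND n.RawData <> 65535
 AND m.TimestampMS < n.TimestampMS
JOIN externalsensor q
  ON n.SensorID = q.SensorID
 AND q.RawData <> 65535
 AND n.TimestampMS <= q.TimestampMS
WHERE m.RawData = 65535
GROUP BY m.SensorID, m.TimestampMS
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE externalsensor (SensorID TEXT, TimestampMS INTEGER,"
    " RawData INTEGER, Data REAL, PRIMARY KEY (SensorID, TimestampMS))"
)
conn.executemany("INSERT INTO externalsensor VALUES (?, ?, ?, ?)", ROWS)
print(conn.execute(FANOUT_SQL).fetchall())  # [('S1', 2000, 6, 3, 3000)]
```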

Now, how come it actually works all right in my test db?

I think it's because, when MySQL applies the SET ... clause and has several candidate rows to choose from, it just uses the first one (in a multiple-table UPDATE each target row is updated only once). Luckily for me, I added the test rows in increasing timestamp order, so they were stored in that order in the db. And (again) luckily, that is the order in which the query plan visits them (I presume).
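
The explanation above has two parts: only one of several candidate rows is used, and which one depends on the order in which rows are visited. A toy model can illustrate this. It is an assumption made for illustration and not how MySQL's executor works internally. For each corrupted row, it takes the first later good row it meets while scanning a list in "storage order". The function name first_match_fix is hypothetical.

```python
CORRUPT = 65535


def first_match_fix(storage_order):
    """Toy model: fix each corrupted row with the FIRST later good row of the
    same sensor met while scanning storage_order, a list of
    (sensor_id, timestamp_ms, raw_data, data) tuples. Hypothetical stand-in
    for 'the first matching row wins'; not MySQL's actual executor."""
    fixed = {}
    for sid, ts, raw, _data in storage_order:
        if raw != CORRUPT:
            continue
        for sid2, ts2, raw2, data2 in storage_order:
            if sid2 == sid and ts2 > ts and raw2 != CORRUPT:
                fixed[(sid, ts)] = (raw2, data2)
                break
    return fixed


ascending = [
    ("S1", 1000, 200, 2.0),
    ("S1", 2000, 65535, 655.35),
    ("S1", 3000, 220, 2.2),
    ("S1", 4000, 240, 2.4),
]
print(first_match_fix(ascending))                  # {('S1', 2000): (220, 2.2)}
print(first_match_fix(list(reversed(ascending))))  # {('S1', 2000): (240, 2.4)}
```

Rows stored in ascending time order happen to yield the nearest good value. The same rows in another order yield a later one, which is why the query only appears correct.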

Even this query works in my test db:

UPDATE myTable m
  JOIN myTable n
    ON m.SensorID = n.SensorID
      AND n.RawData <> 65535
      AND m.TimestampMS < n.TimestampMS
SET
  m.RawData = n.RawData,
  m.Data = n.Data
WHERE
   m.RawData = 65535
;

even though it is flawed for the same reasons.
