批量MYSQL插入用于DB结构迁移中的性能

时间:2022-09-17 21:47:30

I need to restructure my MYSQL InnoDB database.

我需要重构我的MYSQL InnoDB数据库。

At the moment I have a customer table holding 3 product names.

目前我有一个包含3个产品名称的客户表。

I need to extract these names to a new product table. The product table should hold each name currently held in the customer table and be linked to the customer table via a new customer_product table. While the product names may not be unique, they don't have anything to do with each other, meaning for each customer there will need to be inserted 3 new entries into the product table and 3 new entries into the customer_product table.

我需要将这些名称提取到新的产品表中。产品表应包含当前保存在customer表中的每个名称,并通过新的customer_product表链接到customer表。虽然产品名称可能不是唯一的,但它们彼此之间没有任何关系,这意味着每个客户需要在产品表中插入3个新条目,在customer_product表中插入3个新条目。

So instead of this:

所以不是这样的:

customer
| id | product_name_a | product_name_b | product_name_c |

I need this:

我需要这个:

customer
| id |

customer_product
| customer_id | product_id | X3

product
| id | name | X3

I've written the following MYSQL procedure that works:

我编写了以下MYSQL程序:

BEGIN
  DECLARE nbr_of_customers BIGINT(20);
  DECLARE customer_count BIGINT(20);
  DECLARE product_id BIGINT(20);
  DECLARE customer_id BIGINT(20);
  DECLARE product_name_a VARCHAR(500);
  DECLARE product_name_b VARCHAR(500);
  DECLARE product_name_c VARCHAR(500);

  SELECT COUNT(*) FROM customer INTO nbr_of_customers;
  SET customer_count = 0;
  SET product_id = 1;

  WHILE customer_count < nbr_of_customers DO
    SELECT
      customer.id,
      customer.product_name_a,
      customer.product_name_b,
      customer.product_name_c
    INTO
      customer_id,
      product_name_a,
      product_name_b,
      product_name_c
    FROM customer
    LIMIT customer_count,1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_a);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_b);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_c);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    SET customer_count = customer_count + 1;
  END WHILE;
END;

This is too slow.

这太慢了。

I've run this locally and estimate that my ~15k customers would take ~1h to complete. And my VPS server is far slower than that, so it could take upward to 10h to complete.

我在本地运行并估计我的~15k客户需要〜1小时才能完成。而我的VPS服务器远比那要慢,所以可能要花费10到10小时才能完成。

The problem seem to be the inserts taking a long time. I've would therefore like to store all the inserts during the procedure and execute them all in batch after the loop is complete and I know what to insert.

问题似乎是插入需要很长时间。因此,我希望在过程中存储所有插入,并在循环完成后批量执行它们,我知道要插入什么。

I there a way to perform all the ~100k inserts in batch to optimize performance, or is there a better way to do it?

我有办法批量执行所有~100k插入以优化性能,还是有更好的方法来实现它?


FINAL EDIT:

I marked the correct solution based on that it did an excellent job of speeding up the process massively, which was the main focus of the question. In the end I ended up performing the migration using modified production code (in Java), due to the solution's limitations regarding not escaping the inserted strings.

我标记了正确的解决方案,因为它很好地加速了这个过程,这是问题的主要焦点。最后,我最终使用修改后的生产代码(在Java中)执行迁移,因为解决方案的限制是没有转义插入的字符串。

2 个解决方案

#1


1  

First, use a cursor to process the results of a single query, rather than performing a separate query for each row.

首先,使用游标处理单个查询的结果,而不是为每一行执行单独的查询。

Then concatenate the VALUES lists into strings that you execute using PREPARE and EXECUTE.

然后将VALUES列表连接到使用PREPARE和EXECUTE执行的字符串。

My code does the inserts in batches of 100 customers, because I expect there's a limit on the size of a query.

我的代码批量插入100个客户,因为我希望查询的大小有限制。

BEGIN
  DECLARE product_id BIGINT(20);
  DECLARE customer_id BIGINT(20);
  DECLARE product_name_a VARCHAR(500);
  DECLARE product_name_b VARCHAR(500);
  DECLARE product_name_c VARCHAR(500);
  DECLARE done INT DEFAULT FALSE;
  DECLARE cur CURSOR FOR SELECT c.id, c.product_name_a, c.product_name_b, c.product_name_c FROM customer AS c;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  SET product_id = 1;

  OPEN cur;

  SET @product_values = '';
  SET @cp_values = '';

  read_loop: LOOP
    FETCH cur INTO customer_id, product_name_a, product_name_b, product_name_c;
    IF done THEN
      LEAVE read_loop;
    END IF;

    SET @product_values = CONCAT(@product_values, IF(@product_values != '', ',', ''), "(", product_id, ",'", product_name_a, "'), (", product_id + 1, ",'", product_name_b, "'), (", product_id + 2, ",'", product_name_c, "'), ");
    SET @cp_values = CONCAT(@cp_values, IF(@cp_values != '', ',', ''), "(", customer_id, ",", product_id, "), (", customer_id, ",", product_id + 1, "), (", customer_id, ",", product_id + 2, "),");

    SET product_id = product_id + 3;

    IF product_id % 300 = 1 -- insert every 100 customers
    THEN BEGIN
         SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
         PREPARE stmt1 FROM @insert_product;
         EXECUTE stmt1;

         SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
         PREPARE stmt2 FROM @insert_cp;
         EXECUTE stmt2;

         SET @product_values = '';
         SET @cp_values = '';
     END IF;

  END LOOP;

  IF @product_values != '' -- Process any remaining rows
  THEN BEGIN
       SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
       PREPARE stmt1 FROM @insert_product;
       EXECUTE stmt1;

       SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
       PREPARE stmt2 FROM @insert_cp;
       EXECUTE stmt2;

       SET @product_values = '';
       SET @cp_values = '';
   END IF;
END;

Beware that, using this solution, the product names will not be properly escaped before inserting. This solution will therefore not work if any of the product names contains special characters, such as single quote '.

请注意,使用此解决方案,在插入之前,产品名称将无法正确转义。因此,如果任何产品名称包含特殊字符(例如单引号),则此解决方案将不起作用。

#2


1  

Perhaps you could do this in three separate inserts (instead of ~100K) as follows:

也许您可以在三个单独的插入(而不是~100K)中执行此操作,如下所示:

INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_a = product.name

INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_b = product.name


INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_c = product.name

Of course, you would have to set up your product table ahead of time, and you'd want to drop your de-normalized columns from your customer table after the fact.

当然,您必须提前设置产品表,并且您希望事后从客户表中删除非规范化列。

This could be further sped up if you create an index on the customer.product_name_X columns (and possibly the product.name column, though it's so few, idk if it would be significant). EXPLAIN can help with that.

如果你在customer.product_name_X列上创建一个索引(可能是product.name列,虽然它很少,但是如果它很重要的话,则可以进一步加速)。 EXPLAIN可以帮助解决这个问题。

#1


1  

First, use a cursor to process the results of a single query, rather than performing a separate query for each row.

首先,使用游标处理单个查询的结果,而不是为每一行执行单独的查询。

Then concatenate the VALUES lists into strings that you execute using PREPARE and EXECUTE.

然后将VALUES列表连接到使用PREPARE和EXECUTE执行的字符串。

My code does the inserts in batches of 100 customers, because I expect there's a limit on the size of a query.

我的代码批量插入100个客户,因为我希望查询的大小有限制。

BEGIN
  DECLARE product_id BIGINT(20);
  DECLARE customer_id BIGINT(20);
  DECLARE product_name_a VARCHAR(500);
  DECLARE product_name_b VARCHAR(500);
  DECLARE product_name_c VARCHAR(500);
  DECLARE done INT DEFAULT FALSE;
  DECLARE cur CURSOR FOR SELECT c.id, c.product_name_a, c.product_name_b, c.product_name_c FROM customer AS c;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  SET product_id = 1;

  OPEN cur;

  SET @product_values = '';
  SET @cp_values = '';

  read_loop: LOOP
    FETCH cur INTO customer_id, product_name_a, product_name_b, product_name_c;
    IF done THEN
      LEAVE read_loop;
    END IF;

    SET @product_values = CONCAT(@product_values, IF(@product_values != '', ',', ''), "(", product_id, ",'", product_name_a, "'), (", product_id + 1, ",'", product_name_b, "'), (", product_id + 2, ",'", product_name_c, "'), ");
    SET @cp_values = CONCAT(@cp_values, IF(@cp_values != '', ',', ''), "(", customer_id, ",", product_id, "), (", customer_id, ",", product_id + 1, "), (", customer_id, ",", product_id + 2, "),");

    SET product_id = product_id + 3;

    IF product_id % 300 = 1 -- insert every 100 customers
    THEN BEGIN
         SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
         PREPARE stmt1 FROM @insert_product;
         EXECUTE stmt1;

         SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
         PREPARE stmt2 FROM @insert_cp;
         EXECUTE stmt2;

         SET @product_values = '';
         SET @cp_values = '';
     END IF;

  END LOOP;

  IF @product_values != '' -- Process any remaining rows
  THEN BEGIN
       SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
       PREPARE stmt1 FROM @insert_product;
       EXECUTE stmt1;

       SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
       PREPARE stmt2 FROM @insert_cp;
       EXECUTE stmt2;

       SET @product_values = '';
       SET @cp_values = '';
   END IF;
END;

Beware that, using this solution, the product names will not be properly escaped before inserting. This solution will therefore not work if any of the product names contains special characters, such as single quote '.

请注意,使用此解决方案,在插入之前,产品名称将无法正确转义。因此,如果任何产品名称包含特殊字符(例如单引号),则此解决方案将不起作用。

#2


1  

Perhaps you could do this in three separate inserts (instead of ~100K) as follows:

也许您可以在三个单独的插入(而不是~100K)中执行此操作,如下所示:

INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_a = product.name

INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_b = product.name


INSERT INTO customer_product (customer_id, product_id) 
SELECT customer.id as customer_id, product.id as product_id 
FROM customer 
  JOIN product on customer.product_name_c = product.name

Of course, you would have to set up your product table ahead of time, and you'd want to drop your de-normalized columns from your customer table after the fact.

当然,您必须提前设置产品表,并且您希望事后从客户表中删除非规范化列。

This could be further sped up if you create an index on the customer.product_name_X columns (and possibly the product.name column, though it's so few, idk if it would be significant). EXPLAIN can help with that.

如果你在customer.product_name_X列上创建一个索引(可能是product.name列,虽然它很少,但是如果它很重要的话,则可以进一步加速)。 EXPLAIN可以帮助解决这个问题。