MySQL千万级别表数据中提高RAND随机查询的实验

时间:2022-05-19 23:18:41

1,准备测试数据

 

1.1 建库建表

mysql> create database hwdb;

Query OK, 1 row affected (0.34 sec)

 

mysql>

mysql> show create table t_huawei;

+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Table   | Create Table                                                                                                                                                                                                                                       |

+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| t_huawei | CREATE TABLE `t_huawei` (

 `hwid` int(11) DEFAULT NULL,

 `content` varchar(4000) DEFAULT NULL,

 `TYPE` char(2) DEFAULT NULL,

 `creator_id` varchar(30) DEFAULT NULL,

 `create_time` datetime DEFAULT NULL

) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

 

mysql>

 

1.2 准备存储过程

DELIMITER $$

 

USE `hwdb`$$

 

DROP PROCEDURE IF EXISTS `pro_insert_hw`$$

 

CREATE DEFINER=`root`@`localhost` PROCEDURE`pro_insert_hw`(num INT)

BEGIN

    DECLARE var1 INT DEFAULT 0;   

   WHILE var1<num DO   

   SET var1=var1+1;   

  INSERT INTO hwdb.t_huawei(hwid,content,TYPE,creator_id,create_time)  

   VALUES (var1,CONCAT(var1,' - ','Players supposedly said they''d avoidteam facility this summer if Thibodeau remained the '),

0,10013,NOW());   

   END WHILE;    

 

 

   END$$

 

DELIMITER ;

 

 

mysql> show create table t_huawei;

+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Table   | Create Table                                                                                                                                                                                                                                       |

+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| t_huawei | CREATE TABLE `t_huawei` (

 `hwid` int(11) DEFAULT NULL,

 `content` varchar(4000) DEFAULT NULL,

 `TYPE` char(2) DEFAULT NULL,

  `creator_id`varchar(30) DEFAULT NULL,

 `create_time` datetime DEFAULT NULL

) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

 

mysql>

 

1.3 开始录入数据

因为要录入千万级别数据,所以用后台mysql进程来录入,如下所示:

nohup /usr/local/mysql/bin/mysql -uroot--password="" -S /usr/local/mysql/mysql.sock -e "select now();truncate table hwdb.t_huawei; call hwdb.pro_insert_hw(10000000); select now();" > /data/call.log &

 

这里因为测试环境机器性能有限,在录入到7023850条的时候,我终止了进程,开始准备测试工作:

mysql> SELECT COUNT(1) FROM `t_huawei` ;

+----------+

| COUNT(1) |

+----------+

| 7023850 |

+----------+

1 row in set (5.52 sec)

 

mysql>

 

 

2,开始测试

 

 

2.1 普通的ORDER BY RAND()测试很慢很慢,如下所示,需要块1分钟的时间

mysql> SELECT  SQL_NO_CACHE * FROM `t_huawei` AS t1  ORDER BY RAND() LIMIT 5;

+---------+-----------------------------------------------------------------------------------------------------+------+------------+---------------------+

| hwid   | content                                                                                            | TYPE |creator_id | create_time         |

+---------+-----------------------------------------------------------------------------------------------------+------+------------+---------------------+

| 3743323 | 3743323 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 20:43:01|

| 2418491 | 2418491 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 20:19:31|

| 1224667 | 1224667 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:58:35|

| 1639270 | 1639270 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 20:05:50|

| 2756470 | 2756470 - Players supposedlysaid they'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 20:25:28|

+---------+-----------------------------------------------------------------------------------------------------+------+------------+---------------------+

5 rows in set (54.92 sec)

 

mysql>

 

 

原因是:MYSQL的官方手册,里面针对RAND()的提示大概意思就是,在ORDER BY从句里面不能使用RAND()函数,因为这样会导致数据列被多次扫描。但是在MYSQL 3.23版本中,仍然可以通过ORDER BY RAND()来实现随机。

 

 

2.2 JOIN中使用RAND()* (SELECTMAX(hwid) 的方式来实现

mysql> SELECT SQL_NO_CACHE t1.* FROM`t_huawei` AS t1 LEFT JOIN(SELECT ROUND(RAND()* (SELECT MAX(hwid) FROM  `t_huawei` )) AS id ) AS t2 ONt1.`hwid`>=t2.id ORDER BY t1.`hwid` ASC LIMIT 5;

+------+-----------------------------------------------------------------------------------------------+------+------------+---------------------+

| hwid | content                                                                                      | TYPE | creator_id | create_time        |

+------+-----------------------------------------------------------------------------------------------+------+------------+---------------------+

|   1 | 1 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |

|   2 | 2 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |

|   3 | 3 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |

|   4 | 4 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |

|   5 | 5 - Players supposedly said they'd avoid team facility this summerif Thibodeau remained the  | 0    | 10013     | 2015-06-08 19:37:02 |

+------+-----------------------------------------------------------------------------------------------+------+------------+---------------------+

5 rows in set (50.04 sec)

 

mysql>

需要50面,比起直接的order by来说少了4秒,但是还是比较耗时的。

 

 

2.3 WHERE子句的方式来优化

mysql> SELECT SQL_NO_CACHE t1.* FROMhwdb.`t_huawei` t1 WHERE t1.`hwid` >=(SELECT FLOOR(RAND() * (SELECTMAX(t2.hwid) FROM hwdb.`t_huawei` t2 ) )) ORDER BY t1.hwid LIMIT 5;

+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+

| hwid | content                                                                                         | TYPE | creator_id | create_time        |

+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+

| 1624 | 1624 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:05|

| 6076 | 6076 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:11|

| 7988 | 7988 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:14|

| 8016 | 8016 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:14|

| 8106 | 8106 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:14|

+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+

5 rows in set (7.25 sec)

 

mysql>

 

花了7.25秒,比上一种要快的多很多了。

 

2.4 MAX(hwid)- MIN(hwid)的方式来RAND来操作

mysql> SELECT SQL_NO_CACHE * FROMhwdb.`t_huawei` WHERE hwid >= (SELECT FLOOR(RAND() * ((SELECT MAX(hwid) FROMhwdb.`t_huawei`) - (SELECT MIN(hwid) FROM hwdb.`t_huawei`)) + (SELECT MIN(hwid)FROM hwdb.`t_huawei`))) ORDER BY hwid LIMIT 5;

+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+

| hwid | content                                                                                          |TYPE | creator_id | create_time         |

+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+

| 5438 | 5438 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:10|

| 5883 | 5883 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:10|

| 6782 | 6782 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:12|

| 7797 | 7797 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:13|

| 8625 | 8625 - Players supposedly saidthey'd avoid team facility this summer if Thibodeau remained the  | 0   | 10013      | 2015-06-08 19:37:15|

+------+--------------------------------------------------------------------------------------------------+------+------------+---------------------+

5 rows in set (13.50 sec)

 

mysql>

MAX(hwid)- MIN(hwid)的方式来RAND来操作需要13.5秒,比where子句花时间要多一些。


3,总结

WHERE子句的方式效率最高,样例sql:SELECTSQL_NO_CACHE t1.* FROM hwdb.`t_huawei` t1 WHERE t1.`hwid` >=(SELECTFLOOR(RAND() * (SELECT MAX(t2.hwid) FROM hwdb.`t_huawei` t2 ) )) ORDER BYt1.hwid LIMIT 5;