What is the best way to "shuffle" a table of database records?

Time: 2023-01-31 07:38:29

Say I have a table with a bunch of records that I want to present to users in random order. I also want users to be able to paginate back and forth, so I have to preserve some sort of order, at least for a while.

The application is almost entirely AJAX and caches pages that have already been visited, so even if I always served random results, a user who goes back will get the previous page, because it loads from the local cache.

The problem is that if I return only random results, there might be some duplicates. Each page contains 6 results, so to prevent this I'd have to do something like WHERE id NOT IN (1,2,3,4 ...), listing all the previously loaded IDs.

A huge downside of that solution is that nothing can be cached on the server side, because every user would request different data.

An alternative might be to add another column for ordering the records and reshuffle it every [insert time unit here]. The problem is that I'd need to assign each record in the table a random number out of a sequence, which would take as many queries as there are records.

I'm using Rails and MySQL if that's of any relevance.

3 solutions

#1


7  

Try this:

mysql> create table t (i int);
mysql> insert into t values (1),(2),(3),(4),(5),(6);
mysql> select * from t order by rand(123) limit 2 offset 0;
+------+
| i    |
+------+
|    6 | 
|    4 | 
+------+
mysql> select * from t order by rand(123) limit 2 offset 2;
+------+
| i    |
+------+
|    2 | 
|    3 | 
+------+
mysql> select * from t order by rand(123) limit 2 offset 4;
+------+
| i    |
+------+
|    5 | 
|    1 | 
+------+

Note that rand() is called with a seed value (123). Because the seed is fixed, repeating the last three queries returns the same result every time.
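
On the Rails side, the seeded query above could be built per user along these lines. This is only a minimal sketch, not part of the original answer: the helper name shuffled_page_sql is hypothetical, the default of 6 rows per page comes from the question, and keeping the seed in the user's session is an assumption.

```ruby
# Hypothetical helper: builds the paged, seeded query from the MySQL session above.
# Integer() raises on anything that is not a whole number, so request
# parameters cannot end up in the SQL string as arbitrary text.
def shuffled_page_sql(seed, page, per_page = 6)
  seed = Integer(seed)
  page = Integer(page)
  per_page = Integer(per_page)
  raise ArgumentError, "page must be >= 1" if page < 1
  raise ArgumentError, "per_page must be >= 1" if per_page < 1

  offset = (page - 1) * per_page
  "SELECT * FROM t ORDER BY RAND(#{seed}) LIMIT #{per_page} OFFSET #{offset}"
end

# The same seed is reused for every page, so the pages do not overlap.
puts shuffled_page_sql(123, 1, 2)  # SELECT * FROM t ORDER BY RAND(123) LIMIT 2 OFFSET 0
puts shuffled_page_sql(123, 3, 2)  # SELECT * FROM t ORDER BY RAND(123) LIMIT 2 OFFSET 4
```

One caveat worth testing before relying on this: rows inserted or deleted between two requests may shift the shuffled order, so pages may only line up while the table is unchanged.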

#2


2  

I would do the following (assuming a sequential, numeric primary key):

  1. Generate a random number and store it in the user's session.
  2. When a user pages through the data, query for the total number of rows.
  3. Use the number stored in the session as the seed to generate the same 'random' order of ids on each request.
  4. Page through the ids and retrieve only the records that match those ids from the database.
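
The steps above can be sketched in plain Ruby as follows. This is an illustration under stated assumptions rather than the answerer's code: shuffled_page_ids is a hypothetical helper, ids are assumed to run from 1 to the row count without gaps, and the page size of 6 is taken from the question.

```ruby
PER_PAGE = 6

# Hypothetical helper: returns the ids for one page of a shuffle that is
# reproducible for a given seed. Assumes ids are 1..total_rows with no gaps.
def shuffled_page_ids(seed, total_rows, page, per_page = PER_PAGE)
  raise ArgumentError, "page must be >= 1" if page < 1

  # Seeding the generator makes shuffle return the same order on every request.
  ids = (1..total_rows).to_a.shuffle(random: Random.new(seed))
  ids[(page - 1) * per_page, per_page] || []
end

seed = Random.new_seed  # step 1: in an app this would be stored in the session
first_visit  = shuffled_page_ids(seed, 20, 2)
second_visit = shuffled_page_ids(seed, 20, 2)
puts first_visit == second_visit  # true: paging back shows the same records
```

The returned ids would then go into a single lookup such as WHERE id IN (...). Since such a query does not return rows in the order of the id list, the records would need to be reordered to match it afterwards.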

#3


2  

If the random results are "for everyone" rather than for any specific user, then you can do something like this (this is for Postgres, but it should work with other databases; MySQL, for instance, spells the function rand()):

update mytable set sortorder = random() * 100000000;

select * from mytable order by sortorder, primarykeyid;

Since random() MAY produce duplicate values, the secondary sort by primarykeyid keeps the order stable when two rows tie.
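
To see why the tie-breaker matters, here is a small in-memory sketch in Ruby that mirrors the ORDER BY clause. The Row struct, the helper name shuffled_order, and the sample values are made up for illustration.

```ruby
# Illustrative stand-in for a table row; the field names follow the query above.
Row = Struct.new(:primarykeyid, :sortorder)

# Mirrors "order by sortorder, primarykeyid": rows that share a sortorder
# fall back to the primary key, so their relative order never depends on
# the order in which the rows happened to be read.
def shuffled_order(rows)
  rows.sort_by { |row| [row.sortorder, row.primarykeyid] }
end

rows = [Row.new(1, 50), Row.new(2, 10), Row.new(3, 50), Row.new(4, 10)]

p shuffled_order(rows).map(&:primarykeyid)          # => [2, 4, 1, 3]
p shuffled_order(rows.reverse).map(&:primarykeyid)  # => [2, 4, 1, 3]
```

Without the second key, rows 2 and 4 (and rows 1 and 3) could come back in either order, which could move a record across a page boundary between two requests.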

You can then do this as often as you want to refresh your cache. For example, give your pages an absolute expiration of, say, one minute. Then every minute you update the sort order again and serve pages up normally.
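
One way to tie cached pages to the reshuffle interval is to put the number of the current time window into the cache key. The sketch below is an assumption layered on top of the answer, not something it prescribes: the helper names, the key format, and the 60-second window are all illustrative.

```ruby
# Hypothetical helpers; names and key format are illustrative only.
WINDOW_SECONDS = 60

# Number of the refresh window that `now` falls into.
def shuffle_window(now = Time.now, window_seconds = WINDOW_SECONDS)
  now.to_i / window_seconds
end

# Cache key that changes whenever a new window starts, so pages cached in
# one window are stored separately from pages cached in the next.
def page_cache_key(page, now = Time.now, window_seconds = WINDOW_SECONDS)
  "shuffled/#{shuffle_window(now, window_seconds)}/page/#{page}"
end

puts page_cache_key(1, Time.at(0))   # shuffled/0/page/1
puts page_cache_key(1, Time.at(59))  # shuffled/0/page/1
puts page_cache_key(1, Time.at(60))  # shuffled/1/page/1
```

The key stays the same for every user within a window, which is what keeps the approach cacheable and stateless.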

If requests straddle the refresh window, then yes, there is a chance that different pages return the same results. You will also have the issue that when users hit "back", they may well not get the page they had before (since the order was refreshed).

How well this works comes down to the motivation behind presenting random data in the first place. It also depends on data volume, etc.

But this is a cache-friendly way of pulling this off, if that's important to you. It's also stateless (no session information needed).
