How can I improve this one-to-many ActiveRecord data model?

Time: 2022-10-04 15:22:58

I have a home-grown (not my own) versioning system with the following data structure:

  create_table "activities", :force => true do |t|
    t.string   "source"
    t.datetime "created_at",       :null => false
    t.datetime "updated_at",       :null => false
    t.integer  "head_revision_id"
  end

  add_index "activities", ["head_revision_id"], :name => "index_activities_on_head_revision_id"
  add_index "activities", ["source"], :name => "index_activities_on_source"

  create_table "activity_revisions", :force => true do |t|
    t.integer  "activity_id"
    t.string   "activity_type"
    t.string   "title"
    t.text     "content"
    t.text     "comment"
    t.integer  "modified_by_id"
    t.datetime "created_at",                      :null => false
    t.datetime "updated_at",                      :null => false
  end

  add_index "activity_revisions", ["activity_id"], :name => "index_activity_revisions_on_activity_id"
  add_index "activity_revisions", ["activity_type"], :name => "index_activity_revisions_on_activity_type"
  add_index "activity_revisions", ["title"], :name => "index_activity_revisions_on_title"

The application displays a list of activities from newest to oldest, paginated (will_paginate) 20 to a page. This is the query used to generate the list:

Activity.where(conditions)
        .joins(:head_revision)
        .includes(:head_revision)
        .order('activities.id DESC')

The conditions vary according to the values passed from a search form. For the initial list display, conditions is blank.

On the surface, this query is simple enough, but in execution it is horribly slow with large data sets. We currently have about 102,000 activity records and 512,000 activity_revision records. On our production server, the query takes nearly 2 seconds just to provide a count. In a development environment, it is abysmal.
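
For context on where that count comes from: will_paginate needs a total to work out how many pages exist, so it issues a COUNT over the same joined relation unless a total is supplied. Below is a minimal sketch of the arithmetic in plain Ruby, using the figures quoted above. The helper names `total_pages` and `page_offset` are illustrative assumptions, not will_paginate's API.

```ruby
# Illustrative helpers only; these are NOT part of will_paginate.
PER_PAGE = 20

# Number of pages needed to show `total_entries` rows (integer ceiling division).
def total_pages(total_entries, per_page = PER_PAGE)
  (total_entries + per_page - 1) / per_page
end

# Row offset for a 1-based page number (the value that ends up in OFFSET).
def page_offset(page, per_page = PER_PAGE)
  (page - 1) * per_page
end

puts total_pages(102_000) # 5100
puts page_offset(3)       # 40
```

Each page only fetches 20 rows, but the total has to be computed across the whole join, which is probably why the count is the expensive part.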

I feel that there is something inherently wrong with the data model and I'm hoping someone can show me a better way.

EDIT: EXPLAIN output for the basic query without conditions:

mysql> explain SELECT * FROM `activities`  INNER JOIN `activity_revisions` ON `activity_revisions`.`id` = `activities`.`head_revision_id`;
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
| id | select_type | table              | type   | possible_keys                        | key     | key_len | ref                                        | rows   | Extra |
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
|  1 | SIMPLE      | activities         | ALL    | index_activities_on_head_revision_id | NULL    | NULL    | NULL                                       | 106590 |       |
|  1 | SIMPLE      | activity_revisions | eq_ref | PRIMARY                              | PRIMARY | 4       | cms_production.activities.head_revision_id |      1 |       |
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
2 rows in set (0.00 sec)

And for the count(*) query:

mysql> explain SELECT count(*) FROM `activities`  INNER JOIN `activity_revisions` ON `activity_revisions`.`id` = `activities`.`head_revision_id`;
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+------------------+
| id | select_type | table              | type   | possible_keys                        | key                                  | key_len | ref                                        | rows   | Extra            |
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+------------------+
|  1 | SIMPLE      | activities         | index  | index_activities_on_head_revision_id | index_activities_on_head_revision_id | 5       | NULL                                       | 106590 | Using index      |
|  1 | SIMPLE      | activity_revisions | eq_ref | PRIMARY                              | PRIMARY                              | 4       | cms_production.activities.head_revision_id |      1 | Using index      |
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+------------------+
2 rows in set (0.00 sec)

2 Answers

#1


I see you are already indexing several columns, which is good. One of the best ways to keep your queries as efficient as possible is to make sure that every column used in a query or retrieval condition has an index.
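
As a rough way to apply that advice, the sketch below compares the columns a search might filter on against the indexes listed in the question's schema. The list of search columns is hypothetical, since the question does not show the actual form fields, and the helper name `unindexed_columns` is made up for illustration.

```ruby
# Single-column indexes taken from the schema in the question ("table.column").
# Primary keys are indexed as well but are left out here for brevity.
INDEXED_COLUMNS = %w[
  activities.head_revision_id
  activities.source
  activity_revisions.activity_id
  activity_revisions.activity_type
  activity_revisions.title
].freeze

# Returns the condition columns that have no matching index in the list above.
def unindexed_columns(condition_columns, indexed = INDEXED_COLUMNS)
  condition_columns.reject { |column| indexed.include?(column) }
end

# Hypothetical search-form filters; the real ones are not shown in the question.
search_columns = %w[
  activity_revisions.title
  activity_revisions.activity_type
  activity_revisions.modified_by_id
]

p unindexed_columns(search_columns) # ["activity_revisions.modified_by_id"]
```

Any column reported this way would be a candidate for an add_index call in a migration, assuming the search form really does filter on it.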

#2


Guessing why a query is slow is no fun; fortunately, we shouldn't have to.

Take a look at http://guides.rubyonrails.org/active_record_querying.html#running-explain and let's see what your Activity queries are actually doing.

It sounds like you're querying a MySQL database, so take a look at the key column in those EXPLAIN results. As MilesStanfield suggested, you'll probably see that you are not using an index effectively.
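
To illustrate what to look for, here is a small plain-Ruby sketch that takes the two rows of the first EXPLAIN in the question, reduced to the relevant columns, and flags any table read without an index. The helper names are assumptions made for this example; nothing here talks to MySQL.

```ruby
# The two rows of the first EXPLAIN in the question (relevant columns only).
EXPLAIN_ROWS = [
  { table: 'activities',         type: 'ALL',    key: nil,       rows: 106_590 },
  { table: 'activity_revisions', type: 'eq_ref', key: 'PRIMARY', rows: 1 }
].freeze

# Flags tables where the access type is ALL or where no key was chosen.
def full_table_scans(rows)
  rows.select { |row| row[:type] == 'ALL' || row[:key].nil? }
      .map { |row| row[:table] }
end

# Rough estimate of row combinations examined: product of the per-table estimates.
def estimated_rows_examined(rows)
  rows.map { |row| row[:rows] }.reduce(1, :*)
end

p full_table_scans(EXPLAIN_ROWS)        # ["activities"]
p estimated_rows_examined(EXPLAIN_ROWS) # 106590
```

Note that the EXPLAIN in the question was run without the ORDER BY and without the LIMIT that pagination adds, so the plan for the actual page query may differ from this one.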
