Pros and cons of storing a serialized hash vs. key/value database objects in ActiveRecord?

Time: 2022-10-17 17:20:09

If I have several objects that each have a Profile, which I'm using to store arbitrary attributes, what are the pros and cons of:

  1. Storing a serialized hash in a column for a record, vs.
  2. Storing a bunch of key/value objects that belong_to the main object.

Code

Say you have STI records like these:

class Building < ActiveRecord::Base
  has_one :profile, :as => :profilable
end
class OfficeBuilding < Building; end
class Home < Building; end
class Restaurant < Building; end

Each has_one :profile

Option 1. Serialized Hash

class SerializedProfile < ActiveRecord::Base
  serialize :settings
end

create_table :profiles, :force => true do |t|
  t.string   :name
  t.string   :website
  t.string   :email
  t.string   :phone
  t.string   :type
  t.text     :settings
  t.integer  :profilable_id
  t.string   :profilable_type
  t.timestamps
end

Option 2. Key/Value Store

class KeyValueProfile < ActiveRecord::Base
  has_many :settings
end

create_table :profiles, :force => true do |t|
  t.string   :name
  t.string   :website
  t.string   :email
  t.string   :phone
  t.string   :type
  t.integer  :profilable_id
  t.string   :profilable_type
  t.timestamps
end

create_table :settings, :force => true do |t|
  t.string   :key
  t.text     :value
  t.integer  :profile_id
  t.string   :profile_type
  t.timestamps
end
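
To make the two options concrete, here is a minimal plain-Ruby sketch (no database involved) that puts the same data into both shapes. The sample settings and the `rows` layout are made up for illustration, and it assumes YAML as the serialization format, which was the default for `serialize` in the Rails versions discussed here.

```ruby
require "yaml"

# Hypothetical custom settings for one profile.
settings = { "parking" => "underground", "wifi" => "free" }

# Option 1: the whole hash becomes one text value in profiles.settings.
serialized = YAML.dump(settings)
restored   = YAML.load(serialized)

# Option 2: one row per setting, as in the settings table.
rows = settings.map do |key, value|
  { "profile_id" => 1, "key" => key, "value" => value }
end

# Reading a single setting from the blob means parsing all of it first.
from_blob = restored["wifi"]

# Reading it from rows is a lookup by key (in SQL: WHERE key = 'wifi').
from_rows = rows.find { |row| row["key"] == "wifi" }["value"]
```

With 10 to 50 settings per profile, either shape is small; the difference is mostly in how the data can be queried and changed later.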

Which would you choose?

Assume that 99% of the time I won't need to search by the custom settings. I'm just wondering what the tradeoffs are in terms of performance and the likelihood of future problems. The number of custom settings will likely be anywhere from 10 to 50.

I would rather go with the second option, with the settings table, because it follows the ActiveRecord object-oriented conventions. But I'm wondering if in this kind of situation that would come at too high a performance cost.

Note: I am wondering in terms of RDBMS only. This would be a perfect fit for MongoDB/Redis/CouchDB/etc. but I want to know purely the pros and cons in terms of SQL.

3 Answers

#1


12  

I had the same problem and finally made a decision.

The hash serialization option creates a maintenance problem. Such data is hard to query, extend or refactor: any subtle change needs a migration, which means reading each record, deserializing it and serializing it back, and depending on the refactoring a serialization exception may occur. I tried both binary serialization and JSON; the latter is easier to extract and fix, but it is still too much hassle.
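
As a rough illustration of that migration cost, renaming one key inside serialized settings means loading, parsing, changing and re-saving every record, and coping with records that no longer parse. The sketch below uses plain Ruby and JSON; the `records` array and the `rename_key` helper are hypothetical stand-ins for table rows and a data migration.

```ruby
require "json"

# Hypothetical stand-in for rows of a table with a serialized settings column.
records = [
  { "id" => 1, "settings" => JSON.generate({ "colour" => "red" }) },
  { "id" => 2, "settings" => JSON.generate({ "floors" => 3 }) },
  { "id" => 3, "settings" => "{not valid json" } # a damaged record
]

# Renames old_key to new_key in every record that has it.
# Returns the ids of records whose settings could not be parsed.
def rename_key(records, old_key, new_key)
  failed = []
  records.each do |record|
    begin
      settings = JSON.parse(record["settings"])
    rescue JSON::ParserError
      failed << record["id"]
      next
    end
    next unless settings.key?(old_key)

    settings[new_key] = settings.delete(old_key)
    record["settings"] = JSON.generate(settings)
  end
  failed
end

failed_ids = rename_key(records, "colour", "color")
```

With a key/value table, the same rename would be a single UPDATE on the key column.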

A separate settings table is what I'm trying to use now; it is much easier to maintain. I plan to use the Preferences gem for that, which handles most of the abstraction for easy use. I'm not sure whether it works with Rails 3 yet, but it is small, so I can extend it if needed.

Update Nov 2013

The recently released Rails 4 supports great new PostgreSQL 9.1+ features such as the hstore and json column types for dynamic data sets. Here is an article covering hstore usage in Rails 4. Both types support indexing and advanced querying (for JSON, with PostgreSQL 9.3). hstore is also available to Rails 3 users through the activerecord-postgres-hstore gem.

I am in the process of migrating some of the non-critical preference tables in my project to hstore columns. In the migrations I just update the table definitions and execute one SQL query per table to move the data.

#2


4  

I would recommend just creating a model called Attribute and having each of your objects that needs many of them declare a has_many association to it. Then you don't have to mess around with serialization or anything brittle like that. If you use the :joins syntax you shouldn't have any real performance issues with this.

Serializing data into your RDBMS is almost always unwise. It's about more than queries; it's about the ability to describe and migrate your data (and serialization shatters that ability).

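class Building < ActiveRecord::Base
  # An association named :attributes would clash with ActiveRecord's own
  # #attributes method, so the association gets a different name.
  has_many :building_attributes, :class_name => "Attribute"
end

class Attribute < ActiveRecord::Base
  belongs_to :building
end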

create_table :attributes, :force => true do |t|
  t.integer :building_id
  t.string :att_name
  t.string :data
  t.timestamps
end

#3


2  

I was facing the same dilemma you described and ended up going with the key/value table implementation because of the potential maintenance advantages that others mentioned. It's just easier to think through how I could select and update information in separate rows of the database in a future migration as opposed to a single serialized Hash.

Another catch I've personally experienced when using a serialized Hash is that you have to be careful that the serialized data you're storing isn't larger than what the DB text field can hold. You can easily end up with missing or corrupted data if you aren't careful. For example, using the SerializedProfile class & table you described, you could cause this behavior:

profile = SerializedProfile.create(:settings=>{})
100.times{ |i| profile.settings[i] = "A value" }
profile.save!
profile.reload
profile.settings.class #=> Hash
profile.settings.size #=> 100

5000.times{ |i| profile.settings[i] = "A value" }
profile.save!
profile.reload
profile.settings.class #=> String
profile.settings.size #=> 65535

All that code is to say: be aware of your DB limits, or your serialized data will be clipped when it is stored, and ActiveRecord won't be able to deserialize it the next time it's retrieved.
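
A minimal sketch of the failure mode and one possible guard against it, in plain Ruby. It assumes JSON serialization and a 65,535-byte column limit (the size seen in the output above); `fits_in_column?` is a hypothetical helper, not part of ActiveRecord.

```ruby
require "json"

COLUMN_LIMIT = 65_535 # assumed byte limit of the text column

# Hypothetical guard: true if the serialized settings fit in the column.
def fits_in_column?(settings, limit = COLUMN_LIMIT)
  JSON.generate(settings).bytesize <= limit
end

small = {}
100.times { |i| small[i] = "A value" }

large = {}
5000.times { |i| large[i] = "A value" }

small_ok = fits_in_column?(small)
large_ok = fits_in_column?(large)

# Simulate a database that silently cuts the value off at the limit.
clipped = JSON.generate(large).byteslice(0, COLUMN_LIMIT)

# The clipped text is no longer a complete JSON document.
begin
  JSON.parse(clipped)
  parse_failed = false
rescue JSON::ParserError
  parse_failed = true
end
```

In a model, a check along these lines could run as a validation so that oversized settings are rejected before saving instead of being clipped.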

For those of you that do want to use a Serialized Hash, go for it! I think it has potential to work well in some cases. I stumbled across the activerecord-attribute-fakers plugin which seems like a good fit.
