Copying a large database for local Ruby on Rails development?

I am working with a massive database (several GB), and all of the previous development has been done on a Slicehost slice. I am getting ready for more developers to come in, so each person needs to be able to set up their own machine for development, which potentially means copying this database. Selecting only the first X rows of each table to cut the size could cause data consistency problems. Is there any way around this, or will each developer need a one-hour download? Beyond that, what if I need to copy the production DB down for dev purposes in the future?

Sincerely, Tyler

4 Solutions

#1

Why not have a dev server that each dev connects to?

Yes, all devs develop against the same database. No development is ever done except through scripts that are checked into Subversion. If a couple of people making changes run into each other, all the better: they find out as soon as possible that they are doing things which might conflict.

We also periodically load a prod backup to dev and rerun any scripts for things which have not yet been loaded to prod, to keep our data up to date. Developing against the full data set is critical once you have a medium-sized database, because coding techniques that appear fine to a dev alone on a box with a smaller dataset will often fail miserably against prod-sized data and with multiple users.

#2

Databases required for development and testing rarely need to be full size; it is often easier to work on a small copy. A database subsetting tool like Jailer ( http://jailer.sourceforge.net/ ) might help you here.

#3

To make downloading the production database more efficient, be sure you're compressing it as much as possible before transmission, and further, that you're stripping out any records that aren't relevant for development work.
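
As a rough illustration of the compression step, here is a minimal Ruby sketch using the standard `zlib` library. The helper names `compress_dump` and `decompress_dump` are made up for this example, and it works on an in-memory string; for a multi-GB dump you would more likely pipe the dump tool's output through `gzip` on the server instead.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: gzip a SQL dump string at the highest compression level.
def compress_dump(sql)
  io = StringIO.new(String.new(encoding: Encoding::BINARY))
  gz = Zlib::GzipWriter.new(io, Zlib::BEST_COMPRESSION)
  gz.write(sql)
  gz.finish # flush the gzip trailer without closing the StringIO
  io.string
end

# Hypothetical helper: restore the original SQL text from gzipped bytes.
def decompress_dump(bytes)
  Zlib::GzipReader.new(StringIO.new(bytes)).read
end
```

SQL dumps are highly repetitive text, so they usually compress well, but the actual ratio depends on your data.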

You can also create a patch against an older version of your database dump, so that you ship only the differences rather than an entirely new copy. This works best when the dump records one INSERT statement per line, which you may need to enable explicitly in your dump tool. With MySQL this is the --skip-extended-insert option.
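
To show why one INSERT per line matters, here is a small Ruby sketch. `dump_command` and `dump_delta` are hypothetical helpers written for this post, and `dump_delta` is only a simplified stand-in for a real `diff`/`patch` workflow: it ignores line order and repeated lines.

```ruby
# Hypothetical helper: the mysqldump invocation that writes one INSERT per row.
def dump_command(database)
  ["mysqldump", "--skip-extended-insert", database]
end

# Hypothetical helper: lines that appear in only one of the two dumps.
# With one INSERT per line, a changed row shows up as one removed line
# and one added line, instead of a rewritten multi-row statement.
def dump_delta(old_dump, new_dump)
  old_lines = old_dump.lines.map(&:chomp)
  new_lines = new_dump.lines.map(&:chomp)
  { added: new_lines - old_lines, removed: old_lines - new_lines }
end

old_dump = "INSERT INTO users VALUES (1,'ann');\nINSERT INTO users VALUES (2,'bob');\n"
new_dump = "INSERT INTO users VALUES (1,'ann');\nINSERT INTO users VALUES (2,'rob');\n"
delta = dump_delta(old_dump, new_dump)
```

In this example only the row for user 2 ends up in the delta, so that is all that would need to travel over the wire.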

A better approach is to have a fake data generator that can roll out a suitably robust version of the database for testing and development. This is not too hard to do with things like Factory Girl which can automate routine record creation.
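
The idea can be sketched in plain Ruby without any gems. This is not Factory Girl's API; the `User` and `Order` structs and the `generate_fake_data` helper are invented for illustration. The point is that generated rows stay referentially consistent (every order points at a real user), which is exactly what "first X rows per table" cannot guarantee.

```ruby
# Hypothetical record types standing in for real models.
User  = Struct.new(:id, :name, :email)
Order = Struct.new(:id, :user_id, :total_cents)

# Hypothetical generator: builds users and orders that reference them.
# A fixed seed makes every developer's generated dataset identical.
def generate_fake_data(user_count:, orders_per_user:, seed: 42)
  rng = Random.new(seed)
  users = (1..user_count).map do |i|
    User.new(i, "User #{i}", "user#{i}@example.com")
  end
  orders = []
  users.each do |user|
    orders_per_user.times do
      orders << Order.new(orders.size + 1, user.id, rng.rand(100..50_000))
    end
  end
  { users: users, orders: orders }
end

data = generate_fake_data(user_count: 3, orders_per_user: 2)
```

In a Rails app you would define factories for your actual models and create records through them, but the shape is the same: generate parents first, then children that reference them.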

#4

In case anyone's interested in an answer to the question of "how do I copy data between databases", I found this:

http://justbarebones.blogspot.com/2007/10/copy-model-data-between-databases.html

It answered the question I asked when I found this S.O. question.
