Creating test data in a database

Time: 2022-07-28 16:30:07

I'm aware of some of the test data generators out there, but most seem to just fill name and address style databases [feel free to correct me].

We have a large, integrated, normalised application: for example, invoices have part numbers linked to stocking tables, customer numbers linked to customer tables, change logs linked to audit information, and so on, all of which are obviously difficult to fill randomly. Currently we obfuscate real-life data to get test data (but not very well).

What tools or methods do you use to create large volumes of data to test with?

6 answers

#1


7  

Where I work we use RedGate Data Generator to generate test data.

We work in the banking domain, so for nominative data (credit card numbers, personal IDs, phone numbers) we developed an application that masks these database fields so that we can work with them as if they were real data.
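A masking step of this kind might look like the following sketch. It is not the poster's application: the function name `mask_digits` and the key are hypothetical, and the approach (replacing each digit with one derived from a keyed hash) is just one possible way to keep the format while hiding the value. The same input always masks to the same output, so masked keys still join across tables.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real one should live outside source control.
SECRET_KEY = b"test-env-only-key"

def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every ASCII digit with a digit derived from a keyed hash.

    Length, separators and letters are preserved, and the mapping is
    deterministic for a given key and input value.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for ch in value:
        if ch in "0123456789":
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)
```

Note that a card number masked this way will generally not pass a Luhn check, so code that validates check digits would need a masking scheme that recomputes them.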

I can say that with Redgate you can get close to what your real data looks like on a production server, since you can customize every field of every table in your DB.

#2


3  

You can generate data plans with VSTS Database Edition (with the latest 2008 Power tools).

It includes a Data Generation Wizard that automates data generation by pointing at an existing database, so you get something that is realistic but contains entirely different data.

#3


3  

I've rolled my own data generator that generates random data conforming to regular expressions. The basic idea is to use validation rules twice: first you use them to generate valid random data, and then you use them to validate new input in production. I've started a rewrite of the utility, as it seems like a nice learning project. It's available on Google Code.
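The "use the rules twice" idea can be sketched as follows. This is not the poster's utility, and it does not handle arbitrary regular expressions: it assumes a deliberately small, hypothetical rule format (a list of allowed characters plus a repeat range) from which both a generator and a validating regex are derived.

```python
import random
import re

# Hypothetical rule: two capital letters, a dash, then four to six digits.
# Each entry is (allowed characters, minimum repeats, maximum repeats).
PART_NUMBER_RULE = [
    ("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 2, 2),
    ("-", 1, 1),
    ("0123456789", 4, 6),
]

def rule_to_regex(rule):
    """Build the validating regex from the rule."""
    parts = ["[%s]{%d,%d}" % (re.escape(chars), lo, hi) for chars, lo, hi in rule]
    return re.compile("".join(parts))

def generate(rule, rng):
    """Produce one random string that satisfies the rule."""
    return "".join(
        "".join(rng.choice(chars) for _ in range(rng.randint(lo, hi)))
        for chars, lo, hi in rule
    )

def is_valid(rule, value):
    """Validate input against the same rule used for generation."""
    return rule_to_regex(rule).fullmatch(value) is not None
```

For example, `generate(PART_NUMBER_RULE, random.Random(0))` yields a string that `is_valid(PART_NUMBER_RULE, ...)` accepts, while a hand-typed value such as `"A-1"` is rejected by the same rule.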

#4


2  

I just completed a project creating 3,500,000+ health insurance claim lines. Due to HIPAA and PHI restrictions, using even scrubbed real data is a PITA. I used a tool called Datatect for this (http://www.datatect.com/).

Some of the things I like about this tool:

  1. Uses ODBC, so you can generate data into any ODBC data source. I've used this for Oracle, SQL and MS Access databases, flat files, and Excel spreadsheets.
  2. Extensible via VBScript. You can write hooks at various parts of the data generation workflow to extend the abilities of the tool. I used this feature to "sync up" dependent columns in the database, and to control the frequency distribution of values so that it aligns with frequencies observed in the real world.
  3. Referentially aware. When populating foreign key columns, it pulls valid keys from the parent table.
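The last two points (controlled frequency distributions and referential awareness) are not specific to Datatect and can be sketched in a few lines. The example below uses Python's `sqlite3` rather than ODBC or VBScript; the `customer` and `invoice` tables, the status values and their weights are all hypothetical stand-ins.

```python
import random
import sqlite3

def populate(conn, n_customers=20, n_invoices=200, seed=0):
    """Fill a parent and a child table with referentially valid test rows."""
    rng = random.Random(seed)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE invoice ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer (name) VALUES (?)",
        [("Customer %d" % i,) for i in range(n_customers)],
    )
    # Referentially aware: pull the valid keys from the parent table.
    customer_ids = [row[0] for row in conn.execute("SELECT id FROM customer")]
    # Frequency control: made-up weights standing in for observed frequencies.
    statuses = ["paid", "open", "disputed"]
    weights = [0.80, 0.15, 0.05]
    rows = [
        (rng.choice(customer_ids), rng.choices(statuses, weights=weights, k=1)[0])
        for _ in range(n_invoices)
    ]
    conn.executemany("INSERT INTO invoice (customer_id, status) VALUES (?, ?)", rows)
    conn.commit()
```

Calling `populate(sqlite3.connect(":memory:"))` produces invoices whose `customer_id` always exists in `customer`, because the keys are read back from the parent table rather than invented.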

#5


1  

The Red Gate product is good...but not perfect.

I found that I did better when I wrote my own tools to generate the data. I use the Red Gate product when I want to generate, say, Customers, but it's not great if you want to simulate the randomness that customers engage in, like creating orders, some with one item and some with multiple items.

I think homegrown tools will provide the most 'realistic' data.
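As an illustration of the kind of homegrown generator described above, here is a minimal sketch that creates orders with a varying number of line items. The function name, the dictionary layout and the item-count weights are all hypothetical; a real tool would take the weights from observed data.

```python
import random

def generate_orders(customer_ids, product_ids, n_orders, rng):
    """Create orders where most have one item and a few have many."""
    orders = []
    for order_id in range(1, n_orders + 1):
        # Made-up distribution of items per order; adjust to match real behaviour.
        n_items = rng.choices([1, 2, 3, 5, 10], weights=[50, 25, 15, 7, 3], k=1)[0]
        n_items = min(n_items, len(product_ids))
        # sample() avoids listing the same product twice in one order.
        products = rng.sample(product_ids, n_items)
        lines = [{"product_id": p, "quantity": rng.randint(1, 5)} for p in products]
        orders.append(
            {
                "order_id": order_id,
                "customer_id": rng.choice(customer_ids),
                "lines": lines,
            }
        )
    return orders
```

Passing a seeded `random.Random` makes the data set reproducible, which helps when a test failure needs to be replayed against the same generated orders.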

#6


0  

Joel also mentioned RedGate in podcast #11.
