MySQL database modeling for large data sets

Time: 2021-12-03 16:26:08

A client wants to compile a bunch of data for his customers from a bunch of different sources. I'm building on a PHP/MySQL server architecture. All of my experience is in front-end dev and design, so I'm running into problems with performance, now that there are a lot of data sets.

The performance problem is de-duplication. The main db table stores domains and has four columns: 'id', 'domain_name', and two booleans used to determine whether or not a domain is a possible target for the customers. There is an INDEX on the 'domain_name' column.

I don't want multiple rows for the same domain. The domains arrive in sets of 30,000, and right now I am using:

if(!(Domain::find_by_domain($d->n))) {
    // insert into db
}

I've also tried:

$already_in_db = Domain::list_domains();
if (!in_array($d->n, $already_in_db)) {
    // insert into db
}

There are only about 170,000 domains in the table right now, and both methods already take an extremely long time.

1) Will setting a UNIQUE INDEX on the domain column cause dupes to just be discarded?

2) Are there any other methods to speed up this process?

1 solution

#1


Make the index on the domain_name column UNIQUE; your INSERT statements will then fail if the domain already exists. If you want to change the existing row's data when such a collision occurs, you can use REPLACE or INSERT ... ON DUPLICATE KEY UPDATE instead:

ALTER TABLE tbl_name
  DROP INDEX name_of_existing_index,
  ADD  UNIQUE INDEX name_of_existing_index (domain_name);
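
As a rough sketch of how the unique index can then be used for bulk de-duplication: once domain_name is unique, a multi-row INSERT IGNORE lets MySQL skip rows whose domain already exists, so no per-domain lookup is needed in PHP. Here tbl_name is the placeholder from the statement above, the domain values are made up, and is_target is a hypothetical name for one of the two boolean columns (the question does not name them). Note that the ALTER TABLE itself will fail with a duplicate-entry error if the table already contains duplicate domains, so those would have to be cleaned up first.

```sql
-- Skip duplicates: rows whose domain_name already exists are dropped
-- with a warning instead of aborting the whole statement.
INSERT IGNORE INTO tbl_name (domain_name)
VALUES
  ('example.com'),
  ('example.org'),
  ('example.net');

-- Alternatively, update the existing row when a collision occurs.
-- is_target is a hypothetical column standing in for one of the booleans.
INSERT INTO tbl_name (domain_name, is_target)
VALUES
  ('example.com', 1),
  ('example.org', 0)
ON DUPLICATE KEY UPDATE is_target = VALUES(is_target);
```

Sending each set of 30,000 domains as a handful of multi-row statements (each kept under max_allowed_packet) should be considerably faster than 30,000 separate lookups. Keep in mind that INSERT IGNORE also downgrades other errors, such as truncated values, to warnings, so it may be worth checking SHOW WARNINGS while testing. On recent MySQL 8.0 releases the VALUES() function in the UPDATE clause is deprecated in favor of a row alias, though it still works.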
