SQL: Many-to-many table and query

Time: 2022-10-05 12:12:35

First - apologies for the fuzzy title, I could not find a better one.

I have table with the following structure (simplification):

EmpID  DeptID
1      1
1      2
2      1
3      2
4      5
5      2

This table represents a many-to-many relationship.

I'm interested in finding all the EmpIDs that are related to a specific group of DeptIDs, for example I want all the EmpIDs that are related to DeptIDs 1, 2 and 3. Please note it's an AND relationship and not an OR relationship. For my case, the EmpID may be related to additional DeptIDs besides 1, 2 and 3 for it to be a valid answer.

The number of DeptIDs I'm interested in changes (e.g. I may want EmpIDs that are related to both DeptID 3 and 5, or I may want EmpIDs related to DeptIDs 2, 3, 4, 5, 6, 7).

When I try to approach this problem I find myself either creating a JOIN per DeptID, or a subquery per DeptID. This would mean I have to generate a new query depending on the number of DeptIDs I'm testing against. I would obviously prefer having a static query with a parameter or set of parameters.

I'm working over both SQL Server and MySQL (developing in parallel two versions of my code).

Any ideas?

2 solutions

#1


14  

I'm assuming you want to find employees that are in ALL of the specified departments and not just the employees that are in ANY of the departments, which is a far easier query.

SELECT t1.EmpID
FROM mytable t1
JOIN mytable t2 ON t1.EmpID = t2.EmpID AND t2.DeptID = 2
JOIN mytable t3 ON t2.EmpID = t3.EmpID AND t3.DeptID = 3
WHERE t1.DeptID = 1
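
As the question notes, this pattern needs one JOIN per department, so the SQL text has to be generated for each list length. A minimal Python sketch of that generation follows, using the standard-library sqlite3 module only so it can run without a server. The helper names (build_join_query, employees_in_all_join, make_demo_db) are made up for illustration, and the "?" placeholder style is sqlite3's; SQL Server and MySQL drivers may use a different marker.

```python
import sqlite3


def build_join_query(n_depts):
    """Return the self-join SQL for n_depts department ids, using placeholders."""
    if n_depts < 1:
        raise ValueError("need at least one department id")
    lines = ["SELECT DISTINCT t1.EmpID", "FROM mytable t1"]
    # One extra self-join per additional department, as in the answer above.
    for i in range(2, n_depts + 1):
        lines.append(
            f"JOIN mytable t{i} ON t{i}.EmpID = t1.EmpID AND t{i}.DeptID = ?"
        )
    lines.append("WHERE t1.DeptID = ?")
    lines.append("ORDER BY t1.EmpID")
    return "\n".join(lines)


def employees_in_all_join(conn, dept_ids):
    """EmpIDs related to every id in dept_ids, via generated self-joins."""
    dept_ids = list(dict.fromkeys(dept_ids))  # drop repeated ids, keep order
    sql = build_join_query(len(dept_ids))
    # Placeholders bind in textual order: the JOIN ids first, the WHERE id last.
    params = dept_ids[1:] + dept_ids[:1]
    return [row[0] for row in conn.execute(sql, params)]


def make_demo_db():
    """In-memory copy of the sample table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (EmpID INTEGER, DeptID INTEGER, "
        "PRIMARY KEY (EmpID, DeptID))"
    )
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (3, 2), (4, 5), (5, 2)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(build_join_query(3))
    print(employees_in_all_join(conn, [1, 2]))  # [1]
```

Only the number of joins changes between calls; the department ids themselves are always passed as bound parameters rather than spliced into the SQL string.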

I'm going to preempt the inevitable suggestion to use aggregation:

SELECT EmpID
FROM mytable
WHERE DeptID IN (1,2,3)
GROUP BY EmpID
HAVING COUNT(1) = 3

Resist that temptation. It's significantly slower. A similar scenario to this came up in SQL Statement - “Join” Vs “Group By and Having” and the second version was, in that case, about twenty times slower.

I'd also suggest you look at Database Development Mistakes Made by Application Developers.

#2


3  

I'd start from something like:

SELECT EmpID, COUNT(*) AS NumDepts
FROM thetable
WHERE DeptID IN (1, 2, 3)
GROUP BY EmpID
HAVING COUNT(*) = 3

Of course, that 3 in the last line would always be the length of the sequence of department ids you're checking (so for (2,3,4,5,6,7) it would be 6). This is one natural way to express "employees connected to all of these departments".
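
To make the "count equals the length of the list" rule concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The function name employees_in_all is hypothetical, and COUNT(DISTINCT DeptID) is a defensive substitution of mine for COUNT(*), in case the table can hold repeated (EmpID, DeptID) rows.

```python
import sqlite3


def employees_in_all(conn, dept_ids):
    """EmpIDs related to every department id in dept_ids."""
    wanted = sorted(set(dept_ids))  # repeated ids must not inflate the count
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT EmpID FROM thetable "
        f"WHERE DeptID IN ({placeholders}) "
        "GROUP BY EmpID "
        # The required count is simply the number of distinct ids asked for.
        "HAVING COUNT(DISTINCT DeptID) = ? "
        "ORDER BY EmpID"
    )
    return [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]


def make_sample_db():
    """In-memory copy of the sample table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE thetable (EmpID INTEGER, DeptID INTEGER)")
    conn.executemany(
        "INSERT INTO thetable VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (3, 2), (4, 5), (5, 2)],
    )
    return conn


if __name__ == "__main__":
    conn = make_sample_db()
    print(employees_in_all(conn, [1, 2]))  # [1]
    print(employees_in_all(conn, [2]))     # [1, 3, 5]
```

The shape of the query stays fixed here; only the number of "?" markers in the IN list varies with the input, and both the ids and the required count are bound as parameters.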

Edit: I see a note in another answer about performance issues. I've tried this approach in SQLite and PostgreSQL, with appropriate indices, and there it appears to perform well and to make appropriate use of those indices. I also tried it in MySQL 5.0, where I have to admit performance was nowhere near as good.
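
Since the outcome depends on the engine and its indices, it is worth looking at the plan on your own data rather than trusting either answer. The sketch below shows one way to do that in SQLite through Python's sqlite3 module. The index name idx_dept_emp and its column order are my assumptions, not something stated in the answer, and SQL Server and MySQL expose plans through their own, different tools.

```python
import sqlite3

QUERY = (
    "SELECT EmpID FROM thetable "
    "WHERE DeptID IN (?, ?) "
    "GROUP BY EmpID "
    "HAVING COUNT(*) = ?"
)


def make_indexed_db():
    """Sample table from the question plus a hypothetical composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE thetable (EmpID INTEGER NOT NULL, DeptID INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO thetable VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (3, 2), (4, 5), (5, 2)],
    )
    # Assumed index: DeptID first so the IN filter can seek on it.
    conn.execute("CREATE UNIQUE INDEX idx_dept_emp ON thetable (DeptID, EmpID)")
    return conn


def query_plan(conn, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


if __name__ == "__main__":
    conn = make_indexed_db()
    for step in query_plan(conn, QUERY, (1, 2, 2)):
        print(step)  # the exact wording varies between SQLite versions
    print([row[0] for row in conn.execute(QUERY, (1, 2, 2))])
```

With six rows the plan says little about real performance; the point is only the mechanism for checking whether an index is used before choosing between the join and the aggregation forms.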

I suspect (without an opportunity to benchmark this on a zillion more engines;-) that other really good SQL engines (such as SQL Server 2008, Oracle, IBM DB2, the new open-source Ingres...) will also optimize this query well, while other mediocre ones (can't think of any with a popularity anywhere close to MySQL's) won't.

So, no doubt your favorite answer will depend on what engines you really care about (this takes me back to the time, over a decade ago, when my responsibilities included managing the team which maintained a component that was supposed to provide well-performing queries over more than half a dozen disparate engines -- talk about nightmare jobs...!-).
