从同一表中的多个列中选择不同的值

时间:2022-07-08 23:49:38

I am trying to construct a single SQL statement that returns unique, non-null values from multiple columns all located in the same table.

我正在尝试构造一个SQL语句,该语句从位于同一表中的多个列返回惟一的非空值。

 SELECT distinct tbl_data.code_1 FROM tbl_data
      WHERE tbl_data.code_1 is not null
 UNION
 SELECT tbl_data.code_2 FROM tbl_data
      WHERE tbl_data.code_2 is not null;

For example, tbl_data is as follows:

例如,tbl_data如下:

 id   code_1    code_2
 ---  --------  ----------
 1    AB        BC
 2    BC        
 3    DE        EF
 4              BC

For the above table, the SQL query should return all unique non-null values from the two columns, namely: AB, BC, DE, EF.

对于上面的表,SQL查询应该返回两列中所有唯一的非空值,即AB、BC、DE、EF。

I'm fairly new to SQL. My statement above works, but is there a cleaner way to write this SQL statement, since the columns are from the same table?

我对SQL很陌生。我上面的语句是有效的,但是是否有一种更干净的方法来编写这个SQL语句,因为这些列来自同一个表?

2 个解决方案

#1


18  

It's better to include code in your question, rather than ambiguous text data, so that we are all working with the same data. Here is the sample schema and data I have assumed:

最好在您的问题中包含代码,而不是含糊不清的文本数据,以便我们都使用相同的数据。下面是我假设的样本模式和数据:

CREATE TABLE tbl_data (
  id INT NOT NULL,
  code_1 CHAR(2),
  code_2 CHAR(2)
);

INSERT INTO tbl_data (
  id,
  code_1,
  code_2
)
VALUES
  (1, 'AB', 'BC'),
  (2, 'BC', NULL),
  (3, 'DE', 'EF'),
  (4, NULL, 'BC');

As Blorgbeard commented, the DISTINCT clause in your solution is unnecessary because the UNION operator eliminates duplicate rows. There is a UNION ALL operator that does not elimiate duplicates, but it is not appropriate here.

正如Blorgbeard所言,解决方案中不同的子句是不必要的,因为UNION操作符消除了重复的行。有一个联合所有操作符不排除重复,但在这里不合适。

Rewriting your query without the DISTINCT clause is a fine solution to this problem:

在没有明确条款的情况下重写查询是解决这个问题的一个很好的方法:

SELECT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;

It doesn't matter that the two columns are in the same table. The solution would be the same even if the columns were in different tables.

这两列在同一个表中并不重要。即使列在不同的表中,解决方案也是一样的。

If you don't like the redundancy of specifying the same filter clause twice, you can encapsulate the union query in a virtual table before filtering that:

如果您不喜欢重复指定相同的filter子句,那么您可以在过滤之前将union查询封装在一个虚拟表中:

SELECT code
FROM (
  SELECT code_1
  FROM tbl_data
  UNION
  SELECT code_2
  FROM tbl_data
) AS DistinctCodes (code)
WHERE code IS NOT NULL;

I find the syntax of the second more ugly, but it is logically neater. But which one performs better?

我发现第二种语法更难看,但逻辑上更清晰。但是哪一个表现更好呢?

I created a sqlfiddle that demonstrates that the query optimizer of SQL Server 2005 produces the same execution plan for the two different queries:

我创建了一个sqlfiddle,它演示了SQL Server 2005的查询优化器为两个不同的查询生成相同的执行计划:

从同一表中的多个列中选择不同的值

If SQL Server generates the same execution plan for two queries, then they are practically as well as logically equivalent.

如果SQL Server为两个查询生成相同的执行计划,那么它们实际上和逻辑上是等价的。

Compare the above to the execution plan for the query in your question:

将上面的内容与您问题中查询的执行计划进行比较:

从同一表中的多个列中选择不同的值

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation, because the query optimizer does not know that any duplicates filtered out by the DISTINCT in the first query would be filtered out by the UNION later anyway.

这个独特的子句使SQL Server 2005执行了一个冗余的排序操作,因为查询优化器不知道,在第一个查询中被不同过滤掉的任何重复内容都将在以后被UNION过滤掉。

This query is logically equivalent to the other two, but the redundant operation makes it less efficient. On a large data set, I would expect your query to take longer to return a result set than the two here. Don't take my word for it; experiment in your own environment to be sure!

此查询在逻辑上与其他两个查询相同,但冗余操作使其效率降低。在大型数据集中,我希望查询返回结果集的时间比这里的两个要长。不要相信我的话;一定要在自己的环境中进行实验!

#2


4  

try something like SubQuery:

尝试像子查询:

SELECT derivedtable.NewColumn
FROM
(
    SELECT code_1 as NewColumn FROM tbl_data 
    UNION
    SELECT code_2 as NewColumn FROM tbl_data 
) derivedtable
WHERE derivedtable.NewColumn IS NOT NULL

The UNION already returns DISTINCT values from the combined query.

联合已经从组合查询返回不同的值。

#1


18  

It's better to include code in your question, rather than ambiguous text data, so that we are all working with the same data. Here is the sample schema and data I have assumed:

最好在您的问题中包含代码,而不是含糊不清的文本数据,以便我们都使用相同的数据。下面是我假设的样本模式和数据:

CREATE TABLE tbl_data (
  id INT NOT NULL,
  code_1 CHAR(2),
  code_2 CHAR(2)
);

INSERT INTO tbl_data (
  id,
  code_1,
  code_2
)
VALUES
  (1, 'AB', 'BC'),
  (2, 'BC', NULL),
  (3, 'DE', 'EF'),
  (4, NULL, 'BC');

As Blorgbeard commented, the DISTINCT clause in your solution is unnecessary because the UNION operator eliminates duplicate rows. There is a UNION ALL operator that does not elimiate duplicates, but it is not appropriate here.

正如Blorgbeard所言,解决方案中不同的子句是不必要的,因为UNION操作符消除了重复的行。有一个联合所有操作符不排除重复,但在这里不合适。

Rewriting your query without the DISTINCT clause is a fine solution to this problem:

在没有明确条款的情况下重写查询是解决这个问题的一个很好的方法:

SELECT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;

It doesn't matter that the two columns are in the same table. The solution would be the same even if the columns were in different tables.

这两列在同一个表中并不重要。即使列在不同的表中,解决方案也是一样的。

If you don't like the redundancy of specifying the same filter clause twice, you can encapsulate the union query in a virtual table before filtering that:

如果您不喜欢重复指定相同的filter子句,那么您可以在过滤之前将union查询封装在一个虚拟表中:

SELECT code
FROM (
  SELECT code_1
  FROM tbl_data
  UNION
  SELECT code_2
  FROM tbl_data
) AS DistinctCodes (code)
WHERE code IS NOT NULL;

I find the syntax of the second more ugly, but it is logically neater. But which one performs better?

我发现第二种语法更难看,但逻辑上更清晰。但是哪一个表现更好呢?

I created a sqlfiddle that demonstrates that the query optimizer of SQL Server 2005 produces the same execution plan for the two different queries:

我创建了一个sqlfiddle,它演示了SQL Server 2005的查询优化器为两个不同的查询生成相同的执行计划:

从同一表中的多个列中选择不同的值

If SQL Server generates the same execution plan for two queries, then they are practically as well as logically equivalent.

如果SQL Server为两个查询生成相同的执行计划,那么它们实际上和逻辑上是等价的。

Compare the above to the execution plan for the query in your question:

将上面的内容与您问题中查询的执行计划进行比较:

从同一表中的多个列中选择不同的值

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation, because the query optimizer does not know that any duplicates filtered out by the DISTINCT in the first query would be filtered out by the UNION later anyway.

这个独特的子句使SQL Server 2005执行了一个冗余的排序操作,因为查询优化器不知道,在第一个查询中被不同过滤掉的任何重复内容都将在以后被UNION过滤掉。

This query is logically equivalent to the other two, but the redundant operation makes it less efficient. On a large data set, I would expect your query to take longer to return a result set than the two here. Don't take my word for it; experiment in your own environment to be sure!

此查询在逻辑上与其他两个查询相同,但冗余操作使其效率降低。在大型数据集中,我希望查询返回结果集的时间比这里的两个要长。不要相信我的话;一定要在自己的环境中进行实验!

#2


4  

try something like SubQuery:

尝试像子查询:

SELECT derivedtable.NewColumn
FROM
(
    SELECT code_1 as NewColumn FROM tbl_data 
    UNION
    SELECT code_2 as NewColumn FROM tbl_data 
) derivedtable
WHERE derivedtable.NewColumn IS NOT NULL

The UNION already returns DISTINCT values from the combined query.

联合已经从组合查询返回不同的值。