Find rows with duplicate values in a column

Time: 2022-12-09 22:59:00

I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon 
 79        | howard jason
 36        | k moribe

Now I need the result as:

 author_id | author_name                                                  
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan     
 14        | k moribe 
 36        | k moribe

That is, I need the author_id of every row whose author_name appears more than once. I have tried this statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1

But it's not working. How can I get this?

3 solutions

#1


I suggest a window function in a subquery:

SELECT author_id, author_name  -- omit the name here, if you just need ids
FROM (
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   ) sub
WHERE  ct > 1;

You will recognize the basic aggregate function count(). Like any other aggregate function, it can be turned into a window function by appending an OVER clause.

This way it counts the rows per partition, that is, per author_name, without collapsing them into one row per group. Voilà.
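
To see what the window function adds, here is a minimal sketch of the inner query on its own. It assumes PostgreSQL 8.4 or later and rebuilds the sample rows from the question in a temporary table; the column types, the primary key, and the ORDER BY are assumptions added only to make the example self-contained and the output easy to read.

```sql
-- Assumption: a throwaway copy of the sample table from the question.
CREATE TEMP TABLE author_data (
   author_id   int PRIMARY KEY
 , author_name text NOT NULL
);

INSERT INTO author_data (author_id, author_name) VALUES
   (9,  'ernest jordan')
 , (14, 'k moribe')
 , (15, 'ernest jordan')
 , (25, 'william h nailon')
 , (79, 'howard jason')
 , (36, 'k moribe');

-- Every row is kept; ct is the number of rows sharing its author_name.
SELECT author_id, author_name
     , count(*) OVER (PARTITION BY author_name) AS ct
FROM   author_data
ORDER  BY author_name, author_id;
```

The rows for howard jason and william h nailon come back with ct = 1, the other four with ct = 2, so an outer WHERE ct > 1 leaves exactly the duplicated rows. The subquery is needed because window functions are not allowed in a WHERE clause.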

In older versions without window functions (v8.3 or older), or generally, this alternative performs quite fast:

SELECT author_id, author_name  -- omit name, if you just need ids
FROM   author_data a
WHERE  EXISTS (
   SELECT 1
   FROM   author_data a2
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );

If you are concerned with performance, add an index on author_name.
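
As a sketch, such an index could be created like this. The index name is made up for illustration; use whatever fits your naming scheme.

```sql
-- Hypothetical index name; a plain B-tree index on the column being compared.
CREATE INDEX author_data_author_name_idx ON author_data (author_name);
```

Whether the planner actually uses the index depends on table size and statistics. On a table as small as the sample, a sequential scan will usually be chosen anyway.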

#2


You are halfway there already. Your grouped query identifies the duplicated names; you just need to use those names to fetch the matching rows. (Selecting author_id while grouping by author_name is what makes the original statement fail.)

Try this:

SELECT author_id, author_name
FROM   author_data
WHERE  author_name IN (
   SELECT author_name          -- only the grouping column can be selected here
   FROM   author_data
   GROUP  BY author_name
   HAVING count(*) > 1
   );

#3


You could join the table to itself, using either of the following queries:

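-- DISTINCT avoids repeated rows when a name occurs three or more times.
SELECT DISTINCT a1.author_id, a1.author_name
FROM   author_data a1
JOIN   author_data a2
  ON   a1.author_id <> a2.author_id
  AND  a1.author_name = a2.author_name
ORDER  BY a1.author_name, a1.author_id;

-- 9 |ernest jordan
-- 15|ernest jordan
-- 14|k moribe
-- 36|k moribe

--OR

-- Same result: a CROSS JOIN takes no ON clause, so the conditions go in WHERE.
SELECT DISTINCT a1.author_id, a1.author_name
FROM   author_data a1
CROSS  JOIN author_data a2
WHERE  a1.author_id <> a2.author_id
AND    a1.author_name = a2.author_name
ORDER  BY a1.author_name, a1.author_id;

-- 9 |ernest jordan
-- 15|ernest jordan
-- 14|k moribe
-- 36|k moribe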
