如何从具有重复项的SQL的表中的每个组中选择前3个值[重复]

时间:2022-01-20 01:21:00

This question already has an answer here:

这个问题在这里已有答案:

Assume we have a table which has two columns, one column contains the names of some people and the other column contains some values related to each person. One person can have more than one value. Each value has a numeric type. The question is we want to select the top 3 values for each person from the table. If one person has less than 3 values, we select all the values for that person.

假设我们有一个包含两列的表,一列包含一些人的名字,另一列包含与每个人相关的一些值。一个人可以拥有多个价值。每个值都有一个数字类型。问题是我们想从表格中为每个人选择前3个值。如果一个人的值少于3,我们会选择该人的所有值。

The issue can be solved if there are no duplicates in the table by the query provided in this article Select top 3 values from each group in a table with SQL . But if there are duplicates, what is the solution?

如果本文中提供的查询在表中没有重复项,则可以解决此问题。使用SQL从表中的每个组中选择前3个值。但如果有重复,那么解决方案是什么?

For example, if for one name John, he has 5 values related to him. They are 20,7,7,7,4. I need to return the name/value pairs as below order by value descending for each name:

例如,如果对于一个名字John,他有5个与他相关的值。它们是20,7,7,7,4。我需要按以下顺序返回名称/值对每个名称的降序值:

-----------+-------+
| name     | value |
-----------+-------+
| John     |    20 |
| John     |     7 |
| John     |     7 |
-----------+-------+

Only 3 rows should be returned for John even though there are three 7s for John.

John应该只返回3行,即使John有3个7。

6 个解决方案

#1


26  

In many modern DBMS (e.g. Postgres, Oracle, SQL-Server, DB2 and many others), the following will work just fine. It uses CTEs and ranking function ROW_NUMBER() which is part of the latest SQL standard:

在许多现代DBMS(例如Postgres,Oracle,SQL-Server,DB2和许多其他)中,以下内容都可以正常工作。它使用CTE和排名函数ROW_NUMBER(),它是最新SQL标准的一部分:

 WITH cte AS
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  )
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;

Without CTE, only ROW_NUMBER():

没有CTE,只有ROW_NUMBER():

SELECT name, value, rn
FROM 
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  ) tmp 
WHERE rn <= 3
ORDER BY name, rn ; 

Tested in:

测试中:


In MySQL and other DBMS that do not have ranking functions, one has to use either derived tables, correlated subqueries or self-joins with GROUP BY.

在MySQL和其他没有排名功能的DBMS中,必须使用派生表,相关子查询或自联接与GROUP BY。

The (tid) is assumed to be the primary key of the table:

假设(tid)是表的主键:

SELECT t.tid, t.name, t.value,              -- self join and GROUP BY
       COUNT(*) AS rn
FROM t
  JOIN t AS t2
    ON  t2.name = t.name
    AND ( t2.value > t.value
        OR  t2.value = t.value
        AND t2.tid <= t.tid
        )
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;


SELECT t.tid, t.name, t.value, rn
FROM
  ( SELECT t.tid, t.name, t.value,
           ( SELECT COUNT(*)                -- inline, correlated subquery
             FROM t AS t2
             WHERE t2.name = t.name
              AND ( t2.value > t.value
                 OR  t2.value = t.value
                 AND t2.tid <= t.tid
                  )
           ) AS rn
    FROM t
  ) AS t
WHERE rn <= 3
ORDER BY name, rn ;

Tested in MySQL

在MySQL中测试过

#2


0  

I was going to downvote the question. However, I realized that it might really be asking for a cross-database solution.

我打算回答这个问题。但是,我意识到它可能真的要求跨数据库解决方案。

Assuming you are looking for a database independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example:

假设您正在寻找一种独立于数据库的方法来实现这一点,我能想到的唯一方法是使用相关子查询(或非等值连接)。这是一个例子:

select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3)

However, each database that you mention (and I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them are standard SQL.

但是,您提到的每个数据库(我注意到,Hadoop不是数据库)有更好的方法。不幸的是,它们都不是标准的SQL。

Here is an example of it working in SQL Server:

以下是在SQL Server中工作的示例:

with t as (
      select 1 as personid, 5 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 7 as val union all
      select 1 as personid, 8 as val
     )
select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3);

#3


0  

Using GROUP_CONCAT and FIND_IN_SET you can do that.Check SQLFIDDLE.

使用GROUP_CONCAT和FIND_IN_SET可以做到这一点。检查SQLFIDDLE。

SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
                             SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
                           FROM tbl t1
                           WHERE t1.name = t.name
                           GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc

#4


0  

If your result set is not so heavy, you can write a stored procedure (or an anonymous PL/SQL-block) for that problem which iterates the result set and finds the bigges three by a simple comparing algorithm.

如果结果集不那么重,可以为该问题编写存储过程(或匿名PL / SQL块),迭代结果集并通过简单的比较算法找到三个。

#5


0  

Try this -

尝试这个 -

CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);

WITH cte
AS (
SELECT NAME
    ,value
    ,ROW_NUMBER() OVER (
        PARTITION BY NAME ORDER BY (value) DESC
        ) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC

#6


0  

This works for MS SQL. Should be workable in any other SQL dialect that has the ability to assign row numbers in a group by or over clause (or equivelant)

这适用于MS SQL。应该可以在任何其他SQL方言中使用,该方言能够在group by或over子句中分配行号(或equivelant)

if object_id('tempdb..#Data') is not null drop table #Data;
GO

create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Jack', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO

with D as (
SELECT
     name
    ,Value
    ,row_number() over (partition by name order by value desc) rn
From
    #Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc

Name    Value
Jack    5
Jane    30
Jane    21
Jane    18
John    20
John    7
John    7

#1


26  

In many modern DBMS (e.g. Postgres, Oracle, SQL-Server, DB2 and many others), the following will work just fine. It uses CTEs and ranking function ROW_NUMBER() which is part of the latest SQL standard:

在许多现代DBMS(例如Postgres,Oracle,SQL-Server,DB2和许多其他)中,以下内容都可以正常工作。它使用CTE和排名函数ROW_NUMBER(),它是最新SQL标准的一部分:

 WITH cte AS
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  )
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;

Without CTE, only ROW_NUMBER():

没有CTE,只有ROW_NUMBER():

SELECT name, value, rn
FROM 
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  ) tmp 
WHERE rn <= 3
ORDER BY name, rn ; 

Tested in:

测试中:


In MySQL and other DBMS that do not have ranking functions, one has to use either derived tables, correlated subqueries or self-joins with GROUP BY.

在MySQL和其他没有排名功能的DBMS中,必须使用派生表,相关子查询或自联接与GROUP BY。

The (tid) is assumed to be the primary key of the table:

假设(tid)是表的主键:

SELECT t.tid, t.name, t.value,              -- self join and GROUP BY
       COUNT(*) AS rn
FROM t
  JOIN t AS t2
    ON  t2.name = t.name
    AND ( t2.value > t.value
        OR  t2.value = t.value
        AND t2.tid <= t.tid
        )
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;


SELECT t.tid, t.name, t.value, rn
FROM
  ( SELECT t.tid, t.name, t.value,
           ( SELECT COUNT(*)                -- inline, correlated subquery
             FROM t AS t2
             WHERE t2.name = t.name
              AND ( t2.value > t.value
                 OR  t2.value = t.value
                 AND t2.tid <= t.tid
                  )
           ) AS rn
    FROM t
  ) AS t
WHERE rn <= 3
ORDER BY name, rn ;

Tested in MySQL

在MySQL中测试过

#2


0  

I was going to downvote the question. However, I realized that it might really be asking for a cross-database solution.

我打算回答这个问题。但是,我意识到它可能真的要求跨数据库解决方案。

Assuming you are looking for a database independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example:

假设您正在寻找一种独立于数据库的方法来实现这一点,我能想到的唯一方法是使用相关子查询(或非等值连接)。这是一个例子:

select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3)

However, each database that you mention (and I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them are standard SQL.

但是,您提到的每个数据库(我注意到,Hadoop不是数据库)有更好的方法。不幸的是,它们都不是标准的SQL。

Here is an example of it working in SQL Server:

以下是在SQL Server中工作的示例:

with t as (
      select 1 as personid, 5 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 7 as val union all
      select 1 as personid, 8 as val
     )
select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3);

#3


0  

Using GROUP_CONCAT and FIND_IN_SET you can do that.Check SQLFIDDLE.

使用GROUP_CONCAT和FIND_IN_SET可以做到这一点。检查SQLFIDDLE。

SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
                             SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
                           FROM tbl t1
                           WHERE t1.name = t.name
                           GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc

#4


0  

If your result set is not so heavy, you can write a stored procedure (or an anonymous PL/SQL-block) for that problem which iterates the result set and finds the bigges three by a simple comparing algorithm.

如果结果集不那么重,可以为该问题编写存储过程(或匿名PL / SQL块),迭代结果集并通过简单的比较算法找到三个。

#5


0  

Try this -

尝试这个 -

CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);

WITH cte
AS (
SELECT NAME
    ,value
    ,ROW_NUMBER() OVER (
        PARTITION BY NAME ORDER BY (value) DESC
        ) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC

#6


0  

This works for MS SQL. Should be workable in any other SQL dialect that has the ability to assign row numbers in a group by or over clause (or equivelant)

这适用于MS SQL。应该可以在任何其他SQL方言中使用,该方言能够在group by或over子句中分配行号(或equivelant)

if object_id('tempdb..#Data') is not null drop table #Data;
GO

create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Jack', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO

with D as (
SELECT
     name
    ,Value
    ,row_number() over (partition by name order by value desc) rn
From
    #Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc

Name    Value
Jack    5
Jane    30
Jane    21
Jane    18
John    20
John    7
John    7