T-SQL: grouping rows by the MAX-length column value across different rows (?)

Time: 2022-12-04 07:37:22

I'm trying to come up with a way to combine the rows of a table that share a row key, keeping for each column the longest string found in any of those rows.

example

CREATE TABLE test1 
    (akey int not null , 
    text1 varchar(50) NULL, 
    text2 varchar(50) NULL, 
    text3 varchar(50) NULL  )


INSERT INTO test1 VALUES ( 1,'Winchester Road','crawley',NULL)
INSERT INTO test1 VALUES ( 1,'Winchester Rd','crawley','P21869')
INSERT INTO test1 VALUES ( 1,'Winchester Road','crawley estate','P21869')
INSERT INTO test1 VALUES ( 1,'Winchester Rd','crawley','P21869A')
INSERT INTO test1 VALUES ( 2,'','birmingham','P53342B')
INSERT INTO test1 VALUES ( 2,'Smith Close','birmingham North East','P53342')
INSERT INTO test1 VALUES ( 2,'Smith Cl.',NULL,'P53342B')
INSERT INTO test1 VALUES ( 2,'Smith Close','birmingham North','P53342')

With these rows, I would be looking for this result:

1   Winchester Road,    crawley estate, P21869A
2   Smith Close,    birmingham North East,  P53342B

EDIT: the results above need to be in a table, not just a comma-separated string.

As you can see in the result, the output for each text column should be the longest value of that column among the rows with the same 'akey'.

I'm trying to come up with a solution that does not involve a separate subquery for each column; the actual table has 32 columns and over 13 million rows.

The reason I'm doing this is to create a cleaned-up table with just one row per ID, holding the best value for each column.

This is my first post, so let me know if you need any more info, and I'm happy to hear about any posting best practices that I've broken!

thanks

Ben.

2 answers

#1


SELECT A.akey, 
    (
        SELECT TOP 1 T1.text1
        FROM test1 T1
        WHERE T1.akey=A.akey AND LEN(T1.TEXT1) = MAX(LEN(A.text1))
    ) AS TEXT1,
    (
        SELECT TOP 1 T2.text2
        FROM test1 T2
        WHERE T2.akey=A.akey AND LEN(T2.TEXT2) = MAX(LEN(A.text2))
    ) AS TEXT2,
    (
        SELECT TOP 1 T3.text3
        FROM test1 T3
        WHERE T3.akey=A.akey AND LEN(T3.TEXT3) = MAX(LEN(A.text3))
    ) AS TEXT3
FROM TEST1 AS A
GROUP BY A.akey

I just realized you said you have 32 columns. I don't see a good way to do that, unless UNPIVOT would let you create a separate row (akey, textN) for each text* column.

Edit: I may not have a chance to finish this today, but UNPIVOT looks useful:

;
WITH COLUMNS AS
(
    SELECT akey, [Column], ColumnValue
    FROM
        (
            SELECT X.Akey, X.Text1, X.Text2, X.Text3
            FROM test1 X
        ) AS p
    UNPIVOT (ColumnValue FOR [Column] IN (Text1, Text2, Text3))
    AS UNPVT
)
SELECT *
FROM COLUMNS
ORDER BY akey,[Column], LEN(ColumnValue)
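
One way the unfinished UNPIVOT idea above might be carried through is to rank the unpivoted values by length and pivot the winners back into columns. This is only a sketch, not tested against the real 32-column table: it assumes SQL Server 2005 or later (for UNPIVOT, ROW_NUMBER and PIVOT), and the CTE names Unpivoted and Ranked are made up for illustration.

```sql
-- Sketch only: assumes SQL Server 2005+ (UNPIVOT, ROW_NUMBER, PIVOT).
WITH Unpivoted AS
(
    -- One row per (akey, column name, value). UNPIVOT drops NULL values.
    SELECT akey, [Column], ColumnValue
    FROM (SELECT akey, text1, text2, text3 FROM test1) AS p
    UNPIVOT (ColumnValue FOR [Column] IN (text1, text2, text3)) AS u
),
Ranked AS
(
    -- rn = 1 marks the longest value for each key and column.
    -- ColumnValue is a tiebreaker so equal-length values resolve consistently.
    SELECT akey, [Column], ColumnValue,
           ROW_NUMBER() OVER (PARTITION BY akey, [Column]
                              ORDER BY LEN(ColumnValue) DESC, ColumnValue) AS rn
    FROM Unpivoted
)
-- Turn the winning values back into one row per akey.
SELECT akey, text1, text2, text3
FROM (SELECT akey, [Column], ColumnValue FROM Ranked WHERE rn = 1) AS src
PIVOT (MAX(ColumnValue) FOR [Column] IN (text1, text2, text3)) AS pvt
ORDER BY akey;
```

On the sample data this should return the two rows the question asks for. Two caveats: LEN ignores trailing spaces, and a key whose text columns are all NULL would disappear from the output, because UNPIVOT leaves no rows for it.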

#2


This seems really ugly, but at least works (on SQL2K) and doesn't need subqueries:

select test1.akey, A.text1, B.text2, C.text3
from test1
inner join test1 A on A.akey = test1.akey 
inner join test1 B on B.akey = test1.akey 
inner join test1 C on C.akey = test1.akey 
group by test1.akey, A.text1, B.text2, C.text3
having len(A.text1) = max(len(test1.text1))
   and len(B.text2) = max(len(test1.text2))
   and len(C.text3) = max(len(test1.text3))
order by test1.akey

I must admit that it needs one inner join per column, and I wonder how that would perform on the 32-column, 13-million-row table... I tried both this approach and the one based on subqueries and looked at the execution plans: I'd actually be curious to know
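
For comparison, here is a sketch of a variant that avoids both the self-joins and the correlated subqueries by ranking each column with ROW_NUMBER and collapsing the result with MAX(CASE ...). It is an assumption-laden illustration rather than a tested solution: ROW_NUMBER needs SQL Server 2005 or later (so it would not run on SQL2K), and test1_clean is a hypothetical name for the target table.

```sql
-- Sketch only: assumes SQL Server 2005+ for ROW_NUMBER().
WITH Ranked AS
(
    -- For each column, number the rows of each akey from longest to shortest.
    -- NULL lengths sort last under DESC, so rank 1 is non-NULL when any value exists.
    SELECT akey, text1, text2, text3,
           ROW_NUMBER() OVER (PARTITION BY akey ORDER BY LEN(text1) DESC, text1) AS r1,
           ROW_NUMBER() OVER (PARTITION BY akey ORDER BY LEN(text2) DESC, text2) AS r2,
           ROW_NUMBER() OVER (PARTITION BY akey ORDER BY LEN(text3) DESC, text3) AS r3
    FROM test1
)
-- Keep only the rank-1 value of each column and write one row per akey.
SELECT akey,
       MAX(CASE WHEN r1 = 1 THEN text1 END) AS text1,
       MAX(CASE WHEN r2 = 1 THEN text2 END) AS text2,
       MAX(CASE WHEN r3 = 1 THEN text3 END) AS text3
INTO test1_clean   -- hypothetical target table; SELECT ... INTO creates it
FROM Ranked
GROUP BY akey;
```

Each ROW_NUMBER call sorts the data differently, so with 32 columns this is not obviously cheaper than the other approaches; comparing execution plans on a sample of the real data would be the way to find out.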
