T-SQL |字符串“操纵”和聚合

时间:2023-01-23 11:42:29

I have the following scenario.

我有以下情况。

SOURCE TABLE 1

消息来源表1

CREATE TABLE #Table1 
(
     Div varchar(10), 
     Dept varchar(10), 
     States varchar(10)
)

INSERT INTO #Table1
   SELECT 'Div1','Dept1','CA,NV,TX'
   UNION ALL
   SELECT 'Div2','Dept2','MI,OH,IN'
   UNION ALL
   SELECT 'Div3','Dept2','NY,NJ,PA'
   UNION ALL
   SELECT 'Div4','Dept1',NULL

SOURCE TABLE 2

消息来源表2

CREATE TABLE #Table2 
(
    Div varchar(10), 
    Dept varchar(10), 
    States varchar(10)
)

INSERT INTO #Table2
   SELECT 'Div1','Dept1','CA'
   UNION ALL
   SELECT 'Div1','Dept1','NV, TX'
   UNION ALL
   SELECT 'Div1','Dept1','TX, CA'
   UNION ALL
   SELECT 'Div1','Dept1','CA, NV'
   UNION ALL
   SELECT 'Div2','Dept2','MI, OH'
   UNION ALL
   SELECT 'Div2','Dept2','MI, IN'
   UNION ALL
   SELECT 'Div2','Dept2','OH'
   UNION ALL
   SELECT 'Div3','Dept2','NY, NJ, PA'

DESIRED OUTPUT

CREATE TABLE #Table3 
(
    Div varchar(10), 
    Dept varchar(10), 
    States varchar(50)
)

INSERT INTO #Table3
SELECT 'Div1','Dept1','CA - (3), NV - (2), TX - (2)'
UNION ALL
SELECT 'Div2','Dept2','MI - (2), OH - (2), IN - (1)'
UNION ALL
SELECT 'Div3','Dept2','NY - (1), NJ - (1), PA - (1)'
UNION ALL
SELECT 'Div4','Dept1',NULL

SELECT * FROM #Table1
SELECT * FROM #Table2
SELECT * FROM #Table3

DROP TABLE #Table1
DROP TABLE #Table2
DROP TABLE #Table3

SQLFIDDLE

Goal: Based on #Table1 and #Table2, join both table on Div and Dept fields and then aggregate the counts for distinct states in States field and create an output where you have Div, Dept, and States with unique count of each one of those states printed next to the state.

目标:基于#Table1和#Table2,在Div和Dept字段上加入两个表,然后在States字段中汇总不同状态的计数,并创建一个输出,其中包含Div,Dept和States,每个状态具有唯一计数印在国家旁边的州。

I am not sure how to achieve this. I am trying LIKE but can't quite figure out how to make it dynamic. I will continue trying to see if I can figure out. Thought I would post this question here and see if I can get some assist.

我不知道如何实现这一目标。我正在尝试但却无法弄清楚如何让它变得动态。我会继续试着看看能不能搞清楚。以为我会在这里发布这个问题,看看能否得到一些帮助。

Thank you

UPDATE:

Desired Output

Div     Dept    States
Div1    Dept1   CA - (3), NV - (2), TX - (2)
Div2    Dept2   MI - (2), OH - (2), IN - (1)
Div3    Dept2   NY - (1), NJ - (1), PA - (1)
Div4    Dept1   NULL

3 个解决方案

#1


6  

Your requirements are very nasty but as developers we have to work with what we've got. Here's a solution using Common Table Expression (CTE) extensively:

您的要求非常恶劣,但作为开发人员,我们必须使用我们所拥有的。这是一个广泛使用公用表表达式(CTE)的解决方案:

;WITH
    CTE1 AS
    (
        SELECT      Div, Dept,
                    REPLACE(States,' ','') + ',' AS States
        FROM        Table2
    ),
    CTE2 AS
    (
        SELECT      c1.Div, c1.Dept,
                    LEFT(c1.States,CHARINDEX(',', c1.States)-1)                 AS IndividualState,
                    RIGHT(c1.States,LEN(c1.States)-CHARINDEX(',', c1.States))   AS RemainingStates
        FROM        CTE1    c1
        UNION ALL
        SELECT      c2.Div, c2.Dept,
                    LEFT(c2.RemainingStates,CHARINDEX(',', c2.RemainingStates)-1),
                    RIGHT(c2.RemainingStates,LEN(c2.RemainingStates) - CHARINDEX(',', c2.RemainingStates))
        FROM        CTE2    c2
        WHERE       LEN(c2.RemainingStates) > 0
    ),
    CTE3 AS
    (
        SELECT      Div, Dept,
                    IndividualState,
                    COUNT(*)            AS StateCount
        FROM        CTE2
        GROUP BY    Div, Dept, IndividualState
    ),
    CTE4 AS
    (
        SELECT      t1.Div, t1.Dept,
                    (
                        SELECT  c3.IndividualState + ' - (' + CONVERT(varchar(10),c3.StateCount) + '), ' 
                        FROM    CTE3 c3
                        WHERE   c3.Div = t1.Div AND c3.Dept = t1.Dept
                        FOR XML PATH('')
                    )       AS States
        FROM        Table1  t1
    )

SELECT  Div, Dept,
        LEFT(States, LEN(States) - 1) AS States
FROM    CTE4

Explanation

  1. CTE1 cleans up the data in Table2: remove spaces, add a comma to the end
  2. CTE1清理表2中的数据:删除空格,在末尾添加逗号

  3. CTE2 does the normalization
  4. CTE2进行标准化

  5. CTE3 does the counting
  6. CTE3进行计数

  7. CTE4 does the final assembly, putting CA | 3 into CA - (3), ...
  8. CTE4进行最终组装,放入CA | 3进CA - (3),...

The last SELECT remove the trailing comma for neater output.

最后一个SELECT删除neater输出的尾随逗号。

To better understand each step, you can replace the final SELECT with SELECT * FROM CTE1, SELECT * FROM CTE2, etc.

为了更好地理解每个步骤,可以用SELECT * FROM CTE1,SELECT * FROM CTE2等替换最终的SELECT。

#2


6  

Ok, so first of all, you'll need to split the concatenated values in #Temp1 and #Temp2. There are various methods for doing so, I'll use the numbers table one that is described in this awesome blog post from Aaron Bertrand. So, we'll need a numbers table, which can be done this way:

好的,首先,您需要在#Temp1和#Temp2中拆分连接值。这样做有多种方法,我将使用Aaron Bertrand这篇精彩博客文章中描述的数字表。所以,我们需要一个数字表,可以这样做:

;WITH n AS
(
    SELECT  x = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_objects AS s1
    CROSS JOIN sys.all_objects AS s2
)
SELECT Number = x
INTO #Numbers
FROM n
WHERE x BETWEEN 1 AND 8000;

Then, you'll need to actually do the splitting and then a group concatenation method for your result:

然后,您需要实际执行拆分,然后为您的结果执行组连接方法:

;WITH T1 AS
(
    SELECT *
    FROM #Table1 T
    OUTER APPLY (SELECT Item = SUBSTRING(T.States, Number,
                                         CHARINDEX(',',T.States + ',', Number) - 
                                         Number)
                 FROM #Numbers
                 WHERE Number <= CONVERT(INT, LEN(T.States))
                 AND SUBSTRING(',' + T.States, Number, LEN(',')) = ',') N
), T2 AS
(
    SELECT *
    FROM #Table2 T
    OUTER APPLY (SELECT Item = SUBSTRING(T.States, Number,
                                         CHARINDEX(', ',T.States + ', ', Number) - 
                                         Number)
                 FROM #Numbers
                 WHERE Number <= CONVERT(INT, LEN(T.States))
                 AND SUBSTRING(', ' + T.States, Number, LEN(', ')) = ', ') N
), T3 AS
(
    SELECT T1.Div, T1.Dept, T1.Item, COUNT(*) N
    FROM T1 
    LEFT JOIN T2
        ON T1.Div = T2.Div
        AND T1.Dept = T2.Dept
        AND T1.Item = T2.Item
    GROUP BY T1.Div, T1.Dept, T1.Item
)
SELECT  A.Div, 
        A.Dept, 
        States = STUFF((SELECT  ',' + CONVERT(VARCHAR(20), Item) + 
                                ' - (' + CAST(N AS VARCHAR(4)) + ')'
                        FROM T3 
                        WHERE Div = A.Div
                        AND Dept = A.Dept
                    FOR XML PATH(''), TYPE).value('.[1]','nvarchar(max)'),1,1,'')
FROM T3 A
ORDER BY Div, Dept, Item

The results are:

结果是:

╔══════╦═══════╦════════════════════════════╗
║ Div  ║ Dept  ║           States           ║
╠══════╬═══════╬════════════════════════════╣
║ Div1 ║ Dept1 ║ CA - (3),NV - (2),TX - (2) ║
║ Div1 ║ Dept1 ║ CA - (3),NV - (2),TX - (2) ║
║ Div1 ║ Dept1 ║ CA - (3),NV - (2),TX - (2) ║
║ Div2 ║ Dept2 ║ IN - (1),MI - (2),OH - (2) ║
║ Div2 ║ Dept2 ║ IN - (1),MI - (2),OH - (2) ║
║ Div2 ║ Dept2 ║ IN - (1),MI - (2),OH - (2) ║
║ Div3 ║ Dept2 ║ NJ - (1),NY - (1),PA - (1) ║
║ Div3 ║ Dept2 ║ NJ - (1),NY - (1),PA - (1) ║
║ Div3 ║ Dept2 ║ NJ - (1),NY - (1),PA - (1) ║
║ Div4 ║ Dept1 ║ NULL                       ║
╚══════╩═══════╩════════════════════════════╝

#3


2  

Ideally of course this data would be normalized, because this is messy. Given that you're stuck with this structure, I think your best bet is to use a driver table of states or to use one of the many split functions available to split out the delimited values, and then use that, to get each state on it's own line:

理想情况下,这些数据当然会被标准化,因为这很麻烦。鉴于您坚持使用此结构,我认为您最好的选择是使用状态驱动程序表或使用可用的许多拆分函数之一来拆分分隔值,然后使用它来获取每个状态这是自己的路线:

;WITH cte AS (SELECT DISTINCT b.Div,b.Dept,a.abbrs 
                FROM #States a
                RIGHT JOIN #Table1 b
                 ON ','+REPLACE(b.States,' ','')+',' LIKE '%,'+a.abbrs+',%'
              )
     ,cte2 AS (SELECT b.Div,b.Dept,a.abbrs 
                FROM #States a
                JOIN #Table2 b
                 ON ','+REPLACE(b.States,' ','')+',' LIKE '%,'+a.abbrs+',%'
              )
     ,cte3 AS (SELECT a.Div,a.Dept,a.abbrs,CAST(COUNT(b.abbrs)AS VARCHAR(25)) CT 
                FROM  cte a
                LEFT JOIN cte2 b
                 ON a.Dept = b.Dept
                 AND a.Div = b.Div
                 AND a.abbrs = b.abbrs
                GROUP BY a.div,a.dept,a.abbrs
              )
SELECT DISTINCT
      Div,Dept
      ,STUFF((SELECT DISTINCT ',' +   abbrs+'-('+CT+')'
                                    FROM cte3 b                         
                                    WHERE a.Div = b.Div
                                      AND a.Dept = b.Dept
                                    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)') 
                                    ,1,1,'')

FROM  cte3 a

Demo: SQL Fiddle

演示:SQL小提琴

Note:

  • cte1 - creates distinct list of div/dept/state from table1
  • cte1 - 从table1创建不同的div / dept / state列表

  • cte2 - creates list of all div/dept/states from table2
  • cte2 - 从table2创建所有div / dept / states的列表

  • cte3 - aggregates on div/dept/state to get the count
  • cte3 - 在div / dept / state上聚合以获取计数

Output is:

|  DIV |  DEPT |               STATES |
|------|-------|----------------------|
| Div1 | Dept1 | CA-(3),NV-(2),TX-(2) |
| Div2 | Dept2 | IN-(1),MI-(2),OH-(2) |
| Div3 | Dept2 | NJ-(1),NY-(1),PA-(1) |
| Div4 | Dept1 |               (null) |

Updated fiddle to include your NULL line and added output.

更新了小提琴以包含您的NULL行和添加的输出。

#1


6  

Your requirements are very nasty but as developers we have to work with what we've got. Here's a solution using Common Table Expression (CTE) extensively:

您的要求非常恶劣,但作为开发人员,我们必须使用我们所拥有的。这是一个广泛使用公用表表达式(CTE)的解决方案:

;WITH
    CTE1 AS
    (
        SELECT      Div, Dept,
                    REPLACE(States,' ','') + ',' AS States
        FROM        Table2
    ),
    CTE2 AS
    (
        SELECT      c1.Div, c1.Dept,
                    LEFT(c1.States,CHARINDEX(',', c1.States)-1)                 AS IndividualState,
                    RIGHT(c1.States,LEN(c1.States)-CHARINDEX(',', c1.States))   AS RemainingStates
        FROM        CTE1    c1
        UNION ALL
        SELECT      c2.Div, c2.Dept,
                    LEFT(c2.RemainingStates,CHARINDEX(',', c2.RemainingStates)-1),
                    RIGHT(c2.RemainingStates,LEN(c2.RemainingStates) - CHARINDEX(',', c2.RemainingStates))
        FROM        CTE2    c2
        WHERE       LEN(c2.RemainingStates) > 0
    ),
    CTE3 AS
    (
        SELECT      Div, Dept,
                    IndividualState,
                    COUNT(*)            AS StateCount
        FROM        CTE2
        GROUP BY    Div, Dept, IndividualState
    ),
    CTE4 AS
    (
        SELECT      t1.Div, t1.Dept,
                    (
                        SELECT  c3.IndividualState + ' - (' + CONVERT(varchar(10),c3.StateCount) + '), ' 
                        FROM    CTE3 c3
                        WHERE   c3.Div = t1.Div AND c3.Dept = t1.Dept
                        FOR XML PATH('')
                    )       AS States
        FROM        Table1  t1
    )

SELECT  Div, Dept,
        LEFT(States, LEN(States) - 1) AS States
FROM    CTE4

Explanation

  1. CTE1 cleans up the data in Table2: remove spaces, add a comma to the end
  2. CTE1清理表2中的数据:删除空格,在末尾添加逗号

  3. CTE2 does the normalization
  4. CTE2进行标准化

  5. CTE3 does the counting
  6. CTE3进行计数

  7. CTE4 does the final assembly, putting CA | 3 into CA - (3), ...
  8. CTE4进行最终组装,放入CA | 3进CA - (3),...

The last SELECT remove the trailing comma for neater output.

最后一个SELECT删除neater输出的尾随逗号。

To better understand each step, you can replace the final SELECT with SELECT * FROM CTE1, SELECT * FROM CTE2, etc.

为了更好地理解每个步骤,可以用SELECT * FROM CTE1,SELECT * FROM CTE2等替换最终的SELECT。

#2


6  

Ok, so first of all, you'll need to split the concatenated values in #Temp1 and #Temp2. There are various methods for doing so, I'll use the numbers table one that is described in this awesome blog post from Aaron Bertrand. So, we'll need a numbers table, which can be done this way:

好的,首先,您需要在#Temp1和#Temp2中拆分连接值。这样做有多种方法,我将使用Aaron Bertrand这篇精彩博客文章中描述的数字表。所以,我们需要一个数字表,可以这样做:

;WITH n AS
(
    SELECT  x = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_objects AS s1
    CROSS JOIN sys.all_objects AS s2
)
SELECT Number = x
INTO #Numbers
FROM n
WHERE x BETWEEN 1 AND 8000;

Then, you'll need to actually do the splitting and then a group concatenation method for your result:

然后,您需要实际执行拆分,然后为您的结果执行组连接方法:

;WITH T1 AS
(
    SELECT *
    FROM #Table1 T
    OUTER APPLY (SELECT Item = SUBSTRING(T.States, Number,
                                         CHARINDEX(',',T.States + ',', Number) - 
                                         Number)
                 FROM #Numbers
                 WHERE Number <= CONVERT(INT, LEN(T.States))
                 AND SUBSTRING(',' + T.States, Number, LEN(',')) = ',') N
), T2 AS
(
    SELECT *
    FROM #Table2 T
    OUTER APPLY (SELECT Item = SUBSTRING(T.States, Number,
                                         CHARINDEX(', ',T.States + ', ', Number) - 
                                         Number)
                 FROM #Numbers
                 WHERE Number <= CONVERT(INT, LEN(T.States))
                 AND SUBSTRING(', ' + T.States, Number, LEN(', ')) = ', ') N
), T3 AS
(
    SELECT T1.Div, T1.Dept, T1.Item, COUNT(*) N
    FROM T1 
    LEFT JOIN T2
        ON T1.Div = T2.Div
        AND T1.Dept = T2.Dept
        AND T1.Item = T2.Item
    GROUP BY T1.Div, T1.Dept, T1.Item
)
SELECT  A.Div, 
        A.Dept, 
        States = STUFF((SELECT  ',' + CONVERT(VARCHAR(20), Item) + 
                                ' - (' + CAST(N AS VARCHAR(4)) + ')'
                        FROM T3 
                        WHERE Div = A.Div
                        AND Dept = A.Dept
                    FOR XML PATH(''), TYPE).value('.[1]','nvarchar(max)'),1,1,'')
FROM T3 A
ORDER BY Div, Dept, Item

The results are:

结果是:

╔══════╦═══════╦════════════════════════════╗
║ Div  ║ Dept  ║           States           ║
╠══════╬═══════╬════════════════════════════╣
║ Div1 ║ Dept1 ║ CA - (3),NV - (2),TX - (2) ║
║ Div1 ║ Dept1 ║ CA - (3),NV - (2),TX - (2) ║
║ Div1 ║ Dept1 ║ CA - (3),NV - (2),TX - (2) ║
║ Div2 ║ Dept2 ║ IN - (1),MI - (2),OH - (2) ║
║ Div2 ║ Dept2 ║ IN - (1),MI - (2),OH - (2) ║
║ Div2 ║ Dept2 ║ IN - (1),MI - (2),OH - (2) ║
║ Div3 ║ Dept2 ║ NJ - (1),NY - (1),PA - (1) ║
║ Div3 ║ Dept2 ║ NJ - (1),NY - (1),PA - (1) ║
║ Div3 ║ Dept2 ║ NJ - (1),NY - (1),PA - (1) ║
║ Div4 ║ Dept1 ║ NULL                       ║
╚══════╩═══════╩════════════════════════════╝

#3


2  

Ideally of course this data would be normalized, because this is messy. Given that you're stuck with this structure, I think your best bet is to use a driver table of states or to use one of the many split functions available to split out the delimited values, and then use that, to get each state on it's own line:

理想情况下,这些数据当然会被标准化,因为这很麻烦。鉴于您坚持使用此结构,我认为您最好的选择是使用状态驱动程序表或使用可用的许多拆分函数之一来拆分分隔值,然后使用它来获取每个状态这是自己的路线:

;WITH cte AS (SELECT DISTINCT b.Div,b.Dept,a.abbrs 
                FROM #States a
                RIGHT JOIN #Table1 b
                 ON ','+REPLACE(b.States,' ','')+',' LIKE '%,'+a.abbrs+',%'
              )
     ,cte2 AS (SELECT b.Div,b.Dept,a.abbrs 
                FROM #States a
                JOIN #Table2 b
                 ON ','+REPLACE(b.States,' ','')+',' LIKE '%,'+a.abbrs+',%'
              )
     ,cte3 AS (SELECT a.Div,a.Dept,a.abbrs,CAST(COUNT(b.abbrs)AS VARCHAR(25)) CT 
                FROM  cte a
                LEFT JOIN cte2 b
                 ON a.Dept = b.Dept
                 AND a.Div = b.Div
                 AND a.abbrs = b.abbrs
                GROUP BY a.div,a.dept,a.abbrs
              )
SELECT DISTINCT
      Div,Dept
      ,STUFF((SELECT DISTINCT ',' +   abbrs+'-('+CT+')'
                                    FROM cte3 b                         
                                    WHERE a.Div = b.Div
                                      AND a.Dept = b.Dept
                                    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)') 
                                    ,1,1,'')

FROM  cte3 a

Demo: SQL Fiddle

演示:SQL小提琴

Note:

  • cte1 - creates distinct list of div/dept/state from table1
  • cte1 - 从table1创建不同的div / dept / state列表

  • cte2 - creates list of all div/dept/states from table2
  • cte2 - 从table2创建所有div / dept / states的列表

  • cte3 - aggregates on div/dept/state to get the count
  • cte3 - 在div / dept / state上聚合以获取计数

Output is:

|  DIV |  DEPT |               STATES |
|------|-------|----------------------|
| Div1 | Dept1 | CA-(3),NV-(2),TX-(2) |
| Div2 | Dept2 | IN-(1),MI-(2),OH-(2) |
| Div3 | Dept2 | NJ-(1),NY-(1),PA-(1) |
| Div4 | Dept1 |               (null) |

Updated fiddle to include your NULL line and added output.

更新了小提琴以包含您的NULL行和添加的输出。