如何清理这个Sql数据?

时间:2022-08-29 22:19:22

this is a follow on question to a previously asked question.

这是对先前提出的问题的后续问题。

I have the following data in a single db table.

我在一个db表中有以下数据。

Name                LeftId    RightId
------------------------------------------
Cat                     1  
Cat                     1
Dog                     2
Dog                     2
Dog                               3
Dog                               3
Gerbil                  4         5 
Cat                
Bird
Cow                     6
Cow
Cow                               7
Dog                     8         9

Note that some rows have NO data for for LeftId and RightId.

请注意,对于LeftId和RightId,某些行没有数据。

Now, what i wish to do is find two different queries

现在,我想做的是找到两个不同的查询

  1. All rows which have at least 1 Id in one of the two columns AND row with NO data in both of those two Id columns.
  2. 在两个列中的一列中具有至少1个Id的所有行和在这两个Id列中都没有数据的行。

eg.

Cat     1
Cow     6 (or 7 .. i'm not worried)
  1. All the rows where LeftId and RightId are NULL grouped by the same name. If another row (with the same name) has a value in the LeftId or RightId, this this name will not be returned.
  2. LeftId和RightId为NULL的所有行按相同名称分组。如果另一行(具有相同名称)在LeftId或RightId中具有值,则不会返回此名称。

eg.

Bird

hmm..

EDIT: Reworded the first question properly.

编辑:正确地重新提出第一个问题。

5 个解决方案

#1


For the first query, you want rows that answer to both of the following criteria:

对于第一个查询,您需要符合以下两个条件的行:

  1. The Name in the row appears in the table in the same row in which LeftId and RightId are both NULL.
  2. 行中的名称出现在LeftId和RightId都为NULL的同一行的表中。

  3. The Name in the row appears in the table in same row where at at least one of LeftId and RightId is not NULL.
  4. 行中的名称出现在同一行的表中,其中LeftId和RightId中的至少一个不为NULL。

Well, #1 is done by:

那么,#1是通过以下方式完成的:

SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)

And #2 is done by:

而#2是通过以下方式完成的:

SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)

You could intersect them to see which Names appear in both lists:

您可以将它们相交以查看两个列表中显示的名称:

SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)
INTERSECT
SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)

Which returns:

Name
----
Cat
Cow

But you want the LeftId and RightId, and you don't care which, so I guess we'll aggregate on the Name:

但是你想要LeftId和RightId,你不关心哪个,所以我想我们将在名称上聚合:

SELECT Name, MIN(LeftId) AS LeftId, MIN(RightId) AS RightId 
    FROM Tbl WHERE Tbl.Name IN (
      SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)
      INTERSECT
      SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)
    )
GROUP BY Name

Which returns

Name  LeftId  RightId
----  ------  -------
Cat   1
Cow   6       7

lc already suggested using COALESE to turn those two IDs to a single one. So how about this:

lc已经建议使用COALESE将这两个ID转换为单个ID。那怎么样:

SELECT Name, COALESCE(MIN(LeftId),MIN(RightId)) AS Id 
    FROM Tbl WHERE Tbl.Name IN (
      SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)
      INTERSECT
      SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)
    )
GROUP BY Name

Which returns:

Name  Id
----  --
Cat   1
Cow   6

For the second query, you want rows that obey the following criteria:

对于第二个查询,您需要符合以下条件的行:

  1. The Name appears only in rows that have no LeftId and RightId
  2. 名称仅出现在没有LeftId和RightId的行中

I can't think of a way to do that sort of self-referencing query in SQL in a single set of criteria, so I'll break it down to two criteria. Both must be obeyed to be acceptable:

我想不出用一种标准在SQL中进行那种自引用查询的方法,所以我将其分解为两个标准。两者都必须服从才能被接受:

  1. The Name appears in rows that have no LeftId and RightId
  2. 名称显示在没有LeftId和RightId的行中

  3. The Name does not appear in rows that have either LeftId or RightId
  4. 名称不会出现在具有LeftId或RightId的行中

Doing #1 is simply:

做#1只是:

SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)

But #2 is tricky. Of course doing the opposite of #2 ("all the Name that appear in rows that have either LeftId or RightId) is just like before:

但#2很棘手。当然与#2相反(“出现在具有LeftId或RightId的行中的所有名称)就像之前一样:

SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)

Now comes the tricky bit - we want all the rows that obey #1 but don't obey the opposite of #2. This is where EXCEPT is useful:

现在是棘手的一点 - 我们希望所有符合#1的行但不遵守#2的相反行。这是EXCEPT非常有用的地方:

SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)
EXCEPT
SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)

Which returns:

Name
----
Bird

Which is what we wanted!

这就是我们想要的!

#2


Query 1)

SELECT * 
FROM Table 
WHERE (LeftID IS NULL AND RightID IS NOT NULL) 
    OR (LeftID IS NOT NULL AND RightID IS NULL)

Query 2)

SELECT * 
FROM Table 
WHERE LeftID IS NULL AND RightID IS NULL

#3


Query 1:

SELECT [Name], [LeftID], [RightID]

FROM [TestTable]

WHERE -- "All rows which have at least 1 Id in one of the two columns"
      ([LeftID] IS NOT NULL OR [RightID] IS NOT NULL)
      OR
      -- "Rows with NO data in both of those two Id columns"
      ([LeftID] IS NULL AND [RightID] IS NULL)

Query 2:

SELECT [Name], [LeftID], [RightID]

FROM [TestTable]

WHERE -- "All the rows where LeftId and RightId are NULL
      -- grouped by the same name"
      ([LeftID] IS NULL AND [RightID] IS NULL)
      AND
      -- "If another row (with the same name) has a value
      -- in the LeftId or RightId, this name will not be returned"    
      ([Name] NOT IN (SELECT DISTINCT [Name] FROM [TestTable]
                      WHERE [LeftID] IS NOT NULL
                            OR
                            [RightID] IS NOT NULL))

GROUP BY [Name], [LeftID], [RightID]

Results:

Name                                               LeftID      RightID
-------------------------------------------------- ----------- -----------
Cat                                                1           NULL
Cat                                                1           NULL
Dog                                                2           NULL
Dog                                                2           NULL
Dog                                                NULL        3
Dog                                                NULL        3
Gerbil                                             4           5
Cat                                                NULL        NULL
Bird                                               NULL        NULL
Cow                                                6           NULL
Cow                                                NULL        NULL
Cow                                                NULL        7
Dog                                                8           9

(13 row(s) affected)

Name                                               LeftID      RightID
-------------------------------------------------- ----------- -----------
Bird                                               NULL        NULL

(1 row(s) affected)

#4


If I understand you correctly then this is fairly trivial:

如果我理解正确,那么这是相当微不足道的:

1:

SELECT * 
FROM your_table 
WHERE (LeftId IS NOT NULL 
AND RightId IS NULL)
OR
(LeftId IS NULL 
AND RightId IS NOT NULL)

2:

SELECT * 
FROM your_table
WHERE 
    NOT EXISTS 
              (SELECT * FROM your_table y1
               WHERE (y1.LeftId IS NOT NULL OR y1.RightId IS NOT NULL)
               AND y1.name = your_table.name)

If this isn't right then perhaps you could clarify.

如果这不对,那么也许你可以澄清一下。

Edit: updated

#5


I hope I understand you correctly here.

我希望我在这里正确理解你。

Query 1:

SELECT t1.Name, COALESCE(MIN(t1.LeftID), MIN(t1.RightID))
FROM Table t1
WHERE EXISTS(SELECT t2.Name
             FROM Table t2
             WHERE t2.Name = t1.Name 
             AND   t2.LeftID IS NULL AND t2.RightID IS NULL)
AND   COALESCE(MIN(t1.LeftID), MIN(t1.RightID)) IS NOT NULL
GROUP BY t1.Name

Query 2:

SELECT t1.Name
FROM Table t1
WHERE NOT EXISTS(SELECT t2.Name
                 FROM Table t2
                 WHERE t2.Name = t1.Name
                 AND   (t2.LeftID IS NOT NULL OR t2.RightID IS NOT NULL))

#1


For the first query, you want rows that answer to both of the following criteria:

对于第一个查询,您需要符合以下两个条件的行:

  1. The Name in the row appears in the table in the same row in which LeftId and RightId are both NULL.
  2. 行中的名称出现在LeftId和RightId都为NULL的同一行的表中。

  3. The Name in the row appears in the table in same row where at at least one of LeftId and RightId is not NULL.
  4. 行中的名称出现在同一行的表中,其中LeftId和RightId中的至少一个不为NULL。

Well, #1 is done by:

那么,#1是通过以下方式完成的:

SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)

And #2 is done by:

而#2是通过以下方式完成的:

SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)

You could intersect them to see which Names appear in both lists:

您可以将它们相交以查看两个列表中显示的名称:

SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)
INTERSECT
SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)

Which returns:

Name
----
Cat
Cow

But you want the LeftId and RightId, and you don't care which, so I guess we'll aggregate on the Name:

但是你想要LeftId和RightId,你不关心哪个,所以我想我们将在名称上聚合:

SELECT Name, MIN(LeftId) AS LeftId, MIN(RightId) AS RightId 
    FROM Tbl WHERE Tbl.Name IN (
      SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)
      INTERSECT
      SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)
    )
GROUP BY Name

Which returns

Name  LeftId  RightId
----  ------  -------
Cat   1
Cow   6       7

lc already suggested using COALESE to turn those two IDs to a single one. So how about this:

lc已经建议使用COALESE将这两个ID转换为单个ID。那怎么样:

SELECT Name, COALESCE(MIN(LeftId),MIN(RightId)) AS Id 
    FROM Tbl WHERE Tbl.Name IN (
      SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)
      INTERSECT
      SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)
    )
GROUP BY Name

Which returns:

Name  Id
----  --
Cat   1
Cow   6

For the second query, you want rows that obey the following criteria:

对于第二个查询,您需要符合以下条件的行:

  1. The Name appears only in rows that have no LeftId and RightId
  2. 名称仅出现在没有LeftId和RightId的行中

I can't think of a way to do that sort of self-referencing query in SQL in a single set of criteria, so I'll break it down to two criteria. Both must be obeyed to be acceptable:

我想不出用一种标准在SQL中进行那种自引用查询的方法,所以我将其分解为两个标准。两者都必须服从才能被接受:

  1. The Name appears in rows that have no LeftId and RightId
  2. 名称显示在没有LeftId和RightId的行中

  3. The Name does not appear in rows that have either LeftId or RightId
  4. 名称不会出现在具有LeftId或RightId的行中

Doing #1 is simply:

做#1只是:

SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)

But #2 is tricky. Of course doing the opposite of #2 ("all the Name that appear in rows that have either LeftId or RightId) is just like before:

但#2很棘手。当然与#2相反(“出现在具有LeftId或RightId的行中的所有名称)就像之前一样:

SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)

Now comes the tricky bit - we want all the rows that obey #1 but don't obey the opposite of #2. This is where EXCEPT is useful:

现在是棘手的一点 - 我们希望所有符合#1的行但不遵守#2的相反行。这是EXCEPT非常有用的地方:

SELECT Name FROM Tbl WHERE (LeftId IS NULL) AND (RightId IS NULL)
EXCEPT
SELECT Name FROM Tbl WHERE (LeftId IS NOT NULL) OR (RightId IS NOT NULL)

Which returns:

Name
----
Bird

Which is what we wanted!

这就是我们想要的!

#2


Query 1)

SELECT * 
FROM Table 
WHERE (LeftID IS NULL AND RightID IS NOT NULL) 
    OR (LeftID IS NOT NULL AND RightID IS NULL)

Query 2)

SELECT * 
FROM Table 
WHERE LeftID IS NULL AND RightID IS NULL

#3


Query 1:

SELECT [Name], [LeftID], [RightID]

FROM [TestTable]

WHERE -- "All rows which have at least 1 Id in one of the two columns"
      ([LeftID] IS NOT NULL OR [RightID] IS NOT NULL)
      OR
      -- "Rows with NO data in both of those two Id columns"
      ([LeftID] IS NULL AND [RightID] IS NULL)

Query 2:

SELECT [Name], [LeftID], [RightID]

FROM [TestTable]

WHERE -- "All the rows where LeftId and RightId are NULL
      -- grouped by the same name"
      ([LeftID] IS NULL AND [RightID] IS NULL)
      AND
      -- "If another row (with the same name) has a value
      -- in the LeftId or RightId, this name will not be returned"    
      ([Name] NOT IN (SELECT DISTINCT [Name] FROM [TestTable]
                      WHERE [LeftID] IS NOT NULL
                            OR
                            [RightID] IS NOT NULL))

GROUP BY [Name], [LeftID], [RightID]

Results:

Name                                               LeftID      RightID
-------------------------------------------------- ----------- -----------
Cat                                                1           NULL
Cat                                                1           NULL
Dog                                                2           NULL
Dog                                                2           NULL
Dog                                                NULL        3
Dog                                                NULL        3
Gerbil                                             4           5
Cat                                                NULL        NULL
Bird                                               NULL        NULL
Cow                                                6           NULL
Cow                                                NULL        NULL
Cow                                                NULL        7
Dog                                                8           9

(13 row(s) affected)

Name                                               LeftID      RightID
-------------------------------------------------- ----------- -----------
Bird                                               NULL        NULL

(1 row(s) affected)

#4


If I understand you correctly then this is fairly trivial:

如果我理解正确,那么这是相当微不足道的:

1:

SELECT * 
FROM your_table 
WHERE (LeftId IS NOT NULL 
AND RightId IS NULL)
OR
(LeftId IS NULL 
AND RightId IS NOT NULL)

2:

SELECT * 
FROM your_table
WHERE 
    NOT EXISTS 
              (SELECT * FROM your_table y1
               WHERE (y1.LeftId IS NOT NULL OR y1.RightId IS NOT NULL)
               AND y1.name = your_table.name)

If this isn't right then perhaps you could clarify.

如果这不对,那么也许你可以澄清一下。

Edit: updated

#5


I hope I understand you correctly here.

我希望我在这里正确理解你。

Query 1:

SELECT t1.Name, COALESCE(MIN(t1.LeftID), MIN(t1.RightID))
FROM Table t1
WHERE EXISTS(SELECT t2.Name
             FROM Table t2
             WHERE t2.Name = t1.Name 
             AND   t2.LeftID IS NULL AND t2.RightID IS NULL)
AND   COALESCE(MIN(t1.LeftID), MIN(t1.RightID)) IS NOT NULL
GROUP BY t1.Name

Query 2:

SELECT t1.Name
FROM Table t1
WHERE NOT EXISTS(SELECT t2.Name
                 FROM Table t2
                 WHERE t2.Name = t1.Name
                 AND   (t2.LeftID IS NOT NULL OR t2.RightID IS NOT NULL))