Converting a SQL Server varbinary(max) into a set of int primary keys

Time: 2022-06-16 00:55:29

Disclaimer: not my code, not my database design!

I have a column censusblocks (varbinary(max), null) in a table in an MS SQL Server 2008 database (call the table foo for simplicity).

This column is actually either NULL or a list of 1 to n ints. The ints are foreign keys to another table (call it censusblock, with a primary key id of type int), numbered from 1 to ~9600000.

I want a query that extracts the censusblocks list from foo and uses each row's extracted list of ints to look up the corresponding censusblock rows. A long, boring remainder of the query builds on that, but it has to start from the census blocks pulled out of foo's censusblocks column.

This conversion-and-look-up is currently handled on the middle tier, with a small .NET utility class to convert from List<int> to byte[] (and vice versa), which is then written into/read from the db as varbinary. I would like to do the same thing, purely in SQL.

The desired query would go something along these lines:

SELECT f.id, c.id
FROM foo f 
LEFT OUTER JOIN censusblock c ON 
c.id IN f.censusblocks --this is where the magic happens
where f.id in (1,2)

Which would result in:

f.id   |   c.id

 1         8437314
 1         8438819
 1         8439744
 1         8441795
 1         8442741
 1         8444984
 1         8445568
 1         8445641
 1         8447953
 2         5860657
 2         5866881
 2         5866858
 2         5862557
 2         5870475
 2         5868983
 2         5865207
 2         5863465
 2         5867301
 2         5864057
 2         5862256

NB: the 7-digit results are coincidental. The range is, as stated above, 1-7 digits.

The actual censusblocks column looks like

SELECT TOP 2 censusblocks FROM foo

which results in

censusblocks

0x80BE4280C42380C7C080CFC380D37580DC3880DE8080DEC980E7D1
0x596D3159858159856A59749D59938B598DB7597EF7597829598725597A79597370
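
Reading the sample against the expected output, each id appears to take 3 bytes, most significant byte first: the first row is 54 hex digits, i.e. 9 groups of 3 bytes, matching the 9 ids expected for f.id 1. The first group of each row works out as follows:

```latex
\begin{aligned}
\texttt{0x80BE42} &= 128 \cdot 256^2 + 190 \cdot 256 + 66 = 8388608 + 48640 + 66 = 8437314 \\
\texttt{0x596D31} &= 89 \cdot 256^2 + 109 \cdot 256 + 49 = 5832704 + 27904 + 49 = 5860657
\end{aligned}
```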

For further clarification, here are the guts of the .NET utility class's conversion methods:

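    // Decodes a byte array in which every int was stored as 3 big-endian bytes.
    // Trailing bytes that don't form a full 3-byte group are ignored.
    public static List<int> getIntegersFromBytes(byte[] data)
    {
        List<int> values = new List<int>();
        if (data != null && data.Length > 2)
        {
            long ids = data.Length / 3;
            byte[] oneId = new byte[4];
            for (long i = 0; i < ids; i++)
            {
                // Zero the high-order byte, then copy the 3 stored bytes after it (big-endian layout).
                oneId[0] = 0;
                Array.Copy(data, i * 3, oneId, 1, 3);
                // BitConverter uses the machine's byte order, so flip to little-endian where needed.
                if (BitConverter.IsLittleEndian)
                { Array.Reverse(oneId); }
                values.Add(BitConverter.ToInt32(oneId, 0));
            }
        }
        return values;
    }

    // Encodes each int as its 3 low-order bytes in big-endian order (the high-order byte is dropped).
    public static byte[] getBytesFromIntegers(List<int> values)
    {
        byte[] data = null;
        if (values != null && values.Count > 0)
        {
            data = new byte[values.Count * 3];
            int count = 0;
            byte[] idBytes = null;
            foreach (int id in values)
            {
                idBytes = BitConverter.GetBytes(id);
                // Put the bytes in big-endian order so idBytes[0] is the high-order byte...
                if (BitConverter.IsLittleEndian)
                { Array.Reverse(idBytes); }
                // ...then keep only the low 3 bytes.
                Array.Copy(idBytes, 1, data, count * 3, 3);
                count++;
            }
        }
        return data;
    }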
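
For comparison, here is a minimal sketch of the same 3-byte big-endian layout written with bit shifts instead of BitConverter and Array.Reverse, so it does not depend on the machine's byte order. CensusBlockCodec, Decode, Encode and MaxId are hypothetical names, and the behavior deliberately differs from the utility above in three assumed ways: a trailing partial group throws instead of being ignored, out-of-range ids throw instead of being silently truncated, and null or empty input encodes to an empty array rather than null.

```csharp
using System;
using System.Collections.Generic;

public static class CensusBlockCodec
{
    // Largest value representable in 3 bytes: 2^24 - 1 = 16,777,215.
    public const int MaxId = 0xFFFFFF;

    // Decodes consecutive 3-byte big-endian groups into ints.
    // Assumption: a length that is not a multiple of 3 is treated as corrupt data.
    public static List<int> Decode(byte[] data)
    {
        var values = new List<int>();
        if (data == null) return values;
        if (data.Length % 3 != 0)
            throw new ArgumentException("Length must be a multiple of 3.", nameof(data));
        for (int i = 0; i < data.Length; i += 3)
        {
            // Most significant byte first: b0 * 65536 + b1 * 256 + b2.
            values.Add((data[i] << 16) | (data[i + 1] << 8) | data[i + 2]);
        }
        return values;
    }

    // Encodes ints as 3-byte big-endian groups, rejecting values that would not round-trip.
    public static byte[] Encode(IReadOnlyList<int> values)
    {
        if (values == null || values.Count == 0) return Array.Empty<byte>();
        var data = new byte[values.Count * 3];
        for (int i = 0; i < values.Count; i++)
        {
            int id = values[i];
            if (id < 0 || id > MaxId)
                throw new ArgumentOutOfRangeException(nameof(values), id, "Id does not fit in 3 bytes.");
            data[3 * i] = (byte)(id >> 16);
            data[3 * i + 1] = (byte)(id >> 8);
            data[3 * i + 2] = (byte)id;
        }
        return data;
    }
}
```

Decoding the bytes of the first sample row with this sketch yields the nine ids listed for f.id 1, from 8437314 to 8447953.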

3 solutions

#1

An example of how this might be done. It is unlikely to scale brilliantly. If you have a numbers table in your database it should be used in place of nums_cte.

This works by converting the binary value to a hex string and then reading that string in 8-character chunks (4 bytes per id).

-- create test data
DECLARE @foo TABLE
(id int ,
 censusblocks varbinary(max)
)

DECLARE @censusblock TABLE
(id int)

INSERT @censusblock (id)
VALUES(1),(2),(1003),(5030),(5031),(2),(6)


INSERT @foo (id,censusblocks)
VALUES (1,0x0000000100000002000003EB),
(2,0x000013A6000013A7)


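--query
-- Length of the longest hex string in the table; a generous upper bound on the number of chunk offsets needed
DECLARE @biMaxLen bigint
SELECT @biMaxLen = MAX(LEN(CONVERT(varchar(max),censusblocks,2))) FROM @foo

;with nums_cte
AS
(
    -- Offsets 0, 8, 16, ... into the hex string (8 hex characters = one 4-byte int)
    SELECT TOP (@biMaxLen) ((ROW_NUMBER() OVER (ORDER BY a.type) - 1) * 8) AS n
    FROM master..spt_values as a
    CROSS JOIN master..spt_values as b
)
,binCTE
AS
(
    -- Style 2 renders the varbinary as hex digits without the 0x prefix;
    -- each 8-character chunk is converted back to binary(4) and then to int
    SELECT d.id, CAST(CONVERT(binary(4),SUBSTRING(s,n + 1,8),2) AS int) as cblock
    FROM (SELECT Id, CONVERT(varchar(max),censusblocks,2) AS s FROM @foo) AS d
    JOIN nums_cte
    ON n < LEN(d.s)
)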
SELECT *
FROM    binCTE as b
LEFT
JOIN    @censusblock c
ON      c.id = b.cblock
ORDER BY b.id, b.cblock
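
The query above reads 8 hex characters (4 bytes) per id, which matches its own test data but not the question's column, where each id occupies 3 bytes. As a C# sketch of the same hex-string technique (HexChunkDemo and ParseChunks are hypothetical names), the chunk width becomes a parameter: 8 characters reproduces the test data above, and 6 characters decodes the question's sample. The T-SQL equivalent would presumably step the offsets by 6 and use CONVERT(binary(3), SUBSTRING(s, n + 1, 6), 2), but that variant is an untested assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class HexChunkDemo
{
    // Mirrors the SQL approach: render the bytes as a hex string, then parse fixed-width chunks.
    // charsPerId is 2 * (bytes per id): 8 for 4-byte ints, 6 for the question's 3-byte format.
    // Any trailing characters shorter than a full chunk are ignored.
    public static List<int> ParseChunks(byte[] data, int charsPerId)
    {
        string hex = BitConverter.ToString(data).Replace("-", ""); // e.g. "80BE42..."
        var values = new List<int>();
        for (int n = 0; n + charsPerId <= hex.Length; n += charsPerId)
        {
            // Hex parsing is big-endian by construction: the leftmost digits are the most significant.
            values.Add(int.Parse(hex.Substring(n, charsPerId), NumberStyles.HexNumber));
        }
        return values;
    }
}
```

For example, a width of 8 turns 0x0000000100000002000003EB into 1, 2, 1003, while a width of 6 turns the first sample row from the question into the nine ids expected for f.id 1.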

You could also consider adding your existing .NET conversion methods to the database as an assembly and calling them through CLR functions.

#2

This is off-topic, but I couldn't resist writing these conversions so they use IEnumerables instead of arrays and Lists. This might not be faster per se, but is more general and would allow you to perform the conversion without loading the whole array at once, which may be helpful if the arrays you are dealing with are large.

Here it is, for what it's worth:

using System;
using System.Collections.Generic;
using System.Linq;

// Reads the bytes four at a time (big-endian) and yields one int per 4-byte group.
static IEnumerable<int> BytesToInts(IEnumerable<byte> bytes) {
    var buff = new byte[4];
    using (var en = bytes.GetEnumerator()) {
        while (en.MoveNext()) {
            buff[0] = en.Current;
            if (en.MoveNext()) {
                buff[1] = en.Current;
                if (en.MoveNext()) {
                    buff[2] = en.Current;
                    if (en.MoveNext()) {
                        buff[3] = en.Current;
                        // BitConverter uses the machine's byte order; convert from big-endian if needed.
                        if (BitConverter.IsLittleEndian)
                            Array.Reverse(buff);
                        yield return BitConverter.ToInt32(buff, 0);
                        continue;
                    }
                }
            }
            // Only reached when the input ends in the middle of a 4-byte group.
            throw new ArgumentException("Wrong number of bytes.", "bytes");
        }
    }
}

// Writes each int as 4 big-endian bytes.
static IEnumerable<byte> IntsToBytes(IEnumerable<int> ints) {
    if (BitConverter.IsLittleEndian)
        return ints.SelectMany(
            b => {
                var buff = BitConverter.GetBytes(b);
                Array.Reverse(buff);
                return buff;
            }
        );
    return ints.SelectMany(BitConverter.GetBytes);
}

Your code seems to encode each int in 3 bytes instead of 4, which would cause problems with values that don't fit in 3 bytes (anything above 16,777,215, as well as all negatives). Is that intentional?
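
To make the concern concrete, here is a single-value sketch that repeats the steps of the question's getBytesFromIntegers and getIntegersFromBytes (ThreeByteRangeDemo and RoundTrip are hypothetical names). Since only the three low-order bytes are stored, ids up to 16,777,215 (2^24 - 1) survive, which covers the stated range of about 9,600,000; anything larger, or negative, comes back changed.

```csharp
using System;

public static class ThreeByteRangeDemo
{
    // Same steps as the question's utility, applied to one value:
    // keep the 3 low-order bytes (big-endian), then rebuild an int with a zero high-order byte.
    public static int RoundTrip(int id)
    {
        byte[] idBytes = BitConverter.GetBytes(id);
        if (BitConverter.IsLittleEndian) Array.Reverse(idBytes);

        byte[] stored = new byte[3];
        Array.Copy(idBytes, 1, stored, 0, 3);   // the high-order byte is discarded here

        byte[] oneId = new byte[4];             // oneId[0] stays 0
        Array.Copy(stored, 0, oneId, 1, 3);
        if (BitConverter.IsLittleEndian) Array.Reverse(oneId);
        return BitConverter.ToInt32(oneId, 0);
    }
}
```

For example, RoundTrip(9600000) returns 9600000, but RoundTrip(16777216) returns 0 and RoundTrip(-1) returns 16777215.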

BTW, you should be able to adapt this (or your) code for execution in SQL Server CLR. This is not exactly "in SQL", but is "in DBMS".

#3

You can use Convert(int, censusBlock) to convert the varchar value to an int value. Then you can join on that column. Or have I misunderstood the question?
