如何生成具有指定长度的随机字母数字唯一字符

时间:2022-11-24 21:36:07

Problem is described below:

问题描述如下:

  1. Generate a unique alphanumeric characters.
  2. 生成唯一的字母数字字符。
  3. Length of characters should be 32.
  4. 字符长度应为32。
  5. Unique numbers may be seeded in the current time to help in the uniqueness of the generated numbers.
  6. 可以在当前时间播种唯一数字以帮助生成的数字的唯一性。
  7. Alphabet characters must come from this pool: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
  8. 字母字符必须来自此池:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

Sample Output: 445rpxlKYPkj1pg4q8nAy7Ab91zxZ8v1

样本输出:445rpxlKYPkj1pg4q8nAy7Ab91zxZ8v1

I can do this using Java, but will greatly appreciate if you could help me do this on MS SQL or T-SQL.

我可以使用Java来做到这一点,但是如果你能帮助我在MS SQL或T-SQL上做这件事,我将非常感激。

2 个解决方案

#1


3  

First, you need to split the string into separate rows. Then, do a SELECT with ORDER BY NEWID() for the random sort. Finally, use FOR XML PATH('') to concatenate them back:

首先,您需要将字符串拆分为单独的行。然后,使用ORDER BY NEWID()为随机排序执行SELECT。最后,使用FOR XML PATH('')将它们连接起来:

DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

;WITH E1(N) AS( -- 10 ^ 1 = 10 rows
    SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
CteTally(N) AS(
    SELECT TOP(LEN(@str)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
    FROM E4
)
SELECT  (
    SELECT TOP(32)
        SUBSTRING(@str, N, 1)
    FROM CteTally t
    ORDER BY NEWID()
    FOR XML PATH('')
) AS Result

ONLINE DEMO

在线演示

The above is more of a generic random string generator. You can modify it to suit your need. If the requirement will not change, you can simply use this:

以上是更通用的随机字符串生成器。您可以根据需要进行修改。如果要求不会改变,您可以简单地使用:

DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 52 Rows
    SELECT 1 FROM( VALUES
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1)
    )t(N)
),
CteTally(N) AS(
    SELECT ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
    FROM E1
)
SELECT  (
    SELECT TOP(32)
        SUBSTRING(@str, N, 1)
    FROM CteTally t
    ORDER BY NEWID()
    FOR XML PATH('')
) AS Result

#2


1  

I made this generic enough to handle any pool of characters and any output length. The core idea is to take a random sequence of bytes and use a base conversion algorithm to convert a long number into a new representation then translated to a string using your desired characters as its "digits".

我使这个通用足以处理任何字符池和任何输出长度。核心思想是采用随机字节序列并使用基本转换算法将长数字转换为新表示,然后使用所需字符作为“数字”转换为字符串。

For your specific scenario we need about 183 bits, or log2(52) x 32, to get to your desired length. Using newid() will generate the unique bit sequence but it will only do so 128 bits at a time and a series of values is simply concatenated until there are enough. Then having a value to operate on, the main loop is essentially the same long division we learned from elementary school. The intermediate calculations are kept in place in the varbinary array and the loop continues only until enough output characters are obtained. Each iteration determines another low order digit in the new base and this can terminate early since they won't change. The algorithm can't guarantee any global uniqueness if the output doesn't consume at least all of one newid(), so make sure log2(len(pool)) x output length is at least 128.

对于您的特定情况,我们需要大约183位,或log2(52)x 32,以达到您想要的长度。使用newid()将生成唯一的位序列,但它一次只能执行128位,并且只需连接一系列值,直到有足够的值。然后有一个值来操作,主循环基本上是我们从小学学到的同样长的分工。中间计算在varbinary数组中保持不变,并且循环仅继续,直到获得足够的输出字符。每次迭代确定新基数中的另一个低位数字,这可以提前终止,因为它们不会改变。如果输出不至少消耗所有一个newid(),则算法不能保证任何全局唯一性,因此请确保log2(len(池))x输出长度至少为128。

The target base, which is ultimately the length of the character pool, can't be more than 256. I hard-coded a limitation by setting the 128-byte maximum length of @e. For the question @e only needs to be 32 bytes long and it could be adjusted upward or downward as necessary or just defined as varbinary(max). If you need something more truly random you could find another source for the entropy bits like crypt_gen_random(). Since uniqueness appears to be the primary concern this answer fits that requirement. And by the way, repeating characters in the pool will naturally open the door for collisions.

目标基数,最终是字符池的长度,不能超过256.我通过设置@e的128字节最大长度来硬编码限制。对于问题,@ e只需要32个字节长,可以根据需要向上或向下调整,或者只定义为varbinary(max)。如果你需要更真实随机的东西,你可以找到另一个熵位的来源,如crypt_gen_random()。由于唯一性似乎是主要关注点,因此这个答案符合这一要求。顺便说一下,游泳池中重复的角色自然会打开碰撞的大门。

This is fast and generic and it can be easily wrapped up in a function. And a more robust implementation would handle these extra checks.

这是快速和通用的,它可以很容易地包含在一个函数中。更强大的实现将处理这些额外的检查。

declare @characterPool varchar(256) =
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
declare @outputLength int = 32;

declare @n int = 0; /* counter */
declare @numLoops int = ceiling(log(len(@characterPool)) / log(2) * @outputLength / 128)
declare @e varbinary(128) = 0x; /* entropy */

while @n < @numLoops
begin
    set @e = cast(newid() as binary(16)); set @n += 1;
end

declare @b int; /* byte */
declare @d int; /* dividend */
declare @out varchar(128) = '';

declare @outputBase int = len(@characterPool);
declare @entropyBytes int = len(@e);

declare @m int = 0;
while @m < @outputLength
begin
    set @b = 0; set @d = 0; set @n = 0;
    while @n < @entropyBytes /* big-endian */
    begin
        set @b = (@b - @d * @outputBase) * 256 + cast(substring(@e, @n + 1, 1) as int);
        set @d = @b / @outputBase;
        set @e = cast(stuff(@e, @n + 1, 1, cast(@d as binary(1))) as varbinary(128));
        set @n += 1;
    end
    set @out = substring(@characterPool, @b - @d * @outputBase + 1, 1) + @out;
    set @m += 1;
end

select @out as "UniqueString"

http://rextester.com/EYAK79470

http://rextester.com/EYAK79470

As one simple test of the algorithm you could just assign a known value in hexadecimal format and confirm that the output (using 012345678ABCDEF as the character pool) is the same hexadecimal value. In the same way this obviously works with base64, binary and octal.

作为算法的一个简单测试,您只需分配十六进制格式的已知值,并确认输出(使用012345678ABCDEF作为字符池)是相同的十六进制值。以同样的方式,这显然适用于base64,二进制和八进制。

Update: The main loop can be made faster by not having to iterate over more bytes than necessary. I don't know how crypt_gen_random() compares to newid() in terms of speed or CPU usage so this change might not even be a net positive so I'll just note it as an alternative to explore. You will want to keep the bytes from newid on the little end and attach the rest to the front.

更新:通过不必迭代超过必要的字节,可以使主循环更快。我不知道crypt_gen_random()在速度或CPU使用方面如何与newid()进行比较,所以这个变化甚至可能不是净积极的,所以我只是把它作为探索的替代方案。您需要将newid中的字节保留在小端,并将其余部分连接到前端。

declare @e varbinary(1024) = cast(newid() as binary(16));
declare @padBytes int = ceiling(log(len(@characterPool)) / log(2) * @outputLength) - 128;
if @padBytes > 0 set @e = crypt_gen_random(@padBytes) + @e; /* big end plus little end */

#1


3  

First, you need to split the string into separate rows. Then, do a SELECT with ORDER BY NEWID() for the random sort. Finally, use FOR XML PATH('') to concatenate them back:

首先,您需要将字符串拆分为单独的行。然后,使用ORDER BY NEWID()为随机排序执行SELECT。最后,使用FOR XML PATH('')将它们连接起来:

DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

;WITH E1(N) AS( -- 10 ^ 1 = 10 rows
    SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
CteTally(N) AS(
    SELECT TOP(LEN(@str)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
    FROM E4
)
SELECT  (
    SELECT TOP(32)
        SUBSTRING(@str, N, 1)
    FROM CteTally t
    ORDER BY NEWID()
    FOR XML PATH('')
) AS Result

ONLINE DEMO

在线演示

The above is more of a generic random string generator. You can modify it to suit your need. If the requirement will not change, you can simply use this:

以上是更通用的随机字符串生成器。您可以根据需要进行修改。如果要求不会改变,您可以简单地使用:

DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 52 Rows
    SELECT 1 FROM( VALUES
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
        (1),(1)
    )t(N)
),
CteTally(N) AS(
    SELECT ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
    FROM E1
)
SELECT  (
    SELECT TOP(32)
        SUBSTRING(@str, N, 1)
    FROM CteTally t
    ORDER BY NEWID()
    FOR XML PATH('')
) AS Result

#2


1  

I made this generic enough to handle any pool of characters and any output length. The core idea is to take a random sequence of bytes and use a base conversion algorithm to convert a long number into a new representation then translated to a string using your desired characters as its "digits".

我使这个通用足以处理任何字符池和任何输出长度。核心思想是采用随机字节序列并使用基本转换算法将长数字转换为新表示,然后使用所需字符作为“数字”转换为字符串。

For your specific scenario we need about 183 bits, or log2(52) x 32, to get to your desired length. Using newid() will generate the unique bit sequence but it will only do so 128 bits at a time and a series of values is simply concatenated until there are enough. Then having a value to operate on, the main loop is essentially the same long division we learned from elementary school. The intermediate calculations are kept in place in the varbinary array and the loop continues only until enough output characters are obtained. Each iteration determines another low order digit in the new base and this can terminate early since they won't change. The algorithm can't guarantee any global uniqueness if the output doesn't consume at least all of one newid(), so make sure log2(len(pool)) x output length is at least 128.

对于您的特定情况,我们需要大约183位,或log2(52)x 32,以达到您想要的长度。使用newid()将生成唯一的位序列,但它一次只能执行128位,并且只需连接一系列值,直到有足够的值。然后有一个值来操作,主循环基本上是我们从小学学到的同样长的分工。中间计算在varbinary数组中保持不变,并且循环仅继续,直到获得足够的输出字符。每次迭代确定新基数中的另一个低位数字,这可以提前终止,因为它们不会改变。如果输出不至少消耗所有一个newid(),则算法不能保证任何全局唯一性,因此请确保log2(len(池))x输出长度至少为128。

The target base, which is ultimately the length of the character pool, can't be more than 256. I hard-coded a limitation by setting the 128-byte maximum length of @e. For the question @e only needs to be 32 bytes long and it could be adjusted upward or downward as necessary or just defined as varbinary(max). If you need something more truly random you could find another source for the entropy bits like crypt_gen_random(). Since uniqueness appears to be the primary concern this answer fits that requirement. And by the way, repeating characters in the pool will naturally open the door for collisions.

目标基数,最终是字符池的长度,不能超过256.我通过设置@e的128字节最大长度来硬编码限制。对于问题,@ e只需要32个字节长,可以根据需要向上或向下调整,或者只定义为varbinary(max)。如果你需要更真实随机的东西,你可以找到另一个熵位的来源,如crypt_gen_random()。由于唯一性似乎是主要关注点,因此这个答案符合这一要求。顺便说一下,游泳池中重复的角色自然会打开碰撞的大门。

This is fast and generic and it can be easily wrapped up in a function. And a more robust implementation would handle these extra checks.

这是快速和通用的,它可以很容易地包含在一个函数中。更强大的实现将处理这些额外的检查。

declare @characterPool varchar(256) =
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
declare @outputLength int = 32;

declare @n int = 0; /* counter */
declare @numLoops int = ceiling(log(len(@characterPool)) / log(2) * @outputLength / 128)
declare @e varbinary(128) = 0x; /* entropy */

while @n < @numLoops
begin
    set @e = cast(newid() as binary(16)); set @n += 1;
end

declare @b int; /* byte */
declare @d int; /* dividend */
declare @out varchar(128) = '';

declare @outputBase int = len(@characterPool);
declare @entropyBytes int = len(@e);

declare @m int = 0;
while @m < @outputLength
begin
    set @b = 0; set @d = 0; set @n = 0;
    while @n < @entropyBytes /* big-endian */
    begin
        set @b = (@b - @d * @outputBase) * 256 + cast(substring(@e, @n + 1, 1) as int);
        set @d = @b / @outputBase;
        set @e = cast(stuff(@e, @n + 1, 1, cast(@d as binary(1))) as varbinary(128));
        set @n += 1;
    end
    set @out = substring(@characterPool, @b - @d * @outputBase + 1, 1) + @out;
    set @m += 1;
end

select @out as "UniqueString"

http://rextester.com/EYAK79470

http://rextester.com/EYAK79470

As one simple test of the algorithm you could just assign a known value in hexadecimal format and confirm that the output (using 012345678ABCDEF as the character pool) is the same hexadecimal value. In the same way this obviously works with base64, binary and octal.

作为算法的一个简单测试,您只需分配十六进制格式的已知值,并确认输出(使用012345678ABCDEF作为字符池)是相同的十六进制值。以同样的方式,这显然适用于base64,二进制和八进制。

Update: The main loop can be made faster by not having to iterate over more bytes than necessary. I don't know how crypt_gen_random() compares to newid() in terms of speed or CPU usage so this change might not even be a net positive so I'll just note it as an alternative to explore. You will want to keep the bytes from newid on the little end and attach the rest to the front.

更新:通过不必迭代超过必要的字节,可以使主循环更快。我不知道crypt_gen_random()在速度或CPU使用方面如何与newid()进行比较,所以这个变化甚至可能不是净积极的,所以我只是把它作为探索的替代方案。您需要将newid中的字节保留在小端,并将其余部分连接到前端。

declare @e varbinary(1024) = cast(newid() as binary(16));
declare @padBytes int = ceiling(log(len(@characterPool)) / log(2) * @outputLength) - 128;
if @padBytes > 0 set @e = crypt_gen_random(@padBytes) + @e; /* big end plus little end */