Generating human-readable/usable, short but unique IDs

Date: 2022-11-24 20:48:46
  • Need to handle more than 1,000 but fewer than 10,000 new records per day
  • Cannot use GUIDs/UUIDs, auto-increment numbers, etc.
  • Ideally 5 or 6 characters long; letters are allowed, of course
  • Would like to reuse existing, well-known algorithms, if available

Is there anything out there?

4 solutions

#1

Base 62 is used by tinyurl and bit.ly for their shortened URLs. It's a well-understood method for creating "unique", human-readable IDs. Of course, you will have to store the created IDs and check for duplicates on creation to ensure uniqueness. (See the code at the bottom of this answer.)
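
A minimal sketch of the store-and-check step, assuming an in-memory `HashSet<string>` stands in for the table of issued IDs (in a real system this would be a column with a UNIQUE constraint). `UniqueIdIssuer` and its parameters are hypothetical names, not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: draws random base-62 IDs and retries until it finds one
// that has not been issued before. The HashSet plays the role of a database
// table with a UNIQUE constraint on the ID column.
public class UniqueIdIssuer
{
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private readonly HashSet<string> _issued = new HashSet<string>();
    private readonly Random _random;
    private readonly int _length;

    public UniqueIdIssuer(int length, int seed)
    {
        _length = length;
        _random = new Random(seed); // fixed seed only to make the sketch reproducible
    }

    public int Count => _issued.Count;

    public string Next(int maxAttempts = 10)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            var sb = new StringBuilder(_length);
            for (int i = 0; i < _length; i++)
                sb.Append(Alphabet[_random.Next(Alphabet.Length)]);

            string candidate = sb.ToString();
            // HashSet.Add returns false if the value is already present,
            // which stands in for a duplicate-key error from the database.
            if (_issued.Add(candidate))
                return candidate;
        }
        throw new InvalidOperationException("No unused ID found; consider a longer length.");
    }
}
```

With a real database it is safer to let the unique constraint do the check: insert the candidate and retry on a duplicate-key error, so two concurrent requests cannot both pass a separate "does it exist?" query.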

Base 62 uniqueness metrics

5 base-62 characters give 62^5 = 916,132,832 (~1 billion) possible IDs. At 10k IDs per day, that lasts 91k+ days (about 250 years).

6 base-62 characters give 62^6 = 56,800,235,584 (56+ billion) possible IDs. At 10k IDs per day, that lasts 5+ million days.
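
The arithmetic behind these figures, as a small sketch: the capacity is radix^length, and dividing it by the daily volume gives the number of days until every ID is used. `IdSpace` is a hypothetical helper; `BigInteger` keeps longer lengths from overflowing.

```csharp
using System;
using System.Numerics;

public static class IdSpace
{
    // Number of distinct IDs of the given length over an alphabet of `radix` symbols.
    public static BigInteger Capacity(int radix, int length) =>
        BigInteger.Pow(radix, length);

    // Whole days until the space is used up at a constant issue rate
    // (integer division, so the result rounds down).
    public static BigInteger DaysUntilExhausted(int radix, int length, int idsPerDay) =>
        Capacity(radix, length) / idsPerDay;
}
```

`IdSpace.DaysUntilExhausted(62, 5, 10000)` returns 91,613 and `IdSpace.DaysUntilExhausted(62, 6, 10000)` returns 5,680,023, matching the figures above.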

Base 36 uniqueness metrics

6 base-36 characters give 36^6 = 2,176,782,336 (2+ billion) possible IDs.

7 base-36 characters give 36^7 = 78,364,164,096 (78+ billion) possible IDs.
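
These counts measure the size of the ID space, not how soon random IDs collide. Because the IDs are drawn at random, duplicates become likely long before the space is exhausted (the birthday problem), which is why the duplicate check is needed. A sketch of the standard approximation 1 - e^(-n(n-1)/2N) for n IDs drawn from N possibilities, plus the chance k/N that one new draw hits any of k stored IDs; `CollisionOdds` is a hypothetical name.

```csharp
using System;

public static class CollisionOdds
{
    // Birthday-problem approximation: chance that at least two of `issued`
    // uniformly random IDs coincide when `space` distinct IDs are possible.
    public static double AnyDuplicate(double issued, double space) =>
        1.0 - Math.Exp(-issued * (issued - 1.0) / (2.0 * space));

    // Chance that a single new random draw equals one of `stored` existing IDs.
    public static double NextDrawCollides(double stored, double space) =>
        stored / space;
}
```

For 10,000 IDs drawn from the 5-character base-62 space (one day at the top of the stated range), `AnyDuplicate` gives about 5%; from the 6-character space, under 0.1%. After a year at that rate (3.65 million stored IDs), a new 5-character draw still has only about a 0.4% chance of hitting an existing ID, so a retry-on-duplicate loop stays cheap.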

Code:

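using System;
using System.Text;

public static class RandomIdGeneratorDemo
{
    public static void TestRandomIdGenerator()
    {
        // create five IDs of six base-62 characters
        for (int i = 0; i < 5; i++) Console.WriteLine(RandomIdGenerator.GetBase62(6));

        // create five IDs of eight base-36 characters
        for (int i = 0; i < 5; i++) Console.WriteLine(RandomIdGenerator.GetBase36(8));
    }
}

public static class RandomIdGenerator
{
    // Digits, then upper-case, then lower-case letters. The first 36 characters
    // (0-9, A-Z) double as the base-36 alphabet.
    private static readonly char[] _base62chars =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
        .ToCharArray();

    // System.Random is not thread-safe, so all access to the shared instance is locked.
    private static readonly Random _random = new Random();
    private static readonly object _randomLock = new object();

    public static string GetBase62(int length) => GetRandomString(length, 62);

    public static string GetBase36(int length) => GetRandomString(length, 36);

    // Appends `length` characters picked at random from the first `radix` alphabet entries.
    private static string GetRandomString(int length, int radix)
    {
        var sb = new StringBuilder(length);
        lock (_randomLock)
        {
            for (int i = 0; i < length; i++)
                sb.Append(_base62chars[_random.Next(radix)]);
        }
        return sb.ToString();
    }
}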

Sample output (the IDs are random, so every run prints different ones):

z5KyMg
wd4SUp
uSzQtH
UPrGAT
UIf2IS

QCF9GNM5
0UV3TFSS
3MG91VKP
7NTRF10T
AJK3AJU7
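
`System.Random` is fine for readable IDs, but it is not a cryptographic generator and its output can be predicted, so if the IDs must not be guessable (for example, in public URLs) a variant could draw from `System.Security.Cryptography.RandomNumberGenerator` instead. A sketch, assuming .NET Core 3.0 or later, where the static `RandomNumberGenerator.GetInt32(int)` is available; `SecureIdGenerator` is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Variant of RandomIdGenerator that uses the OS cryptographic RNG.
public static class SecureIdGenerator
{
    private const string Base62Chars =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static string GetBase62(int length) => Get(length, 62);

    // The first 36 characters (0-9, A-Z) form the base-36 alphabet.
    public static string GetBase36(int length) => Get(length, 36);

    private static string Get(int length, int radix)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append(Base62Chars[RandomNumberGenerator.GetInt32(radix)]);
        return sb.ToString();
    }
}
```

Because `GetInt32` is static, there is no shared `Random` instance to guard with a lock.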

#2

I recommend http://hashids.org/, which converts any number (e.g. a DB ID) into a string, using a salt.

The string can be decoded back into the number, so you don't need to store it in the database.
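
To illustrate why a reversible encoding needs no extra storage, here is a toy codec. It is NOT the Hashids algorithm: it simply writes the number in base 62 using an alphabet permuted by the salt. `ReversibleIdCodec` and its shuffle are hypothetical; for real use, take the Hashids library for your platform. Neither this toy nor Hashids is encryption, only obfuscation.

```csharp
using System;
using System.Text;

public class ReversibleIdCodec
{
    private const string BaseAlphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private readonly string _alphabet;

    public ReversibleIdCodec(string salt)
    {
        if (string.IsNullOrEmpty(salt))
            throw new ArgumentException("Salt must be non-empty.", nameof(salt));

        // Permute the alphabet with swaps driven by the salt characters, so
        // different salts give different (but repeatable) encodings.
        char[] chars = BaseAlphabet.ToCharArray();
        for (int i = chars.Length - 1, v = 0, p = 0; i > 0; i--, v++)
        {
            v %= salt.Length;
            int c = salt[v];
            p += c;
            int j = (c + v + p) % i;
            (chars[i], chars[j]) = (chars[j], chars[i]);
        }
        _alphabet = new string(chars);
    }

    // Positional base-62 encoding using the permuted alphabet.
    public string Encode(long number)
    {
        if (number < 0) throw new ArgumentOutOfRangeException(nameof(number));
        int radix = _alphabet.Length;
        var sb = new StringBuilder();
        do
        {
            sb.Insert(0, _alphabet[(int)(number % radix)]);
            number /= radix;
        } while (number > 0);
        return sb.ToString();
    }

    // Inverse of Encode: look up each character's position and accumulate.
    public long Decode(string id)
    {
        long value = 0;
        foreach (char ch in id)
        {
            int digit = _alphabet.IndexOf(ch);
            if (digit < 0) throw new FormatException($"Invalid character '{ch}'.");
            value = checked(value * _alphabet.Length + digit);
        }
        return value;
    }
}
```

With this toy, consecutive numbers produce similar-looking strings (only the trailing characters change), so it hides little; that is one reason to prefer an established library.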

There are libraries for JavaScript, Ruby, Python, Java, Scala, PHP, Perl, Swift, Clojure, Objective-C, C, C++11, Go, Erlang, Lua, Elixir, ColdFusion, Groovy, Kotlin, Nim, VBA, CoffeeScript, Node.js and .NET.

#3

I had similar requirements to the OP. I looked into the available libraries, but most of them are based on randomness, and I didn't want that. I could not find anything that was not based on randomness and still very short, so I ended up rolling my own, based on the technique Flickr uses but modified to require less coordination and to allow for longer periods offline.

In short:

  • A central server issues ID blocks of 32 IDs each
  • The local ID generator maintains a pool of ID blocks and generates an ID from it every time one is requested. When the pool runs low, it fetches more ID blocks from the server to refill it.
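
A single-process sketch of the block scheme above, assuming blocks are numbered consecutively so that block b covers IDs b*32 through b*32+31. `BlockServer`, `PooledIdGenerator`, the pool size of 4 and the refill threshold of 1 are all hypothetical; this is not the code of the suid libraries, whose actual ID layout may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the central server: hands out consecutive block
// numbers. A real server would persist the counter (e.g. in a database
// sequence) and be reached over the network.
public class BlockServer
{
    public const int IdsPerBlock = 32;
    private long _nextBlock;
    private readonly object _lock = new object();

    public long IssueBlock()
    {
        lock (_lock) { return _nextBlock++; }
    }
}

// Local generator: keeps a small pool of blocks and serves IDs from it,
// contacting the server only when the pool runs low. Not thread-safe.
public class PooledIdGenerator
{
    private readonly BlockServer _server;
    private readonly int _poolSize;      // number of blocks held after a refill
    private readonly int _lowWaterMark;  // refill once this many blocks or fewer remain
    private readonly Queue<long> _blocks = new Queue<long>();
    private int _offset;                 // next unused position in the current block

    public int Refills { get; private set; }

    public PooledIdGenerator(BlockServer server, int poolSize = 4, int lowWaterMark = 1)
    {
        if (lowWaterMark < 0 || lowWaterMark >= poolSize)
            throw new ArgumentException("Need 0 <= lowWaterMark < poolSize.");
        _server = server;
        _poolSize = poolSize;
        _lowWaterMark = lowWaterMark;
    }

    public long NextId()
    {
        if (_blocks.Count <= _lowWaterMark)
            Refill();

        long id = _blocks.Peek() * BlockServer.IdsPerBlock + _offset;
        _offset++;
        if (_offset == BlockServer.IdsPerBlock)
        {
            // Current block used up: move on to the next one in the pool.
            _blocks.Dequeue();
            _offset = 0;
        }
        return id;
    }

    private void Refill()
    {
        // Counted as one round trip; a real client would request all
        // missing blocks in a single call.
        while (_blocks.Count < _poolSize)
            _blocks.Enqueue(_server.IssueBlock());
        Refills++;
    }
}
```

Two generators sharing one `BlockServer` never hand out the same ID because they never receive the same block, which is where the "no collisions" property comes from. A larger pool means fewer round trips and longer offline periods, at the cost of gaps in the sequence when a client is discarded with unused blocks.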

Disadvantages:

  • Requires central coordination
  • IDs are somewhat predictable (less so than regular DB IDs, but they aren't random)

Advantages:

  • Stays within 53 bits, the exact-integer range of JavaScript numbers (and of PHP floats)
  • Very short IDs
  • Base-36 encoded, so easy for humans to read, write and pronounce
  • IDs can be generated locally for a very long time before the server must be contacted again (depending on pool settings)
  • Theoretically no chance of collisions
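
The 53-bit limit and base 36 fit together neatly: since 36^10 < 2^53 - 1 < 36^11, any ID up to JavaScript's `Number.MAX_SAFE_INTEGER` needs at most 11 base-36 characters. A sketch of the encoding plus a case-insensitive decoder; `Base36` is a hypothetical helper, and its lower-case digits match what JavaScript's `toString(36)` produces.

```csharp
using System;
using System.Text;

public static class Base36
{
    // Lower-case digits, as emitted by JavaScript's Number.prototype.toString(36).
    private const string Digits = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Largest integer a JavaScript number represents exactly: 2^53 - 1.
    public const long MaxSafeInteger = (1L << 53) - 1;

    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        var sb = new StringBuilder();
        do
        {
            sb.Insert(0, Digits[(int)(value % 36)]);
            value /= 36;
        } while (value > 0);
        return sb.ToString();
    }

    // Case-insensitive, so an ID read aloud or retyped as "5F" or "5f" decodes the same.
    public static long Decode(string text)
    {
        long value = 0;
        foreach (char ch in text.ToLowerInvariant())
        {
            int digit = Digits.IndexOf(ch);
            if (digit < 0) throw new FormatException($"'{ch}' is not a base-36 digit.");
            value = checked(value * 36 + digit);
        }
        return value;
    }
}
```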

I have published both a JavaScript library for the client side and a Java EE server implementation. Implementing servers in other languages should be easy as well.

Here are the projects:

suid - Distributed Service-Unique IDs that are short and sweet

suid-server-java - Suid-server implementation for the Java EE technology stack.

Both libraries are available under a liberal Creative Commons open source license. I hope this helps someone else looking for short unique IDs.

#4

I used base 36 when I solved this problem for an application I was developing a couple of years back. I needed to generate a human-readable, reasonably unique number (unique within the current calendar year, at least). I chose to use the time in milliseconds since midnight on January 1 of the current year (so timestamps can repeat from one year to the next) and convert it to a base-36 number. If the system ran into a fatal issue, it generated the base-36 number (7 characters), which was displayed to the end user via the web interface; the user could then relay the issue (and the number) to a tech support person, who could use it to find the point in the logs where the stack trace started. A code like 56af42g7 is far easier for a user to read and relay than a timestamp like 2016-01-21T15:34:29.933-08:00 or a random UUID like 5f0d3e0c-da96-11e5-b5d2-0a1d41d68578.
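
A sketch of that scheme, assuming January 1 is taken at midnight UTC (the answer doesn't say which time zone). Even a leap year has only 31,622,400,000 milliseconds, which is below 36^7 = 78,364,164,096, so the code never exceeds 7 characters. `ErrorCode` is a hypothetical name.

```csharp
using System;
using System.Text;

public static class ErrorCode
{
    private const string Digits = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Milliseconds since midnight UTC on January 1 of the timestamp's UTC year.
    public static long MillisSinceStartOfYear(DateTimeOffset timestamp)
    {
        DateTimeOffset utc = timestamp.ToUniversalTime();
        var startOfYear = new DateTimeOffset(utc.Year, 1, 1, 0, 0, 0, TimeSpan.Zero);
        return (utc - startOfYear).Ticks / TimeSpan.TicksPerMillisecond;
    }

    // Base-36 rendering of that millisecond count (at most 7 characters).
    public static string Create(DateTimeOffset timestamp)
    {
        long value = MillisSinceStartOfYear(timestamp);
        var sb = new StringBuilder();
        do
        {
            sb.Insert(0, Digits[(int)(value % 36)]);
            value /= 36;
        } while (value > 0);
        return sb.ToString();
    }
}
```

For the example timestamp 2016-01-21T15:34:29.933-08:00, `MillisSinceStartOfYear` returns 1,812,869,933. As the answer notes, codes repeat every year, and two failures in the same millisecond would share a code; that is acceptable for locating a log entry but not for a primary key.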
