How to generate a SHA1 hash to use as an ID in node.js?

Date: 2022-11-25 13:59:34

I am using this line to generate a sha1 id for node.js:

crypto.createHash('sha1').digest('hex');

The problem is that it's returning the same id every time.

Is it possible to have it generate a random id each time so I can use it as a database document id?

3 solutions

#1


44  

Have a look here: How do I use node.js Crypto to create a HMAC-SHA1 hash? I'd create a hash of the current timestamp + a random number to ensure hash uniqueness:

var crypto = require('crypto');

// hash the current timestamp concatenated with a random number
var current_date = (new Date()).valueOf().toString();
var random = Math.random().toString();
crypto.createHash('sha1').update(current_date + random).digest('hex');

#2


504  

243,583,606,221,817,150,598,111,409x more entropy

I'd recommend using crypto.randomBytes. It's not sha1, but for id purposes, it's quicker, and just as "random".

var crypto = require('crypto');
var id = crypto.randomBytes(20).toString('hex');
//=> f26d60305dae929ef8640a75e70dd78ab809cfe9

The resulting string will be twice as long as the random bytes you generate; each byte encoded to hex is 2 characters. 20 bytes will be 40 characters of hex.

Using 20 bytes, we have 256^20 or 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976 unique output values. This is identical to SHA1's 160-bit (20-byte) possible outputs.
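
Both claims (the hex length and the size of the output space) can be checked directly in Node. This is a minimal sketch, assuming a Node version with BigInt support; the variable names are illustrative only:

```javascript
const crypto = require('crypto');

// 20 random bytes, hex-encoded: every byte becomes two hex characters.
const id = crypto.randomBytes(20).toString('hex');
console.log(id.length); // 40

// Number of distinct 20-byte values, computed exactly with BigInt.
const outputs = 256n ** 20n;
console.log(outputs === 2n ** 160n); // true
console.log(outputs.toString());
```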

Knowing this, it's not really meaningful for us to shasum our random bytes. It's like rolling a die twice but only accepting the second roll; no matter what, you have 6 possible outcomes each roll, so the first roll is sufficient.


Why is this better?

To understand why this is better, we first have to understand how hashing functions work. Hashing functions (including SHA1) will always generate the same output if the same input is given.

Say we want to generate IDs but our random input is generated by a coin toss. We have "heads" or "tails"

% echo -n "heads" | shasum
c25dda249cdece9d908cc33adcd16aa05e20290f  -

% echo -n "tails" | shasum
71ac9eed6a76a285ae035fe84a251d56ae9485a4  -

If "heads" comes up again, the SHA1 output will be the same as it was the first time

% echo -n "heads" | shasum
c25dda249cdece9d908cc33adcd16aa05e20290f  -

Ok, so a coin toss is not a great random ID generator because we only have 2 possible outputs.

If we use a standard 6-sided die, we have 6 possible inputs. Guess how many possible SHA1 outputs? 6!

input => (sha1) => output
1 => 356a192b7913b04c54574d18c28d46e6395428ab
2 => da4b9237bacccdf19c0760cab7aec4a8359010b0
3 => 77de68daecd823babbb58edb1c8e14d7106e83bb
4 => 1b6453892473a467d07372d45eb05abc2031647a
5 => ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
6 => c1dfd96eea8cc2b62785275bca38ac261256e278
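
The same experiment can be reproduced in Node. The sketch below is illustrative only: `sha1` is a hypothetical helper name, and the die is simulated with Math.random. However many times we roll, the set of distinct hashes never grows past six:

```javascript
const crypto = require('crypto');

// Hash a string with SHA1 and return the hex digest.
function sha1(input) {
  return crypto.createHash('sha1').update(input).digest('hex');
}

// "Roll" the die many times, hashing each face as a string.
const seen = new Set();
for (let i = 0; i < 10000; i++) {
  const roll = 1 + Math.floor(Math.random() * 6);
  seen.add(sha1(String(roll)));
}

console.log(seen.size); // never more than 6
```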

It's easy to delude ourselves into thinking that because the output of our function looks very random, it is very random.

We both agree that a coin toss or a 6-sided die would make a bad random id generator, because our possible SHA1 results (the value we use for the ID) are very few. But what if we use something that has a lot more outputs? Like a timestamp with milliseconds? Or JavaScript's Math.random? Or even a combination of those two?!

Let's compute just how many unique ids we would get ...


The uniqueness of a timestamp with milliseconds

When using (new Date()).valueOf().toString(), you're getting a 13-character number (e.g., 1375369309741). However, since this is a sequentially updating number (once per millisecond), consecutive outputs are almost always the same. Let's take a look

for (var i=0; i<10; i++) {
  console.log((new Date()).valueOf().toString());
}
console.log("OMG so not random");

// 1375369431838
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431840
// 1375369431840
// OMG so not random

To be fair, for comparison purposes, in a given minute (a generous operation execution time), you will have 60*1000 or 60000 uniques.
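
To put a number on how often the timestamp repeats, a rough sketch like the following counts distinct values over many calls. The sample count is an arbitrary choice, and Date.now() is used as a stand-in that returns the same millisecond value as (new Date()).valueOf():

```javascript
// Collect timestamps in a tight loop and count how many are distinct.
const samples = 100000;
const uniqueStamps = new Set();
for (let i = 0; i < samples; i++) {
  uniqueStamps.add(Date.now());
}
console.log(samples + ' calls, ' + uniqueStamps.size + ' distinct timestamps');
```

On a typical machine the loop finishes within a few milliseconds, so only a handful of distinct values should show up.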


The uniqueness of Math.random

Now, when using Math.random, because of the way JavaScript represents 64-bit floating point numbers, you'll get a number with length anywhere between 13 and 24 characters long. A longer result means more digits which means more entropy. First, we need to find out which is the most probable length.

The script below will determine which length is most probable. We do this by generating 1 million random numbers and incrementing a counter based on the .length of each number.

// get distribution
var counts = [], rand, len;
for (var i=0; i<1000000; i++) {
  rand = Math.random();
  len  = String(rand).length;
  if (counts[len] === undefined) counts[len] = 0;
  counts[len] += 1;
}

// calculate % frequency
var freq = counts.map(function(n) { return n/1000000 *100 });

By dividing each counter by 1 million, we get the probability of the length of number returned from Math.random.

len   frequency(%)
------------------
13    0.0004  
14    0.0066  
15    0.0654  
16    0.6768  
17    6.6703  
18    61.133  <- highest probability
19    28.089  <- second highest probability
20    3.0287  
21    0.2989  
22    0.0262
23    0.0040
24    0.0004

So, even though it's not entirely true, let's be generous and say you get a 19-character-long random output: 0.12345678901234567. The first two characters will always be 0 and ., so really we're only getting 17 random characters. This leaves us with 10^17 + 1 (for a possible 0; see notes below) or 100,000,000,000,000,001 uniques.


So how many random inputs can we generate?

Ok, we calculated the number of results for a millisecond timestamp and Math.random

      100,000,000,000,000,001 (Math.random)
*                      60,000 (timestamp)
-----------------------------
6,000,000,000,000,000,060,000

That's a single 6,000,000,000,000,000,060,000-sided die. Or, to make this number more humanly digestible, this is roughly the same number as

input                                            outputs
------------------------------------------------------------------------------
( 1×) 6,000,000,000,000,000,060,000-sided die    6,000,000,000,000,000,060,000
(28×) 6-sided die                                6,140,942,214,464,815,497,216
(72×) 2-sided coins                              4,722,366,482,869,645,213,696
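
These figures are too large for ordinary JavaScript numbers, but they can be checked exactly with BigInt. A small sketch, using the estimates from the text (the variable names are made up for illustration):

```javascript
// Size of the combined input space: Math.random estimate times timestamp estimate.
const mathRandomValues = 10n ** 17n + 1n;
const timestampValues = 60n * 1000n;
const combined = mathRandomValues * timestampValues;
console.log(combined.toString()); // 6000000000000000060000

// Compare with 28 dice and 72 coins.
const dice = 6n ** 28n;
const coins = 2n ** 72n;
console.log(coins < combined && combined < dice); // true
```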

Sounds pretty good, right? Well, let's find out ...

SHA1 produces a 20-byte value, with a possible 256^20 outcomes. So we're really not using SHA1 to its full potential. Well, how much are we using?

node> 6000000000000000060000 / Math.pow(256,20) * 100

A millisecond timestamp and Math.random use only about 4.11e-25 percent of SHA1's 160-bit potential!

generator                 sha1 potential used
-----------------------------------------------------------------------------
crypto.randomBytes(20)    100%
Date() + Math.random()    0.000000000000000000000000411%
6-sided die               0.000000000000000000000000000000000000000000000411%
A coin                    0.000000000000000000000000000000000000000000000137%

Holy cats, man! Look at all those zeroes. So how much better is crypto.randomBytes(20)? 243,583,606,221,817,150,598,111,409 times better.
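
For anyone who wants to reproduce the ratio, here is a hedged sketch using BigInt for the exact integer division and ordinary floating point for the percentage. The names `sha1Outputs` and `weakInputs` are illustrative:

```javascript
const sha1Outputs = 2n ** 160n;             // same as 256^20
const weakInputs = 6000000000000000060000n; // timestamp + Math.random estimate

// Integer ratio: how many times larger the SHA1 output space is.
const ratio = sha1Outputs / weakInputs;
console.log(ratio.toString()); // 243583606221817150598111409

// Share of the output space that is reachable, as a percentage.
const percentUsed = Number(weakInputs) / Number(sha1Outputs) * 100;
console.log(percentUsed); // about 4.1e-25
```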


Notes about the +1 and frequency of zeroes

If you're wondering about the +1, it's possible for Math.random to return a 0 which means there's 1 more possible unique result we have to account for.

Based on the discussion that happened below, I was curious about how often a 0 would come up. Here's a little script, random_zero.js, I made to get some data

#!/usr/bin/env node
var count = 0;
while (Math.random() !== 0) count++;
console.log(count);

Then, I ran it in 4 parallel processes (I have a 4-core processor), appending the output to a file

$ yes | xargs -n 1 -P 4 node random_zero.js >> zeroes.txt

So it turns out that a 0 is not that hard to get. After 100 values were recorded, the average was

1 in 3,164,854,823 randoms is a 0

Cool! More research would be required to know if that number is on-par with a uniform distribution of v8's Math.random implementation

#3


21  

Do it in the browser, too !

EDIT: this didn't really fit into the flow of my previous answer. I'm leaving it here as a second answer for people that might be looking to do this in the browser.

You can do this client side in modern browsers, if you'd like

// str byteToHex(uint8 byte)
//   converts a single byte to a hex string 
function byteToHex(byte) {
  return ('0' + byte.toString(16)).slice(-2);
}

// str generateId(int len);
//   len - must be an even number (default: 40)
function generateId(len) {
  var arr = new Uint8Array((len || 40) / 2);
  window.crypto.getRandomValues(arr);
  return [].map.call(arr, byteToHex).join("");
}

Ok, let's check it out!

generateId();
// "1e6ef8d5c851a3b5c5ad78f96dd086e4a77da800"

generateId(20);
// "d2180620d8f781178840"
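
If you want to try the same approach outside a browser, the sketch below adapts it to Node. It assumes a Node version that exposes `webcrypto` from the `crypto` module, swaps in `padStart` and `Array.from` for the original helpers, and keeps the same requirement that `len` be even:

```javascript
const { webcrypto } = require('crypto');

// Convert a single byte (0-255) to a two-character hex string.
function byteToHex(byte) {
  return byte.toString(16).padStart(2, '0');
}

// Generate a random hex id of `len` characters (len must be even; default 40).
function generateId(len = 40) {
  const bytes = new Uint8Array(len / 2);
  webcrypto.getRandomValues(bytes);
  return Array.from(bytes, byteToHex).join('');
}

console.log(generateId());
console.log(generateId(20));
```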

Browser requirements

Browser    Minimum Version
--------------------------
Chrome     11.0
Firefox    21.0
IE         11.0
Opera      15.0
Safari     5.1
