密码安全的随机数生成器如何工作?

I understand how standard random number generators work. But when working with crytpography, the random numbers really have to be random.

我理解标准随机数生成器是如何工作的。但是当使用密码术时，随机数必须是随机的。

I know there are instruments that read cosmic white noise to help generate secure hashes, but your standard PC doesn't have this.

我知道有一些仪器可以读取宇宙的白噪声来帮助生成安全的哈希，但是你的标准PC没有这个。

How does a cryptographically secure random number generator get its values with no repeatable patterns?

一个密码安全的随机数生成器如何得到它的值，而没有可重复的模式?

5 个解决方案

#1

A cryptographically secure number random generator, as you might use for generating encryption keys, works by gathering entropy - that is, unpredictable input - from a source which other people can't observe.

一个密码安全的随机数生成器，就像你可能用来生成加密密钥一样，它的工作原理是收集熵——也就是不可预知的输入——从一个其他人无法观察到的来源。

For instance, /dev/random(4) on Linux collects information from the variation in timing of hardware interrupts from sources such as hard disks returning data, keypresses and incoming network packets. This approach is secure provided that the kernel does not overestimate how much entropy it has collected. A few years back the estimations of entropy from the various different sources were all reduced, making them far more conservative. Here's an explanation of how Linux estimates entropy.

例如，Linux上的/dev/random(4)从硬件中断的时间中收集信息，如硬盘返回数据、按键和传入的网络数据包。这种方法是安全的，只要内核不高估它收集的熵。几年前，各种不同来源的熵的估计都减少了，使它们变得更加保守。下面是Linux对熵的估计。

None of the above is particularly high-throughput. /dev/random(4) probably is secure, but it maintains that security by refusing to give out data once it can't be sure that that data is securely random. If you want to, for example, generate a lot of cryptographic keys and nonces then you'll probably want to resort to hardware random number generators.

以上这些都不是特别高的吞吐量。/dev/random(4)可能是安全的，但它通过拒绝提供数据来维护安全性，因为它不能确保数据是安全的随机的。例如，如果您想要生成大量的加密密钥和非ces，那么您可能想要使用硬件随机数生成器。

Often hardware RNGs are designed about sampling from the difference between a pair of oscillators that are running at close to the same speed, but whose rates are varied slightly according to thermal noise. If I remember rightly, the random number generator that's used for the UK's premium bond lottery, ERNIE, works this way.

通常，硬件RNGs的设计是由两个振荡器之间的差别来设计的，它们的速度接近相同的速度，但是根据热噪声的不同，它们的速率会略有不同。如果我没记错的话，用于英国优质债券彩票的随机数生成器，ERNIE，就是这样运作的。

Alternate schemes include sampling the noise on a CCD (see lavaRND), radioactive decay (see hotbits) or atmospheric noise (see random.org, or just plug an AM radio tuned somewhere other than a station into your sound card). Or you can directly ask the computer's user to bang on their keyboard like a deranged chimpanzee for a minute, whatever floats your boat.

备选方案包括在CCD上采样(见lavaRND)，放射性衰变(见hotbits)或大气噪声(见random.org，或者只在你的声卡上安装一个调频收音机，而不是一个电台)。或者你可以直接让电脑的用户像一个疯狂的黑猩猩一样敲击键盘一分钟，无论你的船是什么。

Apologies for the lack of inline links, but apparently I have to burn more of my lifetime on this website before I'm allowed to post more.

很抱歉没有内联链接，但是很明显，我必须在这个网站上花费更多的时间，才允许我发布更多的链接。

EDIT: belay that. Because some people were kind enough to this bag of bones to click that upwards-pointy thingy on the left, I'm apparently allowed to put the rest of the links in now. Thank you kind people! ^_^

编辑:拴牢。因为有些人对这包骨头很友善，所以在左边，我很明显可以把其他的链接放进去。谢谢你善良的人!^ _ ^

EDIT: as andras pointed out, I only thought to talk about some of the most common entropy gathering schemes. Thomas Pornin's answer and Johannes Rössel's answer both do good jobs of explaining how one can go about mangling gathered entropy in order to hand bits of it out again.

编辑:正如andras指出的，我只是想谈谈一些最常见的熵收集方案。Thomas Pornin的回答和Johannes Rossel的回答都很好地解释了为什么人们可以通过控制聚集的熵来将部分的信息再次传递出去。

#2

For cryptographic purposes, what is needed is that the stream shall be "computationally indistinguishable from uniformly random bits". "Computationally" means that it needs not be truly random, only that it appears so to anybody without access to God's own computer.

对于加密的目的，我们需要的是，流应该是“在计算上与一致的随机位不可区分的”。“计算”意味着它不需要是真正随机的，只需要它出现在任何人都无法访问上帝自己的计算机。

In practice, this means that the system must first gather a sequence of n truly random bits. n shall be large enough to thwart exhaustive search, i.e. it shall be infeasible to try all 2^n combinations of n bits. This is achieved, with regards to today's technology, as long as n is greater than 90-or-so, but cryptographers just love powers of two, so it is customary to use n = 128.

实际上，这意味着系统必须首先收集一个n个真正随机位的序列。n应当足以挫败穷举搜索,即应当不可行尝试所有2 ^ n n比特的组合。这是实现的，关于今天的技术，只要n大于90或以上，但是密码学家喜欢2的幂，所以习惯上使用n = 128。

These n random bits are obtained by gathering "physical events" which should be unpredictable, as far as physics are concerned. Usually, timing is used: the CPU has a cycle counter which is updated several billions times per second, and some events occur with an inevitable amount of jitter (incoming network packets, mouse movements, key strokes...). The system encodes these events and then "compresses" them by applying a cryptographically secure hash function such as SHA-256 (output is then truncated to yield our n bits). What matters here is that the encoding of the physical events has enough entropy: roughly speaking, that the said events could have collectively assumed at least 2^n combinations. The hash function, by its definition, should make a good job at concentrating that entropy into a n-bit string.

这些n个随机位是通过收集“物理事件”获得的，这些“物理事件”应该是不可预测的，就物理学而言。通常，使用时间:CPU有一个周期计数器，它每秒更新几十亿次，并且一些事件会发生不可避免的抖动(传入网络数据包、鼠标移动、键击…)。系统对这些事件进行编码，然后通过应用一个加密安全的散列函数(如SHA-256)来“压缩”这些事件(输出被截断以产生我们的n比特)。重要的就是物理事件有足够的熵编码:粗略地说,该事件可以共同承担至少2 ^ n组合。根据定义，哈希函数应该能很好地将熵集中到n位字符串中。

Once we have n bits, we use a PRNG (Pseudo-Random Number Generator) to crank out as many bits as necessary. A PRNG is said to be cryptographically secure if, assuming that it operates over a wide enough unknown n-bit key, its output is computationally indistinguishable from uniformly random bits. In the 90's, a popular choice was RC4, which is very simple to implement, and quite fast. However, it turned out to have measurable biases, i.e. it was not as indistinguishable as was initially wished for. The eSTREAM Project consisted in gathering newer designs for PRNG (actually stream ciphers, because most stream ciphers consist in a PRNG, which output is XORed with the data to encrypt), documenting them, and promoting analysis by cryptographers. The eSTREAM Portfolio contains seven PRNG designs which were deemed secure enough (i.e. they resisted analysis and cryptographers tend to have a good understanding of why they resisted). Among them, four are "optimized for software". The good news is that while these new PRNG seem to be much more secure than RC4, they are also noticeably faster (we are talking about hundreds of megabytes per second, here). Three of them are "free for any use" and source code is provided.

一旦我们有了n位，我们就使用一个PRNG(伪随机数生成器)来尽可能多的输出。如果假设它在一个足够宽的未知的n位键上运行，那么它的输出在计算上无法与一致的随机位相区分。在90年代，流行的选择是RC4，它非常容易实现，而且非常快。然而，事实证明，它存在可测量的偏差，即它并没有最初希望的那样难以区分。eSTREAM项目包括为PRNG收集更新的设计(实际上是流密码，因为大多数流密码都包含在一个PRNG中，它的输出是用数据加密的)，记录它们，并促进密码器的分析。eSTREAM的投资组合包含了7个被认为足够安全的PRNG设计(也就是说，他们抵制了分析，而密码设计者往往很清楚他们为什么会抵制)。其中4个是“软件优化”。好消息是，虽然这些新的PRNG看起来比RC4更安全，但它们也明显更快(我们在这里讨论的是每秒几百兆字节)。其中三个是“免费使用”和源代码。

From a design point of view, PRNG reuse much of the elements of block ciphers. The same concepts of avalanche and diffusion of bits into a wide internal state are used. Alternatively, a decent PRNG can be built from a block cipher: simply use the n-bit sequence as key into a block cipher, and encrypt successive values of a counter (expressed as a m-bit sequence, if the block cipher uses m-bit blocks). This produces a pseudo-random stream of bits which is computationally indistinguishable from random, as long as the block cipher is secure, and the produced stream is no longer than m*2^(m/2) bits (for m = 128, this means about 300 billions of gigabytes, so that's big enough for most purposes). That kind of usage is known as counter mode (CTR).

从设计的角度来看，PRNG重用了许多块密码的元素。同样的概念，雪崩和扩散的位元到一个广泛的内部状态被使用。或者，一个合适的PRNG可以由一个块密码构建:简单地使用n位序列作为密钥，并加密一个计数器的连续值(如果块密码使用m位块，则表示为m位序列)。这生成一个伪随机比特流,计算与随机的,只要分组密码是安全的,和生产流不再比m * 2 ^(m / 2)比特(m = 128,这意味着大约300数十亿字节,这是足够大的在大多数情况下)。这种用法被称为反模式(CTR)。

Usually, a block cipher in CTR mode is not as fast as a dedicated stream cipher (the point of the stream cipher is that, by forfeiting the flexibility of a block cipher, better performance is expected). However, if you happen to have one of the most recent CPU from Intel with the AES-NI instructions (which are basically an AES implementation in hardware, integrated in the CPU), then AES with CTR mode will yield unbeatable speed (several gigabytes per second).

通常，CTR模式下的分组密码不像专用的流密码那样快(流密码的一点是，通过放弃分组密码的灵活性，可以更好地实现性能)。但是，如果您恰好有一个来自Intel的最新CPU, AES- ni指令(这基本上是硬件上的AES实现，集成在CPU上)，那么AES与CTR模式将产生无与伦比的速度(每秒数gb)。

#3

First of all, the point of a cryptographically secure PRNG is not to generate entirely unpredictable sequences. As you noted, the absence of something that generates large volumes of (more or less) true randomness¹ makes that impossible.

首先，加密安全PRNG的重点不是生成完全不可预测的序列。正如您所指出的，缺少能够产生大量(或多或少)真正的随机性的东西，这是不可能的。

So you resort to something which is only hard to predict. “Hard” meaning here that it takes unfeasibly long by which time whatever it was necessary for would be obsolete anyway. There are a number of mathematical algorithms that play a part in this—you can get a glimpse if you take some well-known CSPRNGs and look at how they work.

所以你求助于一些很难预测的事情。“硬”的意思是说，无论什么时候，无论如何，它都是过时的。有许多数学算法在这方面发挥了作用——如果你使用一些著名的CSPRNGs，看看它们是如何工作的，你就可以一瞥。

The most common variants to build such a PRNG are:

构建这样一种PRNG的最常见的变体是:

Using a stream cipher, which already outputs a (supposedly secure) pseudo-random bit stream.
使用流密码，它已经输出了(应该是安全的)伪随机位流。
Using a block cipher in counter mode
在计数器模式下使用分组密码。

Hash functions on a counter are also sometimes used. Wikipedia has more on this.

在计数器上的哈希函数有时也会被使用。*上有更多内容。

General requirements are just that it's unfeasible to determine the original initialization vector from a generator's bit stream and that the next bit cannot be easily predicted.

一般的要求是，从生成器的比特流中确定初始的初始化向量是不可行的，并且下一比特很难预测。

As for initialization, most CSPRNGs use various sources available on the system, ranging from truly random things like line noise, interrupts or other events in the system to other things like certain memory locations, &c. The initialization vector is preferrably really random and not dependent on a mathematical algorithm. This initialization was broken for some time in Debian's implementation of OpenSSL which led to severe security problems.

对于初始化，大多数CSPRNGs使用系统上可用的各种源，从真正的随机事件，如行噪声、中断或系统中的其他事件到其他类似的内存位置，以及c。初始化向量是非常随机的，不依赖于数学算法。这个初始化在Debian的OpenSSL实现中被打破了一段时间，这导致了严重的安全问题。

¹ Which has its problems too and one has to be careful in eliminating bias as things such as thermal noise has different characteristics depending on the temperature—you almost always have bias and need to eliminate it. And that's not a trivial task in itself.

1也有它的问题，在消除偏倚时要小心，因为热噪声有不同的特性，取决于温度——你几乎总是有偏见，需要消除它。这本身并不是一项微不足道的任务。

#4

In order for a random number generator to be considered cryptographically secure, in needs to be secure against attack by an adversary who knows the algorithm and a (large) number of previously generated bits. What this means is that someone with that information can't reconstruct any of the hidden internal state of the generator and give predictions of what the next bits produced will be with better than 50% accuracy.

为了使随机数生成器被认为是加密安全的，在需要安全的情况下，攻击者必须知道该算法和(大量)先前生成的比特的数量。这就意味着，有信息的人无法重建发电机的任何隐藏的内部状态，并预测出接下来的数据将会有超过50%的准确率。

Normal pseudo-random number generators are generally not cryptographically secure, as reconstructing the internal state from previously output bits is generaly trivial (often, the entire internal state is just the last N bits produced directly). Any random number generator without good statistical properties is also not cryptographically secure, as its output is at least party predictable even without knowing the internal state.

一般的伪随机数生成器通常不是加密安全的，因为从以前的输出位重构内部状态通常是微不足道的(通常，整个内部状态就是直接生成的最后一个N位)。任何没有良好统计特性的随机数生成器也不是加密安全的，因为它的输出至少是可预测的，即使不知道内部状态。

So, as to how they work, any good crypto system can be used as a cryptographically secure random number generator -- use the crypto system to encrypt the output of a 'normal' random number generator. Since an adversary can't reconstruct the plaintext output of the normal random number generator, he can't attack it directly. This is a somewhat circular definition an begs the question of how you key the crypto system to keep it secure, which is a whole other problem.

因此，对于它们的工作方式，任何好的加密系统都可以作为密码安全的随机数生成器——使用加密系统对“正常”随机数生成器的输出进行加密。由于对手不能重建正常随机数发生器的明文输出，他不能直接攻击它。这是一个有点循环的定义，回避了一个问题，你如何关键的加密系统来保证它的安全，这是另一个问题。

#5

Each generator will use its own seeding strategy, but here's a bit from the Windows API documentation on CryptGenRandom

每个生成器将使用它自己的播种策略，但这里有一些来自CryptGenRandom的Windows API文档。

With Microsoft CSPs, CryptGenRandom uses the same random number generator used by other security components. This allows numerous processes to contribute to a system-wide seed. CryptoAPI stores an intermediate random seed with every user. To form the seed for the random number generator, a calling application supplies bits it might have—for instance, mouse or keyboard timing input—that are then combined with both the stored seed and various system data and user data such as the process ID and thread ID, the system clock, the system time, the system counter, memory status, free disk clusters, the hashed user environment block. This result is used to seed the pseudorandom number generator (PRNG).

对于Microsoft csp, CryptGenRandom使用了与其他安全组件相同的随机数生成器。这允许许多进程对系统范围的种子做出贡献。CryptoAPI为每个用户存储一个中间随机种子。形成随机数生成器的种子,调用应用程序提供一些可能对实例,时机鼠标或键盘的输入,然后结合存储种子和各种系统数据和用户数据,比如进程ID和线程ID,系统时钟,系统时间,系统计数器,内存状态,空闲磁盘集群、散列的用户环境。此结果用于种子伪随机数生成器(PRNG)。

In Windows Vista with Service Pack 1 (SP1) and later, an implementation of the AES counter-mode based PRNG specified in NIST Special Publication 800-90 is used. In Windows Vista, Windows Storage Server 2003, and Windows XP, the PRNG specified in Federal Information Processing Standard (FIPS) 186-2 is used. If an application has access to a good random source, it can fill the pbBuffer buffer with some random data before calling CryptGenRandom. The CSP then uses this data to further randomize its internal seed. It is acceptable to omit the step of initializing the pbBuffer buffer before calling CryptGenRandom.

在使用Service Pack 1 (SP1)和之后的Windows Vista中，使用了NIST特殊出版物800-90中指定的AES反模式的实现。在Windows Vista、Windows存储服务器2003和Windows XP中，使用了联邦信息处理标准(FIPS) 186-2中的PRNG。如果一个应用程序能够访问一个好的随机源，它可以在调用CryptGenRandom之前，用一些随机数据填充pbBuffer缓冲区。CSP然后利用这些数据进一步随机化其内部种子。在调用CryptGenRandom之前，忽略初始化pbBuffer缓冲区的步骤是可以接受的。

#1