Is divmod() faster than using the % and // operators?

Date: 2021-01-24 07:20:06

I remember from assembly that integer division instructions yield both the quotient and the remainder. So, in Python, will the built-in divmod() function perform better than using the % and // operators (assuming, of course, that one needs both the quotient and the remainder)?


q, r = divmod(n, d)
q, r = (n // d, n % d)
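
A quick way to check this on your own interpreter is to time both forms with timeit (a minimal sketch; the absolute numbers depend on the machine, the interpreter, and the Python version):

import timeit

setup = 'n, d = 42, 7'
print(timeit.timeit('divmod(n, d)', setup))    # one built-in call returning (q, r)
print(timeit.timeit('n // d, n % d', setup))   # two separate operator evaluations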

2 Answers

#1


37  

To measure is to know (all timings on a MacBook Pro 2.8 GHz i7):


>>> import sys, timeit
>>> sys.version_info
sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
>>> timeit.timeit('divmod(n, d)', 'n, d = 42, 7')
0.1473848819732666
>>> timeit.timeit('n // d, n % d', 'n, d = 42, 7')
0.10324406623840332

The divmod() function is at a disadvantage here because you need to look up the global each time. Binding it to a local (all variables in a timeit time trial are local) improves performance a little:


>>> timeit.timeit('dm(n, d)', 'n, d = 42, 7; dm = divmod')
0.13460898399353027
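
The same trick carries over to ordinary code: binding divmod to a local name, for example via a default argument, skips the global/builtin lookup on every call. A minimal sketch with a hypothetical helper:

def quotients_and_remainders(pairs, _divmod=divmod):
    # _divmod is a local name (bound once at definition time), so each
    # call avoids the global lookup a bare divmod(n, d) would need.
    result = []
    for n, d in pairs:
        result.append(_divmod(n, d))
    return result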

but the operators still win because they don't have to preserve the current frame while a function call to divmod() is executed:


>>> import dis
>>> dis.dis(compile('divmod(n, d)', '', 'exec'))
  1           0 LOAD_NAME                0 (divmod)
              3 LOAD_NAME                1 (n)
              6 LOAD_NAME                2 (d)
              9 CALL_FUNCTION            2
             12 POP_TOP             
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE        
>>> dis.dis(compile('(n // d, n % d)', '', 'exec'))
  1           0 LOAD_NAME                0 (n)
              3 LOAD_NAME                1 (d)
              6 BINARY_FLOOR_DIVIDE 
              7 LOAD_NAME                0 (n)
             10 LOAD_NAME                1 (d)
             13 BINARY_MODULO       
             14 BUILD_TUPLE              2
             17 POP_TOP             
             18 LOAD_CONST               0 (None)
             21 RETURN_VALUE        

The // and % variant uses more opcodes, but the CALL_FUNCTION bytecode is a bear, performance-wise.

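The disassembly above is from Python 2.7; the opcode names differ on newer interpreters (for example, CALL_FUNCTION and the BINARY_* opcodes were reworked in Python 3.11), but the shape is the same: a name lookup plus a function call on one side, a few cheap arithmetic opcodes on the other. A sketch for checking it yourself on whatever version you run:

import dis

def with_divmod(n, d):
    return divmod(n, d)

def with_operators(n, d):
    return n // d, n % d

dis.dis(with_divmod)     # shows the global lookup and the call
dis.dis(with_operators)  # shows the floor-divide and modulo opcodes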


In PyPy, for small integers there isn't really much of a difference; the small speed advantage the opcodes have melts away under the sheer speed of C integer arithmetic:


>>>> import platform, sys, timeit
>>>> platform.python_implementation(), sys.version_info
('PyPy', (major=2, minor=7, micro=10, releaselevel='final', serial=42))
>>>> timeit.timeit('divmod(n, d)', 'n, d = 42, 7', number=10**9)
0.5659301280975342
>>>> timeit.timeit('n // d, n % d', 'n, d = 42, 7', number=10**9)
0.5471200942993164

(I had to crank the number of repetitions up to 1 billion to show how small the difference really is; PyPy is blazingly fast here.)


However, when the numbers get large, divmod() wins by a country mile:


>>>> timeit.timeit('divmod(n, d)', 'n, d = 2**74207281 - 1, 26', number=100)
17.620037078857422
>>>> timeit.timeit('n // d, n % d', 'n, d = 2**74207281 - 1, 26', number=100)
34.44323515892029

(I now had to tune down the number of repetitions by a factor of 10 compared to hobbs' numbers, just to get a result in a reasonable amount of time).


This is because PyPy can no longer unbox those integers as C integers; you can see the striking difference in timings between using sys.maxint and sys.maxint + 1:


>>>> timeit.timeit('divmod(n, d)', 'import sys; n, d = sys.maxint, 26', number=10**7)
0.008622884750366211
>>>> timeit.timeit('n // d, n % d', 'import sys; n, d = sys.maxint, 26', number=10**7)
0.007693052291870117
>>>> timeit.timeit('divmod(n, d)', 'import sys; n, d = sys.maxint + 1, 26', number=10**7)
0.8396248817443848
>>>> timeit.timeit('n // d, n % d', 'import sys; n, d = sys.maxint + 1, 26', number=10**7)
1.0117690563201904
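
(sys.maxint only exists on Python 2; on Python 3 the closest analogue of this boundary experiment uses sys.maxsize, roughly the machine-word limit. A sketch, with the caveat that the exact unboxing boundary is an implementation detail:)

import sys, timeit

for setup in ('import sys; n, d = sys.maxsize, 26',       # still word-sized
              'import sys; n, d = sys.maxsize + 1, 26'):  # forces bignum arithmetic
    print(timeit.timeit('divmod(n, d)', setup, number=10**6))
    print(timeit.timeit('n // d, n % d', setup, number=10**6))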

#2


14  

Martijn's answer is correct if you're using "small" native integers, where arithmetic operations are very fast compared to function calls. However, with bigints, it's a whole different story:


>>> import timeit
>>> timeit.timeit('divmod(n, d)', 'n, d = 2**74207281 - 1, 26', number=1000)
24.22666597366333
>>> timeit.timeit('n // d, n % d', 'n, d = 2**74207281 - 1, 26', number=1000)
49.517399072647095

When dividing a 22-million-digit number, divmod is almost exactly twice as fast as doing the division and modulus separately, as you might expect: a single divmod performs the big division once and hands back both results, whereas // and % each repeat it in full.


On my machine, the crossover occurs somewhere around 2^63, but don't take my word for it. As Martijn says, measure! When performance really matters, don't assume that what held true in one place will still be true in another.

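A rough way to find that crossover on your own setup (a sketch; the exact point depends on the interpreter and the hardware) is to time both forms at increasing bit lengths:

import timeit

for bits in (32, 63, 64, 128, 1024, 10**5):
    setup = 'n, d = 2**%d - 1, 26' % bits
    t_dm  = timeit.timeit('divmod(n, d)', setup, number=10**5)
    t_ops = timeit.timeit('n // d, n % d', setup, number=10**5)
    print(bits, t_dm, t_ops)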
