数数字-哪种方法最有效?

时间:2021-11-04 06:40:19

There are more than one solution for finding the-number-of-digits in a given number.

在给定的数字中查找数字的方法不止一种。

For example:

例如:

Method-1:

方法1:

int findn(int num)
{
    char snum[100];
    sprintf(snum, "%d", num);
    return strlen(snum);
}

Method-2:

方法2:

int findn(int num)
{
    int n = 0;
    while(num) {
        num /= 10;
        n++;
    }
    return n;
}

Method-3:

方法三:

int findn(int num)
{
    /* math.h included */
    return (int) log10(num) + 1;
}

The question is - what is the most efficient method? I know method-2 is O(n) but what about method-1 and method-3? How do I find the run-time complexity of library functions?

问题是,最有效的方法是什么?我知道method-2是O(n)那么method-1和method-3呢?如何发现库函数的运行时复杂性?

11 个解决方案

#1


19  

The following is even more efficient:

以下是更有效的方法:

int findn(int num)
{
   if ( num < 10 )
      return 1;
   if ( num < 100 )
      return 2;
   //continue until max int
}

You could optimize this even further by doing a binary search, but that would be overkill.

你可以通过二分查找来进一步优化它,但是这样做可能有点过头了。

#2


9  

The GCC/Clang __builtin_clz() or Microsoft Visual C _BitScanReverse() intrinsic functions compile to a single machine instruction on many machines. You can use this as the basis for an O(1) solution. Here's a 32-bit implementation:

GCC/Clang __builtin_clz()或Microsoft Visual C _BitScanReverse()的内部函数编译为许多机器上的单个机器指令。您可以将其作为O(1)解决方案的基础。这是一个32位的实现:

#include <limits.h>
#include <stdint.h>

/* Return the number of digits in the decimal representation of n. */
unsigned digits(uint32_t n) {
    static uint32_t powers[10] = {
        0, 10, 100, 1000, 10000, 100000, 1000000,
        10000000, 100000000, 1000000000,
    };
    static unsigned maxdigits[33] = {
        1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5,
        5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 
    };
    unsigned bits = sizeof(n) * CHAR_BIT - __builtin_clz(n);
    unsigned digits = maxdigits[bits];
    if (n < powers[digits - 1]) {
        -- digits;
    }
    return digits;
}

#3


7  

I think maybe you can write the first method as

我想你可以把第一种方法写成

int findn(int num)
{
    char snum[100];    
    return  sprintf(snum, "%d", num);
}

because sprintf will return the number of chars written and you can save the call to strlen.

因为sprintf将返回所写的字符数,您可以保存对strlen的调用。

As for the efficiency, I think it depends on the implementation of sprintf, you may need to find the source of sprintf and see if it is efficiency of doing this.

至于效率,我认为这取决于sprintf的实现,您可能需要找到sprintf的来源,看看这样做是否有效。

#4


7  

As it currently stands, the accepted and most highly approved answer is (still) incorrect for negative numbers. If the answerer were to take the time to test it and find out that it's broken for negative numbers, he likely would have wasted more time than the machine ever would by simply using snprintf, i.e.

就目前的情况而言,公认的、得到高度认可的答案(仍然)是负数。如果回答者花时间测试它并发现它被破坏为负数,他可能会比机器浪费更多的时间通过使用printsnf。

int count_digits(int arg) {
    return snprintf(NULL, 0, "%d", arg) - (arg < 0);
}

We're not in the 1980s anymore; stop coding as though we are. I'm a C-standard zealot and my favourite answer given here was Tao Feng's answer... but even that didn't go into why it's the most efficient answer so far; in this answer I intend to show that his answer can be improved further by considering the following:

我们不再是80年代了;停止编码,就像我们一样。我是c级*者,我最喜欢的答案是陶风的答案……但即便如此,这也不能解释为什么这是迄今为止最有效的答案;在这一答复中,我打算说明,他的答复可以进一步改进如下:

  • Programmer productivity is more important than code efficiency, because it will almost certainly cost more time to write and test new functions properly than it will for a few microseconds of runtime.
  • 程序员的生产力比代码效率更重要,因为与运行时几微秒相比,正确地编写和测试新函数几乎肯定要花费更多的时间。
  • Reusing the same standard library functions that other programs commonly use (probably) keeps those standard libraries in the CPU cache. A cache miss (for example, when your code needs to be copied from RAM into the CPU) can cost up to 50 CPU instructions, not to mention the other code may end up causing another cache miss to put snprintf back into the cache anyway.
  • 重用其他程序通常使用(可能)的标准库函数将这些标准库保存在CPU缓存中。缓存丢失(例如,当您的代码需要从RAM复制到CPU时)可能需要花费多达50个CPU指令,更不用说其他代码可能最终导致另一个缓存丢失,导致snprintf返回到缓存中。
  • Eliminating storage requirements might expose extra optimisations.
  • 消除存储需求可能会暴露额外的优化。

The following describes the micro-optimisation which hinders your productivity. Due to the lack of information you've provided in your answer, nobody who answers the question as it currently stands can provide any proof without making assumptions about:

以下描述了妨碍你工作效率的微观优化。由于你在回答中所提供的资料不足,任何回答目前问题的人都不能提供任何证据而不作任何假设:

  • When we optimise we need to find the most significant bottleneck in the full solution (the problem that your program's designed to solve). There are two possibilities here: A) You want to calculate the number of bytes to allocate in order to store a string containing these digits; B) You just want to count the number of digits or whatever for kicks. More on these later. For now it's important to realise you're probably talking about part of a solution, and that part might not be the most significant bottleneck.
  • 当我们进行优化时,我们需要找到完整解决方案中最重要的瓶颈(您的程序设计用来解决的问题)。这里有两种可能:A)您希望计算分配的字节数,以存储包含这些数字的字符串;B)你只需要数一数数字或其他的数字就可以了。稍后将详细介绍这些。现在,重要的是要意识到您可能正在讨论解决方案的一部分,而这一部分可能不是最重要的瓶颈。
  • The compiler you're using, the OS you're using and the machine you're using (including RAM speed, since some of us are introducing potential cache misses that are affected more by slow memory than by fast memory) might affect the most significant bottleneck. Some compilers are different to others, and will optimise some pieces of code better for some OSes, CPUs, etc than others.
  • 您正在使用的编译器、操作系统和正在使用的机器(包括RAM速度,因为我们中的一些人正在引入潜在的缓存丢失,这些丢失更多地受到慢内存的影响,而不是受到快内存的影响)可能会影响最重要的瓶颈。有些编译器与其他编译器不同,它们会对某些操作系统、cpu等进行更好的优化。

You can avoid micro-optimisation by measuring bottlenecks, i.e. by profiling ("benchmarking") each of these solutions on your system, assuming they even solve your problems properly. If a solution doesn't solve the problem, it's not a solution, so it shouldn't be considered... When done correctly this should eliminate the micro-optimisation. Some compilers even provide intelligent profile-guided optimisation which commonly shaves 20-30% by reorganising branches and objects for cache locality, and do so automatically.

您可以通过度量瓶颈来避免微优化,例如通过在您的系统上分析(“基准测试”)每个解决方案,假设它们甚至正确地解决了您的问题。如果解决方案不能解决问题,那它就不是解决方案,所以不应该考虑……如果做得正确,就应该消除微优化。一些编译器甚至提供了智能的概要文件引导优化,它通常通过重组分支和对象来实现缓存位置,并自动执行。

I've already covered option B, which I think most certainly answers your question, but there are cases where option A presents a very desirable optimisation, both in man hours and in machine hours.

我已经介绍了选项B,我认为它肯定回答了您的问题,但是在某些情况下,选项A在人的时间和机器时间都是非常可取的优化。

If you want to calculate the number of bytes to allocate in order to store a string containing these digits, you shouldn't use any runtime because a preprocessor macro can be used to calculate the maximum number of digits (or characters, including the sign), and any precious bytes of temporary storage you try to save will be well outnumbered by machine code bytes added in logic, which seems like a steep cost to me. There's also a benefit for the programmer to use a preprocessor macro; the same macro could be used for any integer type. See my answer to this question for a solution to this problem; after all, there's no point repeating myself...

如果你想计算分配的字节数为了存储一个字符串包含这些数字,你不应该使用任何运行时因为一个预处理器宏可以用来计算数字的最大数量(或字符,包括标志),以及任何宝贵的字节的临时存储你试图保存将会远远多于机器代码字节中添加逻辑,这似乎是一个陡峭的成本给我。程序员使用预处理器宏也有好处;可以对任何整数类型使用相同的宏。看我对这个问题的解答;毕竟,重复自己是没有意义的……

#5


3  

Try binary search. For the sake of clarity, let's assume signed 32-bit integers. First, check if x < 10000. Next, depending on the answer, if x < 100 or x < 1000000, and so on.

二分查找。为了清晰起见,我们假设有符号的32位整数。首先,检查x是否< 10000。接下来,根据答案,如果x < 100或x < 1000000,等等。

That's O(log n), where n is the number of digits.

这是O(log n) n是数字的个数。

#6


3  

These functions give drastically different results for non-positive numbers (the worst is method 3), so comparing their time complexities is of dubious value. I would use the one that gives the answer required in all cases; without context, we can't know what that is (it's probably not method 3).

这些函数对于非正数的结果截然不同(最坏的是方法3),所以比较它们的时间复杂度是值得怀疑的。我将使用在所有情况下给出所需答案的那个;没有上下文,我们无法知道它是什么(它可能不是方法3)。

For method 1, findn(0) == 1, and findn(-n) == digits in n + 1 (because of the negative sign).

对于方法1,findn(0) = 1, findn(-n) = n + 1中的位数(因为是负号)。

For method 2, findn(0) == 0, and findn(-n) == digits in n.

对于方法2,findn(0) = 0, findn(-n) = n中的位数。

For method 3, findn(0) == INT_MIN, and findn(-n) == INT_MIN as well.

对于方法3,findn(0) == INT_MIN和findn(-n) == INT_MIN。

#7


2  

I think sprintf() will use your method 2 to print the number (to determine the length of the string to print, and then to print each character of the string), so it will be inherently slower.

我认为sprintf()将使用您的方法2来打印数字(以确定要打印的字符串的长度,然后打印字符串的每个字符),因此它将天生的慢。

Number 3 would probably involve some polynomial approximation of ln() wich will involve more that 1 division, so I guess it will be slower as well (here's a fast ln() implementation, still involving float division... so WAY slower).

数字3可能会包含一些ln()的多项式逼近,它会包含更多的1个除法,所以我猜它也会更慢(这里有一个快速的ln()实现,仍然涉及浮点除法……所以慢)。

So my tentative guess is, method 2 is the way to go.

所以我的初步猜测是,方法2是可行的。

Please note that this is a quite liberal way to approach this problem. I guess testing a good old timed million-iteration with each function will tell you the result. But it would bee too bruteforce, wouldn't it?

请注意,这是处理这个问题的一种非常*的方式。我猜对每个函数进行百万次迭代测试会告诉你结果。但这太残忍了,不是吗?

Note that only method 2 will give you the real results, the others have flaws that have to be adjusted to be correct (see Aaron's answer). So simply pick method 2.

注意,只有方法2会给你真正的结果,其他的有缺陷,需要调整才能正确(请参阅Aaron的答案)。选择方法2。

#8


2  

One liner: for(digits = 0; num > 0; digits++) num /= 10;

一行:for(位数= 0;num > 0;数字+ +)num / = 10;

#9


1  

This is my solution... it'll count up to 100 digits or you know whatever you want it to count up to.

这是我的解决方案……它会数到100位数或者你知道你想要它数到多少。

max_digits = 100

int numdigits(int x) {
  for (int i = 1; i < max_digits; i++) {
    if (x < pow(10, i)) {
      return i;
    }
  }
  return 0;
}

#10


1  

printf function will return the number of digits successfully printed.

printf函数将返回成功打印的数字数。

int digits,n=898392;

digits=printf("%d",n);
printf("Number of digits:%d",digits);

Output:

输出:

898392

898392年

Number of digits:6

数字:6

#11


0  

Using log may be a good option...

使用日志可能是一个不错的选择……

  1. If the target machine has hardware support for it
  2. 如果目标机器有对它的硬件支持
  3. If you are sure that int can be cast into double and back without any loss of precision.
  4. 如果您确定int可以被转换成双精度和反精度。

Sample implementation...

示例实现……

int num_digits(int arg) {
    if (arg == 0) {
        return 1;
    }

    arg = abs(arg);

    return (int)log10(arg)+1;
}

#1


19  

The following is even more efficient:

以下是更有效的方法:

int findn(int num)
{
   if ( num < 10 )
      return 1;
   if ( num < 100 )
      return 2;
   //continue until max int
}

You could optimize this even further by doing a binary search, but that would be overkill.

你可以通过二分查找来进一步优化它,但是这样做可能有点过头了。

#2


9  

The GCC/Clang __builtin_clz() or Microsoft Visual C _BitScanReverse() intrinsic functions compile to a single machine instruction on many machines. You can use this as the basis for an O(1) solution. Here's a 32-bit implementation:

GCC/Clang __builtin_clz()或Microsoft Visual C _BitScanReverse()的内部函数编译为许多机器上的单个机器指令。您可以将其作为O(1)解决方案的基础。这是一个32位的实现:

#include <limits.h>
#include <stdint.h>

/* Return the number of digits in the decimal representation of n. */
unsigned digits(uint32_t n) {
    static uint32_t powers[10] = {
        0, 10, 100, 1000, 10000, 100000, 1000000,
        10000000, 100000000, 1000000000,
    };
    static unsigned maxdigits[33] = {
        1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5,
        5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 
    };
    unsigned bits = sizeof(n) * CHAR_BIT - __builtin_clz(n);
    unsigned digits = maxdigits[bits];
    if (n < powers[digits - 1]) {
        -- digits;
    }
    return digits;
}

#3


7  

I think maybe you can write the first method as

我想你可以把第一种方法写成

int findn(int num)
{
    char snum[100];    
    return  sprintf(snum, "%d", num);
}

because sprintf will return the number of chars written and you can save the call to strlen.

因为sprintf将返回所写的字符数,您可以保存对strlen的调用。

As for the efficiency, I think it depends on the implementation of sprintf, you may need to find the source of sprintf and see if it is efficiency of doing this.

至于效率,我认为这取决于sprintf的实现,您可能需要找到sprintf的来源,看看这样做是否有效。

#4


7  

As it currently stands, the accepted and most highly approved answer is (still) incorrect for negative numbers. If the answerer were to take the time to test it and find out that it's broken for negative numbers, he likely would have wasted more time than the machine ever would by simply using snprintf, i.e.

就目前的情况而言,公认的、得到高度认可的答案(仍然)是负数。如果回答者花时间测试它并发现它被破坏为负数,他可能会比机器浪费更多的时间通过使用printsnf。

int count_digits(int arg) {
    return snprintf(NULL, 0, "%d", arg) - (arg < 0);
}

We're not in the 1980s anymore; stop coding as though we are. I'm a C-standard zealot and my favourite answer given here was Tao Feng's answer... but even that didn't go into why it's the most efficient answer so far; in this answer I intend to show that his answer can be improved further by considering the following:

我们不再是80年代了;停止编码,就像我们一样。我是c级*者,我最喜欢的答案是陶风的答案……但即便如此,这也不能解释为什么这是迄今为止最有效的答案;在这一答复中,我打算说明,他的答复可以进一步改进如下:

  • Programmer productivity is more important than code efficiency, because it will almost certainly cost more time to write and test new functions properly than it will for a few microseconds of runtime.
  • 程序员的生产力比代码效率更重要,因为与运行时几微秒相比,正确地编写和测试新函数几乎肯定要花费更多的时间。
  • Reusing the same standard library functions that other programs commonly use (probably) keeps those standard libraries in the CPU cache. A cache miss (for example, when your code needs to be copied from RAM into the CPU) can cost up to 50 CPU instructions, not to mention the other code may end up causing another cache miss to put snprintf back into the cache anyway.
  • 重用其他程序通常使用(可能)的标准库函数将这些标准库保存在CPU缓存中。缓存丢失(例如,当您的代码需要从RAM复制到CPU时)可能需要花费多达50个CPU指令,更不用说其他代码可能最终导致另一个缓存丢失,导致snprintf返回到缓存中。
  • Eliminating storage requirements might expose extra optimisations.
  • 消除存储需求可能会暴露额外的优化。

The following describes the micro-optimisation which hinders your productivity. Due to the lack of information you've provided in your answer, nobody who answers the question as it currently stands can provide any proof without making assumptions about:

以下描述了妨碍你工作效率的微观优化。由于你在回答中所提供的资料不足,任何回答目前问题的人都不能提供任何证据而不作任何假设:

  • When we optimise we need to find the most significant bottleneck in the full solution (the problem that your program's designed to solve). There are two possibilities here: A) You want to calculate the number of bytes to allocate in order to store a string containing these digits; B) You just want to count the number of digits or whatever for kicks. More on these later. For now it's important to realise you're probably talking about part of a solution, and that part might not be the most significant bottleneck.
  • 当我们进行优化时,我们需要找到完整解决方案中最重要的瓶颈(您的程序设计用来解决的问题)。这里有两种可能:A)您希望计算分配的字节数,以存储包含这些数字的字符串;B)你只需要数一数数字或其他的数字就可以了。稍后将详细介绍这些。现在,重要的是要意识到您可能正在讨论解决方案的一部分,而这一部分可能不是最重要的瓶颈。
  • The compiler you're using, the OS you're using and the machine you're using (including RAM speed, since some of us are introducing potential cache misses that are affected more by slow memory than by fast memory) might affect the most significant bottleneck. Some compilers are different to others, and will optimise some pieces of code better for some OSes, CPUs, etc than others.
  • 您正在使用的编译器、操作系统和正在使用的机器(包括RAM速度,因为我们中的一些人正在引入潜在的缓存丢失,这些丢失更多地受到慢内存的影响,而不是受到快内存的影响)可能会影响最重要的瓶颈。有些编译器与其他编译器不同,它们会对某些操作系统、cpu等进行更好的优化。

You can avoid micro-optimisation by measuring bottlenecks, i.e. by profiling ("benchmarking") each of these solutions on your system, assuming they even solve your problems properly. If a solution doesn't solve the problem, it's not a solution, so it shouldn't be considered... When done correctly this should eliminate the micro-optimisation. Some compilers even provide intelligent profile-guided optimisation which commonly shaves 20-30% by reorganising branches and objects for cache locality, and do so automatically.

您可以通过度量瓶颈来避免微优化,例如通过在您的系统上分析(“基准测试”)每个解决方案,假设它们甚至正确地解决了您的问题。如果解决方案不能解决问题,那它就不是解决方案,所以不应该考虑……如果做得正确,就应该消除微优化。一些编译器甚至提供了智能的概要文件引导优化,它通常通过重组分支和对象来实现缓存位置,并自动执行。

I've already covered option B, which I think most certainly answers your question, but there are cases where option A presents a very desirable optimisation, both in man hours and in machine hours.

我已经介绍了选项B,我认为它肯定回答了您的问题,但是在某些情况下,选项A在人的时间和机器时间都是非常可取的优化。

If you want to calculate the number of bytes to allocate in order to store a string containing these digits, you shouldn't use any runtime because a preprocessor macro can be used to calculate the maximum number of digits (or characters, including the sign), and any precious bytes of temporary storage you try to save will be well outnumbered by machine code bytes added in logic, which seems like a steep cost to me. There's also a benefit for the programmer to use a preprocessor macro; the same macro could be used for any integer type. See my answer to this question for a solution to this problem; after all, there's no point repeating myself...

如果你想计算分配的字节数为了存储一个字符串包含这些数字,你不应该使用任何运行时因为一个预处理器宏可以用来计算数字的最大数量(或字符,包括标志),以及任何宝贵的字节的临时存储你试图保存将会远远多于机器代码字节中添加逻辑,这似乎是一个陡峭的成本给我。程序员使用预处理器宏也有好处;可以对任何整数类型使用相同的宏。看我对这个问题的解答;毕竟,重复自己是没有意义的……

#5


3  

Try binary search. For the sake of clarity, let's assume signed 32-bit integers. First, check if x < 10000. Next, depending on the answer, if x < 100 or x < 1000000, and so on.

二分查找。为了清晰起见,我们假设有符号的32位整数。首先,检查x是否< 10000。接下来,根据答案,如果x < 100或x < 1000000,等等。

That's O(log n), where n is the number of digits.

这是O(log n) n是数字的个数。

#6


3  

These functions give drastically different results for non-positive numbers (the worst is method 3), so comparing their time complexities is of dubious value. I would use the one that gives the answer required in all cases; without context, we can't know what that is (it's probably not method 3).

这些函数对于非正数的结果截然不同(最坏的是方法3),所以比较它们的时间复杂度是值得怀疑的。我将使用在所有情况下给出所需答案的那个;没有上下文,我们无法知道它是什么(它可能不是方法3)。

For method 1, findn(0) == 1, and findn(-n) == digits in n + 1 (because of the negative sign).

对于方法1,findn(0) = 1, findn(-n) = n + 1中的位数(因为是负号)。

For method 2, findn(0) == 0, and findn(-n) == digits in n.

对于方法2,findn(0) = 0, findn(-n) = n中的位数。

For method 3, findn(0) == INT_MIN, and findn(-n) == INT_MIN as well.

对于方法3,findn(0) == INT_MIN和findn(-n) == INT_MIN。

#7


2  

I think sprintf() will use your method 2 to print the number (to determine the length of the string to print, and then to print each character of the string), so it will be inherently slower.

我认为sprintf()将使用您的方法2来打印数字(以确定要打印的字符串的长度,然后打印字符串的每个字符),因此它将天生的慢。

Number 3 would probably involve some polynomial approximation of ln() wich will involve more that 1 division, so I guess it will be slower as well (here's a fast ln() implementation, still involving float division... so WAY slower).

数字3可能会包含一些ln()的多项式逼近,它会包含更多的1个除法,所以我猜它也会更慢(这里有一个快速的ln()实现,仍然涉及浮点除法……所以慢)。

So my tentative guess is, method 2 is the way to go.

所以我的初步猜测是,方法2是可行的。

Please note that this is a quite liberal way to approach this problem. I guess testing a good old timed million-iteration with each function will tell you the result. But it would bee too bruteforce, wouldn't it?

请注意,这是处理这个问题的一种非常*的方式。我猜对每个函数进行百万次迭代测试会告诉你结果。但这太残忍了,不是吗?

Note that only method 2 will give you the real results, the others have flaws that have to be adjusted to be correct (see Aaron's answer). So simply pick method 2.

注意,只有方法2会给你真正的结果,其他的有缺陷,需要调整才能正确(请参阅Aaron的答案)。选择方法2。

#8


2  

One liner: for(digits = 0; num > 0; digits++) num /= 10;

一行:for(位数= 0;num > 0;数字+ +)num / = 10;

#9


1  

This is my solution... it'll count up to 100 digits or you know whatever you want it to count up to.

这是我的解决方案……它会数到100位数或者你知道你想要它数到多少。

max_digits = 100

int numdigits(int x) {
  for (int i = 1; i < max_digits; i++) {
    if (x < pow(10, i)) {
      return i;
    }
  }
  return 0;
}

#10


1  

printf function will return the number of digits successfully printed.

printf函数将返回成功打印的数字数。

int digits,n=898392;

digits=printf("%d",n);
printf("Number of digits:%d",digits);

Output:

输出:

898392

898392年

Number of digits:6

数字:6

#11


0  

Using log may be a good option...

使用日志可能是一个不错的选择……

  1. If the target machine has hardware support for it
  2. 如果目标机器有对它的硬件支持
  3. If you are sure that int can be cast into double and back without any loss of precision.
  4. 如果您确定int可以被转换成双精度和反精度。

Sample implementation...

示例实现……

int num_digits(int arg) {
    if (arg == 0) {
        return 1;
    }

    arg = abs(arg);

    return (int)log10(arg)+1;
}