计算两个不同数字的倍数之间的差值

This is an algorithmic problem. To keep it simple, say I have two doubles, A and B. I want to construct a function that will give me the difference until the next multiple of A or the next multiple of B, if that makes sense.

这是一个算法问题。为了简单起见，假设我有两个双打，A和B，我想构造一个函数，它能让我知道，直到A的下一个倍数或者是B的下一个倍数，如果这有意义的话。

For instance, say A is 3 and B is 5.

例如，A是3 B是5。

Consider the multiples: (3,6,9,12,15) and (5,10,15).

考虑倍数:(3、6、9、12、15)和(5、10、15)。

I'd want the function to output: (3, 2, 1, 3, 1, 2, 3), since it takes 3 units to get to 3, then 2 more to get to 5, then 1 to 6, then 3 to 9, etc...

我希望函数输出:(3 2 1 3 1 2 3 2 3 3)因为需要3个单位才能得到3，然后2个单位才能得到5，然后1到6，然后3到9，等等……

I hope this makes sense. Ideally it's a Python-esque generator (although I'm writing this in Arduino ~ C++). I need it to be fast - really fast.

我希望这说得通。理想情况下，它是python式的生成器(尽管我是用Arduino ~ c++编写的)。我需要它的速度——非常快。

Any help would really be appreciated. My pseudocode is below, but it's not that great.

任何帮助都是值得感激的。我的伪代码在下面，但不是很好。

a = 3
b = 5

current = 0
distToA = a
distToB = b
for i in xrange(100):
  if distToA > distToB: #B comes first
    print "Adding {0}".format(distToB)
    current += distToB
    distToA -= distToBb
    distToB = b
  elif distToB > distToA: #A comes first
    print "Adding {0}".format(distToA)
    current += distToA
    distToB -= distToA
    distToA = a
  else: #Equal
    print "Adding {0}".format(distToA)
    current += distToA #Arbitrarily, could be distToB
    distToA = a
    distToB = b

EDIT: How would this look with multiple values? Not just a and b, but also c, d, e, etc.. I'd imagine I'd just do some more if statements, but the cost is higher (more operations per branch).

编辑:如果有多个值会是什么样子?不仅仅是a和b，还有c, d, e，等等。我想我只是多做一些if语句，但是成本更高(每个分支更多的操作)。

3 个解决方案

#1

Let start with some general points. It is pretty much always better to start out with intuitive code that will be understood by you and your coworkers. Then measure the performance and find bottlenecks. If you try to hyper-optimize from the outset, you will: -

让我们从一些一般的点开始。最好从你和你的同事都能理解的直觉代码开始。然后测量性能并找到瓶颈。如果你试着从一开始就超优化，你会:-

make code that is complicated, error prone and less understandable.
使代码变得复杂、容易出错、难以理解。
most likely optimise code that would barely register a blip on the overall performance, while overlooking major bottlenecks. Unless you know the processor, compiler, programming language and environmental nuances back to front there is a good chance you will make the performance worse if you try to guess the optimizations.
最可能优化的代码几乎不会影响整体性能，同时忽略了主要的瓶颈。除非您知道处理器、编译器、编程语言和环境的细微差别，否则如果您尝试猜测优化，很可能会使性能变得更差。

It is best to measure, find bottlenecks, then improve the performance for those bottlenecks. If you suspect an algorithm / implementation is slow, then profile it. If you are wondering which algorithm / implementation will perform best, then race them. Test with varying data sets because what performs well for one set of input (3,5) might not for another (3, 500000000).

最好是度量、发现瓶颈，然后改进这些瓶颈的性能。如果您怀疑某个算法/实现速度太慢，那么请对其进行剖析。如果您想知道哪种算法/实现的性能最好，那么对它们进行竞争。使用不同的数据集进行测试，因为在一组输入(3,5)中表现良好的数据，在另一组输入(3,500000000)中可能表现不佳。

Having said that, lets start with what you have and explore some options and then finally provide a starting implementation for the case that you described in your edit, i.e. multiple values. Please note that some of these implementations might not be ideal for your case or apply to your environment, but they touch on a general approach.

已经说过，让我们从您拥有的内容开始，并探索一些选项，最后为您在编辑中描述的案例(即多个值)提供一个启动实现。请注意，其中一些实现可能不适合您的情况或应用于您的环境，但它们涉及到一种通用方法。

Status Quo (your code as-is)

现状(你的代码现状)

This code does a few conditional and arithmetic operations. These are the kinds of operations that processors eat for breakfast...before they wake up...in the blink of a nanopartical's eyelid, i.e. very fast. Now, I know that you are using Arduino and so will not have the most powerful processor in the world to play with, but still, these are the operations that processors do very quickly. I wanted to create some benchmarks of my own, so I implemented a very similar function to yours in C++ (you mentioned that C++ is OK in your question). I called the test ConditionalTest because it follows an if...else flow and because I am bad at names.

此代码执行一些条件和算术操作。这些就是处理器早餐吃的那种操作……才醒来……在一眨眼的时间内，纳米颗粒的眼睑，即非常快。现在，我知道您正在使用Arduino，因此不会使用世界上最强大的处理器，但这些操作仍然是处理器快速执行的操作。我想创建一些我自己的基准测试，所以我在c++中实现了一个与您的非常类似的函数(您在您的问题中提到c++没问题)。我将test ConditionalTest命名为if。其他的流，因为我的名字不好。

Note: while I have done some rudimentary tests on the results, the code provided in these answers is by no means production ready. It is missing basic parameter checks (such as null values or uninitialised variables) and has some performance optimizations that I would normally omit in preference for safety. Anyway, the code is:

注意:虽然我已经对结果做了一些基本的测试，但是这些答案中提供的代码并没有准备好生产。它缺少基本的参数检查(如空值或未初始化的变量)，并且有一些性能优化，我通常为了安全而省略这些优化。无论如何,代码是:

static void ConditionalTest( int startA, int startB, unsigned long long testIterations )
{       
    gl_result = 0;      
    gl_current=0;
    int distToA = startA;
    int distToB = startB;

    for( unsigned long long i = 0; i < testIterations; i++ )
    {           
        if( distToA > distToB ) //B comes first
        {           
            gl_result = distToB;
            gl_current += distToB;
            distToA -= distToB;
            distToB = startB;               
        }
        else if( distToB > distToA ) //A comes first
        {       
            gl_result = distToA;
            gl_current += distToA;
            distToB -= distToA;
            distToA = startA;                               
        }
        else
        {       
            gl_result = distToA; 
            gl_current += distToA; //Arbitrarily, could be distToB
            distToA = startA;
            distToB = startB;
        }
    }    
}

Note: -

注意:-

I assign the value to a global gl_result rather than printing it to save on filling up my console with messages and also because the operation of printing to screen takes ages compared to the other operations and so it would bloat the results.
我将值赋给一个全局gl_result，而不是将其打印出来，以节省在我的控制台中填充消息的时间，还因为与其他操作相比，打印到屏幕的操作要花费较长的时间，因此会导致结果膨胀。
I had to use unsigned long long for testIterations and some other variables because otherwise int would wrap around.
我必须使用无符号long来表示证明和其他变量，否则int将被包围。
gl_ are global variables in the test.
gl_是测试中的全局变量。

The benefit of this algorithm is that it uses very basic constructs, so

这个算法的好处是它使用了非常基本的结构

other programmers with even a very basic understanding of programming or from other programming languages will quickly understand what it is doing.
对编程或其他编程语言有基本了解的其他程序员很快就会明白它在做什么。
it is very portable - it is easy to translate to other languages and operating systems.
它是非常便携的-它很容易翻译到其他语言和操作系统。
regarding performance, is wysiwyg - what you see is what you get, so it is unlikely that there are big performance bottlenecks hidden in 3rd party library calls.
关于性能，wysiwyg——您所看到的是您所得到的，因此在第三方库调用中不太可能隐藏巨大的性能瓶颈。

Now, I am running a reasonably grunty machine (i7 2600) so it took 1000000000 (1 billion) iterations to start getting results that took more than a second. In this case, it took on average 2400 milliseconds to do 1 billion iterations. I think that is pretty quick, but lets look at how we can improve on things. First lets see what we can tweak.

现在，我正在运行一台相当糟糕的机器(i72600)，因此需要10亿次迭代才能开始获得超过一秒的结果。在这种情况下，进行10亿次迭代平均花费2400毫秒。我认为这是相当快的，但让我们看看我们可以如何改进。首先让我们看看我们能做些什么。

A tweak to your implementation

实现的一个调整

The arguments are (3,5), so initially distA is 3 and distB is 5. Note that 3 is smaller than 5. The first if will check if distToA > distToB: then elif distToB > distToA:. However, distToB (initially 5) is twice as likely to be greater than distToA (initially 3). For performance, you want the first if condition to be satisfied as often as possible in order to minimize the number of conditions that are checked in each iteration. In saying this I am making some assumptions about the compiler, but more on that later.

参数是(3,5)，所以最初distA是3，而distB是5。注意3比5小。第一个if将检查distToA > distToB:则elif distToB > distToA:。但是，distToB(最初的5)比distToA(最初的3)大两倍。说到这里，我对编译器做了一些假设，但后面会详细介绍。

So, very simply, we can swap the ifs around. However, it is not that simple. The problem I found with this is that the compiler is doing some nice optimisations on the second if and last else. You see where you had the comment Arbitrarily, could be distToB? Well, the fact that you have gl_current += distToA; in the else if and gl_current += distToA in the else allowed the compiler to optimise this to one statement. So, in my case it is not arbitrary (for you it will depend on your compiler). So, we need to change the else to allow these optimizations to occur. The final code is:

很简单，我们可以交换ifs。然而，事情并非如此简单。我发现这个问题是编译器在第二个if和最后一个else上做了一些很好的优化。你看到你的评论了吗，可以是distToB?有gl_current += distToA;在else if和gl_current += distToA中，else允许编译器将其优化为一个语句。所以，在我的例子中，它不是任意的(对你来说，它取决于你的编译器)。因此，我们需要更改else，以允许进行这些优化。最后的代码是:

static void ConditionalTestV2( int startA, int startB, unsigned long long testIterations )
{       
    gl_result = 0;      
    gl_current=0;
    int distToA = startA;
    int distToB = startB;

    for( unsigned long long i = 0; i < testIterations; i++ )
    {                       
        if( distToB > distToA ) //A comes first (where a is more likely to be first than b)
        {       
            gl_result = distToA;
            gl_current += distToA;
            distToB -= distToA;
            distToA = startA;                               
        }
        else if( distToA > distToB ) //B comes first
        {           
            gl_result = distToB;
            gl_current += distToB;
            distToA -= distToB;
            distToB = startB;               
        }
        else
        {       
            gl_result = distToB;   //Should be distToB for optimisations
            gl_current += distToB; //Should be distToB for optimisations
            distToA = startA;
            distToB = startB;
        }
    }      
}

Note: if( distToB > distToA ) is before else if( distToA > distToB ) and that the else now has gl_result = distToB and gl_current += distToB. With those changes in place, the time that the test took to run was: 2108 milliseconds. It is nice that those simple tweaks gave a 12% reduction in execution time.

注:if(distToB > distToA)是在else if(distToA > distToB)，而else现在有gl_result = distToB和gl_current += distToB。有了这些更改，测试运行的时间为:2108毫秒。很高兴这些简单的调整使执行时间减少了12%。

The biggest lesson from this is to measure any change that you make for unintended consequences.

这其中最大的教训是，衡量你为意外后果所做的任何改变。

Your compiler and execution environment may differ from mine, so your results may vary. If you are going to start tweaking things at this level, I would suggest becoming familiar with assembler and stepping through the assembly at the critical points to determine how the conditions are actually being implemented. I am sure that there are other optimizations such as these that can be made. If you really get into it and are using GNU C++, there is something called __builtin_expect where you can guide the compiler about which branch to favour.

您的编译器和执行环境可能与我的不同，因此您的结果可能会有所不同。如果您打算开始在这个级别上进行调整，我建议您熟悉汇编程序，并在关键时刻遍历程序集，以确定实际如何实现条件。我确信还可以进行其他优化，比如这些优化。如果您真的了解它，并且正在使用GNU c++，那么有一个叫做__builtin_expect的东西，您可以在其中指导编译器选择哪个分支。

You may not always get the starting values in order, in which case you would need to way the cost of a one-off sort against the overall time of executing your algorithm.

您可能并不总是按顺序获得起始值，在这种情况下，您需要将一次性排序的成本与执行算法的总时间进行比较。

Some other things to point out are: -

还有一些需要指出的事情是:-。

You maintain a variable current, but you do not use it. If you are not using current, then you can remove it. You might not see a performance gain if the compiler already optimised it out.
你保持一个可变的电流，但你不使用它。如果您不使用电流，那么您可以删除它。如果编译器已经优化了性能，您可能看不到性能的提高。
You have a range of 100, but the cycle will repeat ever 3 * 5 = 15 times. So, you could either stop when current is 15 if that is all you need or you could store the results and then just write them out (see the patterns section).
你的范围是100，但是这个循环会重复3 * 5 = 15次。所以，当电流为15时，你可以停止，如果这是你所需要的，或者你可以存储结果，然后把它们写出来(见模式部分)。

Modulo

模

Looking at the algorithm, we are always getting the distance to a value, so one approach that springs to mind is modulo (there is already an answer to cover this). I am a bit suspicious of the performance because modulo tends to use division which is slower than your subtraction operations. Anyway, this is what I came up with:

看一下算法，我们总是得到到一个值的距离，所以一种让我们想到的方法是modulo(已经有一个解决这个问题的答案了)。我对性能有点怀疑，因为模块化倾向于使用比减法运算慢的除法。总之，这就是我想到的:

static void ModuloTest( int startA, int startB, unsigned long long testIterations )
{   
    unsigned long long current = 0;
    unsigned long long prev = 0;
    int distToA = startA;
    int distToB = startB;

    for( long long i = 0; i < testIterations; i++ )
    {       
        current += (gl_result = FastMin(distToA - (current%distToA), distToB - (current%distToB)));
    }
}

The result was 23349 milliseconds. Almost 10 times slower than your original.

结果是23349毫秒。几乎比原来慢了10倍。

Now, I normally wouldn't write a line such as the one that has current += (gl..., but I was trying to reduce the number of assignments. This is generally, a silly thing to do, because the compiler will optimise better than me and also it is more error prone. Still, this test was quite a bit slower and I wanted to make sure I gave it a good chance. It is a bit unfair to start pointing the finger at modulo straight away as the flow is a bit different, so maybe something else is to blame. So, I made an even simpler modulo test:

现在，我通常不会写一行，比如有current += (gl…)，但我想减少作业的数量。这通常是一种愚蠢的做法，因为编译器会优化得比我好，而且更容易出错。尽管如此，这个测试还是有点慢，我想确保我给了它一个很好的机会。开始直接将手指指向modulo是有点不公平的，因为流程有点不同，所以可能还有其他的原因。我做了一个更简单的模块测试

static void PureModuloTest( unsigned long long testIterations, unsigned long long mod )
{
    for(long long i = 1; i <= testIterations; i++)
    {
        gl_result = mod % i;
    }
}

where mod was 50000 and even in this case the test took 5 times longer than your test, so I think that modulo is out if we are looking for a pure performance gain. I also found some surprising inefficiencies with the stl min(), but to go into detail would make this long post even longer.

mod是50000，即使在这种情况下测试比你的测试要长5倍，所以我认为如果我们想要纯粹的性能收益，模块化是不存在的。我还发现stl min()有一些令人惊讶的低效之处，但深入研究细节将使这篇文章更长。

The next thing I did was looked at the data. Sometimes if you can find characteristics / patterns in the data you can optimise your implementation accordingly.

接下来我要做的就是查看数据。有时，如果您能够在数据中找到特征/模式，您就可以相应地优化实现。

Patterns

模式

Looking at your data again, something that jumps out is that the differences will repeat every a * b cycles. So, in your test, once you get to 15 the distances will repeat. You are probably already aware of this, but in your code snippet you run the test for 100 cycles (for i in xrange(100)) so I wasn't sure.

再看一遍你的数据，有些东西会跳出来，它们之间的差异会重复每一个a * b循环。所以，在你的测试中，一旦到了15，距离就会重复。您可能已经意识到了这一点，但是在您的代码片段中，您运行了100个周期(对于xrange(100)中的i)，所以我不确定。

One way to use this fact is to store the values until we get to a * b and then just reuse the values until we are done. Note that this essentially a matter of using your algorithm to begin with and then iterating through a list from then on.

使用这个事实的一种方法是将值存储到a * b，然后重用这些值直到完成。请注意，这本质上是使用算法开始，然后从那时开始遍历列表。

static void PatternTest( int startA, int startB, unsigned long long testIterations )
{
    int stop = startA * startB;
    list<int> resultList;
    int distToA = startA;
    int distToB = startB;   
    int val = 0;
    long long count = 0;
    while( val < stop  )
    {   
        if( distToB > distToA ) //A comes first (where a is more likely to be first than b)
        {       
            gl_result = distToA;                
            distToB -= distToA;
            distToA = startA;                               
        }
        else if( distToA > distToB ) //B comes first
        {           
            gl_result = distToB;                
            distToA -= distToB;
            distToB = startB;               
        }
        else
        {       
            gl_result = distToB;                
            distToA = startA;
            distToB = startB;
        }
        val += gl_result;
        resultList.push_back(gl_result);
        count++;
    }
    std::list<int>::const_iterator iter;
    while( count < testIterations )
    {
        for( iter = resultList.begin(); iter != resultList.end() && count < testIterations; iter++ )
        {
            gl_result = *iter;
            count++;
        }       
    }
}

This test took 1711 milliseconds, around 29% faster than the original and about 18% faster than the current best. I am not sure how applicable this is in your case, but it is an example of how analyzing the expected data can provide some good performance gains.

这个测试耗时1711毫秒，比原始测试快29%，比当前测试快18%。我不确定这在您的案例中是否适用，但这是分析预期数据如何提供一些良好性能收益的一个例子。

Thread Bonanza!

线程财源滚滚!

Now, this probably doesn't apply in your case since you are working with Arduino. But maybe threads will be supported in future or maybe you can farm the problem out to different processors. Either way, it would be unkind not to include a threading benchmark since this is what they live for. Also, my computer has 8 cores, 7 of which spend their time lazing about, so it is nice to give them a chance to run wild.

现在，这可能不适用于你的情况因为你和Arduino一起工作。但是将来可能会支持线程，或者您可以将问题分发给不同的处理器。不管怎样，不包含线程基准都是不友好的，因为这正是它们的目的。另外，我的电脑有8个内核，其中7个内核是用来消磨时间的，所以给它们一个*奔跑的机会是很不错的。

If your data or algorithm can be broken into independent discrete parts, then you could design your program so that it runs independent operations on separate threads. Now we know from before that the sequence repeats every a * b. So, we could start different points n where '(n modulo (a * b)) == 0'.

如果您的数据或算法可以分解成独立的离散部分，那么您可以设计您的程序，使其在独立的线程上运行独立的操作。现在我们知道这个序列会重复每一个a * b，因此，我们可以从不同的点n开始，其中'(n modulo (a * b)) = 0'

But, we could do better and first get the values for the first a * b and then loop through the values on the separate threads. Which is what I have done here. I chose to run 4 threads.

但是，我们可以做得更好，首先获取第一个a * b的值，然后循环遍历独立线程上的值。这就是我在这里所做的。我选择运行4个线程。

struct BonanzaThreadInfo
{
    long long iterations;
    list<int> resultList;
    int result;
};

static void BonanzaTestThread( void* param )
{
    BonanzaThreadInfo* info = (BonanzaThreadInfo*)param;    

    std::list<int>::const_iterator iter;
    for( long long count = 0; count < info->iterations; )
    {
        for( iter = info->resultList.begin(); iter != info->resultList.end() && count < info->iterations; iter++ )
        {
            info->result = *iter;           
            count++;
        }   
    }
    delete param;
}

static void ThreadBonanzaTest( int startA, int startB, unsigned long long testIterations )
{   
    int stop = startA * startB;
    list<int> resultList;
    int distToA = startA;
    int distToB = startB;   
    int val = 0;
    long long count = 0;
    while( val < stop  )
    {       
        if( distToB > distToA ) //A comes first (where a is more likely to be first than b)
        {       
            gl_result = distToA;                
            distToB -= distToA;
            distToA = startA;                               
        }
        else if( distToA > distToB ) //B comes first
        {           
            gl_result = distToB;                
            distToA -= distToB;
            distToB = startB;               
        }
        else
        {       
            gl_result = distToB;                
            distToA = startA;
            distToB = startB;
        }
        val += gl_result;
        resultList.push_back(gl_result);
        count++;
    }
    long long threadIterations = (testIterations - count) / NUMTHREADS;
    long long iterationsLeft = testIterations-count;
    thread* bonanzaThreads = new thread[NUMTHREADS];
    for( int i = 0; i < NUMTHREADS; i++ )
    {
        BonanzaThreadInfo* bonanzaThreadInfo = new BonanzaThreadInfo;
        if( i == (NUMTHREADS - 1) )
        {
            bonanzaThreadInfo->iterations = iterationsLeft;
        }
        else
        {
            iterationsLeft -= threadIterations;
            bonanzaThreadInfo->iterations = (threadIterations);
        }       
        bonanzaThreadInfo->resultList = resultList;
        bonanzaThreads[i] = thread(BonanzaTestThread,bonanzaThreadInfo);//http://*.com/a/10662506/746754        
    }
    for( int i = 0; i < NUMTHREADS; i++ )
    {
        bonanzaThreads[i].join();
    }
    delete [] bonanzaThreads;
}

The result is that this took 574 milliseconds. A whopping saving of 76%! Some basic points to note about threads: -

结果是，这花费了574毫秒。节省了76%!关于线程要注意的一些基本要点:-

The complexity and room for error increases dramatically.
复杂性和出错的空间显著增加。
If there is any shared resource between the threads, then that resource will need to be protected by a mutex. If threads frequently need the same protected resource at the same time, then all threads that need that resource will need to wait until it is availabe which can result if very poor performance.
如果线程之间有任何共享资源，那么该资源将需要由互斥体保护。如果线程经常同时需要相同的受保护资源，那么所有需要该资源的线程都需要等到它可用时，如果性能非常差，就会导致这种情况。

Here is a graph of where we are up to so far:

这是我们到目前为止的图表:

计算两个不同数字的倍数之间的差值

Now, to your edit about multiple values.

现在，到你的编辑关于多个值。

Multiple Values

多个值

Well, as far as I can see, if you have multiple input values (a,b,c,d...) your if statements are going to become very nested and length very quickly. if a < b && a < c && a < d...

就我所见，如果你有多个输入值(a,b,c,d…)你的if语句将会变得非常嵌套并且长度很快。如果a < b && a < c & a < d…

We are generally trying to order the next values, so that is where I would start. My first thought is to store the values in some ordered data structure. I chose to use a set because a set is naturally ordered by a key (actually it is a multiset because we need to allow dupes). Inside the set, I put a struct (called ValuesStruct because I am very bad at names) that contains the value to increment by (a,b,c) as well as the next integer where this value will be the closest. The < operator is so that stl knows where to put this value in the set.

我们一般都在尝试排序下一个值，这就是我要开始的地方。我的第一个想法是将值存储在有序的数据结构中。我之所以选择使用集合，是因为集合是按键排序的(实际上它是一个多集，因为我们需要允许dupes)。在这个集合中，我放置了一个struct(称为ValuesStruct，因为我的名字非常糟糕)，它包含了(a,b,c)和下一个整数的增量，这个值将是最接近的值。 <运算符使stl知道把这个值放在集合的什么地方。< p>

struct ValuesStruct
{
public:
    int Value;
    long long Next;
    ValuesStruct( int start )
    {
        Value = start;
        Next = start;
    }
    bool operator < (const ValuesStruct& rOther) const
    {
        return (Next < rOther.Next);
    }
private:
    ValuesStruct()
    {

    }
};

Then, all I need to do is iterate through the set. On each iteration, the front of the set will have the minumum next value. So I can calculate the current interval by subtracting the previous from this. Then, I just need a do..while() loop to remove this value from the list and add it back in with the updated Next value, so that it will take the appropriate position in the set. I need to do it for all values that had this as Next (e.g. as would be the case at 15 for your simple 3,5 example). I called the test MultiConditionalTest because here we need to check multiple comparison conditions and because I am so bad at names.

然后，我需要做的就是遍历集合。在每次迭代中，集合的前面都有minumum下一个值。所以我可以用这个减去前一个来计算当前的间隔。然后,我只需要一个做. .()循环从列表中删除该值并将其添加在与更新的未来价值,这样它将集合中的相应位置。我需要做的所有值,这是下一个(例如,将简单的在15 3 5例)。我称之为test MultiConditionalTest，因为这里我们需要检查多个比较条件，因为我太不擅长命名。

static void MultiConditionalTest( multiset<ValuesStruct>& values, unsigned long long testIterations )
{               
    unsigned long long prev = 0;
    for( unsigned long long i = 0; i < testIterations; i++ )
    {
        multiset<ValuesStruct>::iterator iter = values.begin();     
        gl_result = (*(iter)).Next - prev;
        prev = (*(iter)).Next;
        do //handle case where equal
        {
            ValuesStruct valuesStruct = *iter;
            values.erase(iter);
            valuesStruct.Next += valuesStruct.Value;
            values.insert( valuesStruct );
            iter = values.begin();
        }while( (*iter).Next == prev );
    }
}

The function is used as follows:

其功能如下:

multiset<ValuesStruct> values;
values.insert(ValuesStruct(3));
values.insert(ValuesStruct(5));
values.insert(ValuesStruct(7));
values.insert(ValuesStruct(11));
MultiConditionalTest( values, testIterations );

As you can see, there is a lot going on here, so I expected a bit of a performance blowout and got it: 105156 milliseconds - about 50 times slower. This is still less than a microsecond per iteration, so again it depends on what you are aiming at. Since I just banged this up this evening without analyzing it much I am pretty sure there are performance optimizations that can be made. First, the set is normally implemented as a binary search tree. I would do some research and determine whether this is the best data structure for this problem. Also, when inserting a new value into the list a hint can be given as to where it will be placed. If we are clever about choosing the position then we might be able to speed this operation up. Also, as before, the sequence will repeat when we get to (a * b * c * d...), so we could store the values and then just write them out from then on. I would also look at the problem space and see if there is a way to optimise the algorithm, possibly ask about the mathematical sequence on math.stackexchange.com - those guys are pretty sharp.

正如您所看到的，这里发生了很多事情，所以我预期会出现性能井喷，结果是:105156毫秒——大约慢50倍。这仍然小于每一次迭代的一微秒，因此这同样取决于您的目标。由于我今天晚上在没有分析它的情况下编写了这篇文章，我很确定可以进行性能优化。首先，该集合通常实现为二叉搜索树。我将做一些研究，确定这是否是这个问题的最佳数据结构。此外，当向列表中插入一个新值时，可以给出一个关于它将被放置在何处的提示。如果我们聪明地选择了这个位置，那么我们就能加快这个行动。同样，和以前一样，当我们到达(a * b * c * d…)时，序列将重复，因此我们可以存储这些值，然后从那时开始将它们写出来。我还会研究问题空间，看看是否有优化算法的方法，可能会问一下math.stackexchange.com上的数学序列——这些都很清晰。

Anyways, this is just an option, it may or may not work for you depending on your real performance requirements.

无论如何，这只是一个选项，根据您的实际性能需求，它可能适合您，也可能不适合您。

Some other thoughts:

一些其他的想法:

How likely are you to get the same set of values (a,b,c,d...)? If this is likely, you may want to cache previous results. Then it would just be a matter of reading them from a cached array which would be very quick.
你有多大可能得到相同的值集(a,b,c,d…)?如果可能，您可能希望缓存以前的结果。然后，只需从缓存的数组中读取它们，这将非常快。
Another way to improve performance is to turn on compiler optimisations. How you do this and how effective it is depends on your compiler.
另一种提高性能的方法是打开编译器优化。如何做到这一点以及它的有效性取决于编译器。

Good luck.

祝你好运。

#2

Unclear why you're unhappy with the code you have. If it's because there are "so many " if tests, it's easy enough to do it with none:

不清楚为什么你对你的代码不满意。如果是因为有“那么多”的If测试，那么在没有测试的情况下就很容易做到:

def diffgen(a, b):
    from itertools import cycle
    diffs = []
    current = 0
    ab = a*b
    while current < ab:
        nextone = min((current // a + 1) * a,
                      (current // b + 1) * b)
        diffs.append(nextone - current)
        yield nextone - current
        current = nextone
    for d in cycle(diffs):
        yield d

Note that once you reach a*b, the sequence of diffs repeats, so no more calculations are needed then.

注意，一旦到达a*b，扩散序列就会重复，因此不需要再进行计算。

#3

Here's a way to do it using the modulo operation:

这里有一种使用模块化操作的方法:

a = 3
b = 5
current = 0

def nearest_multiple_of_a_or_b_to_current(current, a, b):
    distance_to_a = (a - current%a)
    distance_to_b = (b - current%b)
    return current + min(distance_to_a, distance_to_b)

for i in range(100):
    next = nearest_multiple_of_a_or_b_to_current(current, a, b)
    print(next - current)
    current = next

Output:

输出:

#1

让我们从一些一般的点开始。最好从你和你的同事都能理解的直觉代码开始。然后测量性能并找到瓶颈。如果你试着从一开始就超优化，你会:-

make code that is complicated, error prone and less understandable.
使代码变得复杂、容易出错、难以理解。
most likely optimise code that would barely register a blip on the overall performance, while overlooking major bottlenecks. Unless you know the processor, compiler, programming language and environmental nuances back to front there is a good chance you will make the performance worse if you try to guess the optimizations.
最可能优化的代码几乎不会影响整体性能，同时忽略了主要的瓶颈。除非您知道处理器、编译器、编程语言和环境的细微差别，否则如果您尝试猜测优化，很可能会使性能变得更差。

Status Quo (your code as-is)

现状(你的代码现状)

static void ConditionalTest( int startA, int startB, unsigned long long testIterations )
{       
    gl_result = 0;      
    gl_current=0;
    int distToA = startA;
    int distToB = startB;

    for( unsigned long long i = 0; i < testIterations; i++ )
    {           
        if( distToA > distToB ) //B comes first
        {           
            gl_result = distToB;
            gl_current += distToB;
            distToA -= distToB;
            distToB = startB;               
        }
        else if( distToB > distToA ) //A comes first
        {       
            gl_result = distToA;
            gl_current += distToA;
            distToB -= distToA;
            distToA = startA;                               
        }
        else
        {       
            gl_result = distToA; 
            gl_current += distToA; //Arbitrarily, could be distToB
            distToA = startA;
            distToB = startB;
        }
    }    
}

Note: -

注意:-

I assign the value to a global gl_result rather than printing it to save on filling up my console with messages and also because the operation of printing to screen takes ages compared to the other operations and so it would bloat the results.
我将值赋给一个全局gl_result，而不是将其打印出来，以节省在我的控制台中填充消息的时间，还因为与其他操作相比，打印到屏幕的操作要花费较长的时间，因此会导致结果膨胀。
I had to use unsigned long long for testIterations and some other variables because otherwise int would wrap around.
我必须使用无符号long来表示证明和其他变量，否则int将被包围。
gl_ are global variables in the test.
gl_是测试中的全局变量。

The benefit of this algorithm is that it uses very basic constructs, so

这个算法的好处是它使用了非常基本的结构

other programmers with even a very basic understanding of programming or from other programming languages will quickly understand what it is doing.
对编程或其他编程语言有基本了解的其他程序员很快就会明白它在做什么。
it is very portable - it is easy to translate to other languages and operating systems.
它是非常便携的-它很容易翻译到其他语言和操作系统。
regarding performance, is wysiwyg - what you see is what you get, so it is unlikely that there are big performance bottlenecks hidden in 3rd party library calls.
关于性能，wysiwyg——您所看到的是您所得到的，因此在第三方库调用中不太可能隐藏巨大的性能瓶颈。

A tweak to your implementation

实现的一个调整

static void ConditionalTestV2( int startA, int startB, unsigned long long testIterations )
{       
    gl_result = 0;      
    gl_current=0;
    int distToA = startA;
    int distToB = startB;

    for( unsigned long long i = 0; i < testIterations; i++ )
    {                       
        if( distToB > distToA ) //A comes first (where a is more likely to be first than b)
        {       
            gl_result = distToA;
            gl_current += distToA;
            distToB -= distToA;
            distToA = startA;                               
        }
        else if( distToA > distToB ) //B comes first
        {           
            gl_result = distToB;
            gl_current += distToB;
            distToA -= distToB;
            distToB = startB;               
        }
        else
        {       
            gl_result = distToB;   //Should be distToB for optimisations
            gl_current += distToB; //Should be distToB for optimisations
            distToA = startA;
            distToB = startB;
        }
    }      
}

The biggest lesson from this is to measure any change that you make for unintended consequences.

这其中最大的教训是，衡量你为意外后果所做的任何改变。

You may not always get the starting values in order, in which case you would need to way the cost of a one-off sort against the overall time of executing your algorithm.

您可能并不总是按顺序获得起始值，在这种情况下，您需要将一次性排序的成本与执行算法的总时间进行比较。

Some other things to point out are: -

还有一些需要指出的事情是:-。

You maintain a variable current, but you do not use it. If you are not using current, then you can remove it. You might not see a performance gain if the compiler already optimised it out.
你保持一个可变的电流，但你不使用它。如果您不使用电流，那么您可以删除它。如果编译器已经优化了性能，您可能看不到性能的提高。
You have a range of 100, but the cycle will repeat ever 3 * 5 = 15 times. So, you could either stop when current is 15 if that is all you need or you could store the results and then just write them out (see the patterns section).
你的范围是100，但是这个循环会重复3 * 5 = 15次。所以，当电流为15时，你可以停止，如果这是你所需要的，或者你可以存储结果，然后把它们写出来(见模式部分)。

Modulo

模

static void ModuloTest( int startA, int startB, unsigned long long testIterations )
{   
    unsigned long long current = 0;
    unsigned long long prev = 0;
    int distToA = startA;
    int distToB = startB;

    for( long long i = 0; i < testIterations; i++ )
    {       
        current += (gl_result = FastMin(distToA - (current%distToA), distToB - (current%distToB)));
    }
}

The result was 23349 milliseconds. Almost 10 times slower than your original.

结果是23349毫秒。几乎比原来慢了10倍。

static void PureModuloTest( unsigned long long testIterations, unsigned long long mod )
{
    for(long long i = 1; i <= testIterations; i++)
    {
        gl_result = mod % i;
    }
}

The next thing I did was looked at the data. Sometimes if you can find characteristics / patterns in the data you can optimise your implementation accordingly.

接下来我要做的就是查看数据。有时，如果您能够在数据中找到特征/模式，您就可以相应地优化实现。

Patterns

模式

使用这个事实的一种方法是将值存储到a * b，然后重用这些值直到完成。请注意，这本质上是使用算法开始，然后从那时开始遍历列表。

static void PatternTest( int startA, int startB, unsigned long long testIterations )
{
    int stop = startA * startB;
    list<int> resultList;
    int distToA = startA;
    int distToB = startB;   
    int val = 0;
    long long count = 0;
    while( val < stop  )
    {   
        if( distToB > distToA ) //A comes first (where a is more likely to be first than b)
        {       
            gl_result = distToA;                
            distToB -= distToA;
            distToA = startA;                               
        }
        else if( distToA > distToB ) //B comes first
        {           
            gl_result = distToB;                
            distToA -= distToB;
            distToB = startB;               
        }
        else
        {       
            gl_result = distToB;                
            distToA = startA;
            distToB = startB;
        }
        val += gl_result;
        resultList.push_back(gl_result);
        count++;
    }
    std::list<int>::const_iterator iter;
    while( count < testIterations )
    {
        for( iter = resultList.begin(); iter != resultList.end() && count < testIterations; iter++ )
        {
            gl_result = *iter;
            count++;
        }       
    }
}

Thread Bonanza!

线程财源滚滚!

But, we could do better and first get the values for the first a * b and then loop through the values on the separate threads. Which is what I have done here. I chose to run 4 threads.

但是，我们可以做得更好，首先获取第一个a * b的值，然后循环遍历独立线程上的值。这就是我在这里所做的。我选择运行4个线程。

struct BonanzaThreadInfo
{
    long long iterations;
    list<int> resultList;
    int result;
};

static void BonanzaTestThread( void* param )
{
    BonanzaThreadInfo* info = (BonanzaThreadInfo*)param;    

    std::list<int>::const_iterator iter;
    for( long long count = 0; count < info->iterations; )
    {
        for( iter = info->resultList.begin(); iter != info->resultList.end() && count < info->iterations; iter++ )
        {
            info->result = *iter;           
            count++;
        }   
    }
    delete param;
}

static void ThreadBonanzaTest( int startA, int startB, unsigned long long testIterations )
{   
    int stop = startA * startB;
    list<int> resultList;
    int distToA = startA;
    int distToB = startB;   
    int val = 0;
    long long count = 0;
    while( val < stop  )
    {       
        if( distToB > distToA ) //A comes first (where a is more likely to be first than b)
        {       
            gl_result = distToA;                
            distToB -= distToA;
            distToA = startA;                               
        }
        else if( distToA > distToB ) //B comes first
        {           
            gl_result = distToB;                
            distToA -= distToB;
            distToB = startB;               
        }
        else
        {       
            gl_result = distToB;                
            distToA = startA;
            distToB = startB;
        }
        val += gl_result;
        resultList.push_back(gl_result);
        count++;
    }
    long long threadIterations = (testIterations - count) / NUMTHREADS;
    long long iterationsLeft = testIterations-count;
    thread* bonanzaThreads = new thread[NUMTHREADS];
    for( int i = 0; i < NUMTHREADS; i++ )
    {
        BonanzaThreadInfo* bonanzaThreadInfo = new BonanzaThreadInfo;
        if( i == (NUMTHREADS - 1) )
        {
            bonanzaThreadInfo->iterations = iterationsLeft;
        }
        else
        {
            iterationsLeft -= threadIterations;
            bonanzaThreadInfo->iterations = (threadIterations);
        }       
        bonanzaThreadInfo->resultList = resultList;
        bonanzaThreads[i] = thread(BonanzaTestThread,bonanzaThreadInfo);//http://*.com/a/10662506/746754        
    }
    for( int i = 0; i < NUMTHREADS; i++ )
    {
        bonanzaThreads[i].join();
    }
    delete [] bonanzaThreads;
}

The result is that this took 574 milliseconds. A whopping saving of 76%! Some basic points to note about threads: -

结果是，这花费了574毫秒。节省了76%!关于线程要注意的一些基本要点:-

The complexity and room for error increases dramatically.
复杂性和出错的空间显著增加。
If there is any shared resource between the threads, then that resource will need to be protected by a mutex. If threads frequently need the same protected resource at the same time, then all threads that need that resource will need to wait until it is availabe which can result if very poor performance.
如果线程之间有任何共享资源，那么该资源将需要由互斥体保护。如果线程经常同时需要相同的受保护资源，那么所有需要该资源的线程都需要等到它可用时，如果性能非常差，就会导致这种情况。

Here is a graph of where we are up to so far:

这是我们到目前为止的图表:

计算两个不同数字的倍数之间的差值

Now, to your edit about multiple values.

现在，到你的编辑关于多个值。

Multiple Values

多个值

Well, as far as I can see, if you have multiple input values (a,b,c,d...) your if statements are going to become very nested and length very quickly. if a < b && a < c && a < d...

就我所见，如果你有多个输入值(a,b,c,d…)你的if语句将会变得非常嵌套并且长度很快。如果a < b && a < c & a < d…

struct ValuesStruct
{
public:
    int Value;
    long long Next;
    ValuesStruct( int start )
    {
        Value = start;
        Next = start;
    }
    bool operator < (const ValuesStruct& rOther) const
    {
        return (Next < rOther.Next);
    }
private:
    ValuesStruct()
    {

    }
};

static void MultiConditionalTest( multiset<ValuesStruct>& values, unsigned long long testIterations )
{               
    unsigned long long prev = 0;
    for( unsigned long long i = 0; i < testIterations; i++ )
    {
        multiset<ValuesStruct>::iterator iter = values.begin();     
        gl_result = (*(iter)).Next - prev;
        prev = (*(iter)).Next;
        do //handle case where equal
        {
            ValuesStruct valuesStruct = *iter;
            values.erase(iter);
            valuesStruct.Next += valuesStruct.Value;
            values.insert( valuesStruct );
            iter = values.begin();
        }while( (*iter).Next == prev );
    }
}

The function is used as follows:

其功能如下:

multiset<ValuesStruct> values;
values.insert(ValuesStruct(3));
values.insert(ValuesStruct(5));
values.insert(ValuesStruct(7));
values.insert(ValuesStruct(11));
MultiConditionalTest( values, testIterations );

Anyways, this is just an option, it may or may not work for you depending on your real performance requirements.

无论如何，这只是一个选项，根据您的实际性能需求，它可能适合您，也可能不适合您。

Some other thoughts:

一些其他的想法:

How likely are you to get the same set of values (a,b,c,d...)? If this is likely, you may want to cache previous results. Then it would just be a matter of reading them from a cached array which would be very quick.
你有多大可能得到相同的值集(a,b,c,d…)?如果可能，您可能希望缓存以前的结果。然后，只需从缓存的数组中读取它们，这将非常快。
Another way to improve performance is to turn on compiler optimisations. How you do this and how effective it is depends on your compiler.
另一种提高性能的方法是打开编译器优化。如何做到这一点以及它的有效性取决于编译器。

Good luck.

祝你好运。

#2

Unclear why you're unhappy with the code you have. If it's because there are "so many " if tests, it's easy enough to do it with none:

不清楚为什么你对你的代码不满意。如果是因为有“那么多”的If测试，那么在没有测试的情况下就很容易做到:

def diffgen(a, b):
    from itertools import cycle
    diffs = []
    current = 0
    ab = a*b
    while current < ab:
        nextone = min((current // a + 1) * a,
                      (current // b + 1) * b)
        diffs.append(nextone - current)
        yield nextone - current
        current = nextone
    for d in cycle(diffs):
        yield d

Note that once you reach a*b, the sequence of diffs repeats, so no more calculations are needed then.

注意，一旦到达a*b，扩散序列就会重复，因此不需要再进行计算。

#3

Here's a way to do it using the modulo operation:

这里有一种使用模块化操作的方法:

a = 3
b = 5
current = 0

def nearest_multiple_of_a_or_b_to_current(current, a, b):
    distance_to_a = (a - current%a)
    distance_to_b = (b - current%b)
    return current + min(distance_to_a, distance_to_b)

for i in range(100):
    next = nearest_multiple_of_a_or_b_to_current(current, a, b)
    print(next - current)
    current = next

Output:

输出:

秒客网

计算两个不同数字的倍数之间的差值

3 个解决方案

#1

#2

#3

#1

#2

#3

相关文章