Small repeated arithmetic vs. creating new variables

Time: 2022-07-02 01:37:36

I am writing C++ for a low-spec device (~3 MB RAM, ~70 MHz CPU), and I am wondering which of the following will run more efficiently, and by how much. This is a simplified segment of code that will run 120-600 times a second:

void checkCollisions(int x, int y)
{
    int p1x = x, p1y = y+2;
    int p2x = x, p2y = y+3;
    int p3x = x+3, p3y = y+3;
    // And so on...

    if (wallAt(p1x-1, p1y) || wallAt(p2x-1, p2y))
        setCollision(LEFT, true);
    if (wallAt(p1x, p1y) || wallAt(p4x, p4y) || wallAt(p5x, p5y))
        inGround = true;
    else
        inGround = false;
    // And so on...
}

Or replacing the integers with their definitions:

void checkCollisionsAlt(int x, int y)
{
    if (wallAt(x-1, y+2) || wallAt(x-1, y+3))
        setCollision(LEFT, true);
    if (wallAt(x, y+2) || wallAt(x+3, y) || wallAt(x+2, y))
        inGround = true;
    else
        inGround = false;
    // And so on...
}

Here's a diagram of the example:

[Figure: diagram of the example]

The first one is more understandable, but I would expect it to use more memory. How much of a difference does it make?

3 Solutions

#1


A few points to think about:

  1. If the full version has no recursion either, you can worry less about the variables (p1x etc.) on the stack.

  2. Stack consumption is ephemeral and shouldn't hurt you unless you have pathological code, such as deep recursion with heavy frames.

  3. Recursion is usually a bad idea on a tight memory budget.

  4. The fact that you wrote them as explicitly named variables does not mean they will exist as such at execution time.

  5. Any decent compiler is likely to recognize the lifetimes of the variables and keep them in registers. Please verify this at the compiler optimization level you are currently using, and consider raising it if needed.

Also, what is the expected range of these values (p1x etc.)? Why not use short int?

The ephemeral nature of stack growth means that your peak heap memory is not affected. The stack can grow and shrink, and depending on the layout and the amount budgeted for the stack, you might not have to worry about this at all.

Note: Any and all heap allocations need to be carefully vetted. Consider implementing a custom allocator rather than incurring the per-chunk overhead of the standard malloc(). Of course, you didn't bring up the heap in the question, but it is something to keep in mind.
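
As a rough illustration of the custom-allocator idea, here is a minimal sketch of a fixed-size block pool carved out of a static buffer. The name `BlockPool`, its interface, and the sizes are hypothetical and not from the question; it assumes all blocks are the same size and does no thread-safety or double-free checking.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size block pool: all memory lives inside the object,
// so there is no malloc() call and no per-chunk bookkeeping header.
template <std::size_t BlockSize, std::size_t BlockCount>
class BlockPool {
    struct Node { Node* next; };  // free-list link stored inside each free block

    static_assert(BlockSize >= sizeof(Node), "block must be able to hold a link");
    static_assert(BlockSize % alignof(Node) == 0, "blocks must stay aligned");

public:
    BlockPool() {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            free_ = new (storage_ + i * BlockSize) Node{free_};
        }
    }

    // Returns a block of BlockSize bytes, or nullptr when the pool is exhausted.
    void* allocate() {
        if (free_ == nullptr) return nullptr;
        Node* n = free_;
        free_ = n->next;
        return n;
    }

    // Returns a block obtained from allocate() to the pool.
    void release(void* p) {
        free_ = new (p) Node{free_};
    }

private:
    alignas(std::max_align_t) unsigned char storage_[BlockSize * BlockCount];
    Node* free_ = nullptr;
};
```

A pool such as `BlockPool<16, 64>` could be declared as a static object so that its footprint is fixed and visible at link time; whether that beats the platform's malloc() depends on the allocation pattern and should be measured.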

#2


What will make your code faster, if the compiler cooperates, is keeping all your variables in registers. I think any modern compiler will work out on its own that your two versions are equivalent and will produce the same, or very similar, output. It will try to use the core registers in both cases and will fall back to memory (the stack, in this case) only if there are not enough registers available. If the compiler gives you the option to keep the intermediate assembly files, do that; you can then get a deep understanding of your code and its performance. Remember that fewer memory accesses, with registers used instead, will improve your code's performance.

#3


If the machine runs at 70 MHz, it has about 117,000 cycles every 600th of a second.

If instructions take an average of 10 cycles each, it can execute about 11,700 instructions in a 600th of a second.

Looking at your code, either version, I'm guesstimating about 100 instructions to execute it. 100/11,700 = roughly 1% of the time spent running this code.
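
The same back-of-the-envelope arithmetic can be written out so the inputs are easy to change. The constant and function names are made up for this sketch, and the 10 cycles per instruction and 100 instructions per call are the guesses from the text above, not measurements.

```cpp
#include <cstdint>

// Inputs: clock rate and call rate from the question; the other two are guesses.
constexpr std::int64_t kClockHz = 70000000;         // ~70 MHz CPU
constexpr std::int64_t kCallsPerSecond = 600;       // worst case from the question
constexpr std::int64_t kCyclesPerInstruction = 10;  // pessimistic guess
constexpr std::int64_t kInstructionsPerCall = 100;  // guesstimate for the function

// Cycles available in one 600th of a second (integer division).
constexpr std::int64_t cyclesPerSlot() {
    return kClockHz / kCallsPerSecond;
}

// Instructions that fit into one 600th of a second.
constexpr std::int64_t instructionsPerSlot() {
    return cyclesPerSlot() / kCyclesPerInstruction;
}

// Fraction of the available time spent in the collision check.
constexpr double fractionOfBudget() {
    return static_cast<double>(kInstructionsPerCall) / instructionsPerSlot();
}
```

With these inputs the fraction comes out a little under 0.01, which is the "roughly 1%" figure; at 120 calls a second the share would be about a fifth of that.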

You could step through it at the assembly language level to see how many instructions it takes, but it probably won't make much difference.

I suspect you have bigger fish to fry elsewhere.
