Severe performance loss when using an OpenGL FBO

Time: 2022-09-17 21:47:18

I have successfully implemented a simple 2D game using LWJGL (OpenGL) where objects fade away as they get further away from the player. This fading was initially implemented by computing the distance from the player to each object's origin and using it to scale the object's alpha/opacity.
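
A minimal sketch of that per-object fade, for reference. The linear falloff and the names `alphaForDistance` and `fadeRadius` are assumptions made for illustration; the question does not say which falloff curve was used.

```java
final class ObjectFade {
    /**
     * Hypothetical linear falloff: fully opaque at distance 0,
     * fully transparent at fadeRadius and beyond.
     * dx/dy are the offsets from the player to the object's origin.
     */
    static float alphaForDistance(float dx, float dy, float fadeRadius) {
        float distance = (float) Math.sqrt(dx * dx + dy * dy);
        float alpha = 1.0f - distance / fadeRadius;
        // Clamp to the valid alpha range [0, 1].
        return Math.max(0.0f, Math.min(1.0f, alpha));
    }

    public static void main(String[] args) {
        System.out.println(alphaForDistance(0f, 0f, 100f));     // 1.0
        System.out.println(alphaForDistance(30f, 40f, 100f));   // 0.5
        System.out.println(alphaForDistance(300f, 400f, 100f)); // 0.0
    }
}
```

The result would be passed as the alpha component when drawing the object, so the whole object gets a single opacity. That is why large objects look rough with this approach.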

However, when using larger objects, this approach appears a bit too rough. My solution was to implement alpha/opacity scaling for every pixel in the object. Not only would this look better, but it would also move computation time from the CPU to the GPU.

I figured I could implement it using an FBO and a temporary texture.
By drawing to the FBO and masking it with a precomputed distance map (a texture) using a special blend mode, I intended to achieve the effect. The algorithm is like so:

0) Initialize opengl and setup FBO
1) Render background to standard buffer
2) Switch to custom FBO and clear it
3) Render objects (to FBO)
4) Mask FBO using distance-texture
5) Switch to standard buffer
6) Render FBO temporary texture (to standard buffer)
7) Render hud elements

A bit of extra info:

  • The temporary texture has the same size as the window (and thus the standard buffer)
  • Step 4 uses a special blend mode to achieve the desired effect:
    GL11.glBlendFunc( GL11.GL_ZERO, GL11.GL_SRC_ALPHA );
  • My temporary texture is created with min/mag filters: GL11.GL_NEAREST
  • The data is allocated using: org.lwjgl.BufferUtils.createByteBuffer(4 * width * height);
  • The texture is initialized using: GL11.glTexImage2D( GL11.GL_TEXTURE_2D, 0, GL11.GL_RGBA, width, height, 0, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, dataBuffer);
  • There are no GL errors in my code.
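
To make the blend mode in step 4 concrete, here is a small CPU emulation of what `glBlendFunc(GL_ZERO, GL_SRC_ALPHA)` computes for a single pixel. It is only an illustration of the blend equation, not the rendering code from the game; the class name `MaskBlend` and the sample colours are made up.

```java
import java.util.Arrays;

final class MaskBlend {
    /**
     * Emulates glBlendFunc(GL_ZERO, GL_SRC_ALPHA) for one pixel:
     * result = src * 0 + dst * src.alpha, applied to every channel.
     * Colours are float RGBA arrays with components in [0, 1].
     */
    static float[] blend(float[] src, float[] dst) {
        float srcAlpha = src[3];
        float[] out = new float[4];
        for (int c = 0; c < 4; c++) {
            out[c] = src[c] * 0.0f + dst[c] * srcAlpha;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] objectPixel = {1.0f, 0.5f, 0.0f, 1.0f}; // already rendered into the FBO (dst)
        float[] maskTexel = {0.0f, 0.0f, 0.0f, 0.25f};  // distance-map texel being drawn (src)
        System.out.println(Arrays.toString(blend(maskTexel, objectPixel)));
        // [0.25, 0.125, 0.0, 0.25]
    }
}
```

The mask's colour is discarded and only its alpha scales what is already in the FBO. Since the mask covers the whole FBO, this pass touches every pixel of the window-sized texture regardless of how many objects were drawn.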

This does indeed achieve the desired results. However, when I did a bit of performance testing I found that my FBO approach cripples performance. I tested by requesting 1000 successive renders and measuring the time. The results were as follows:

In 512x512 resolution:

  • Normal: ~1.7s
  • FBO: ~2.5s
  • (FBO without step 6: ~1.7s)
  • (FBO without step 4: ~1.7s)

In 1680x1050 resolution:

  • Normal: ~1.7s
  • FBO: ~7s
  • (FBO without step 6: ~3.5s)
  • (FBO without step 4: ~6.0s)
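
As a rough sanity check on these figures, the sketch below compares the pixel counts of the two resolutions with the extra time the FBO path costs over the normal path. The class and method names are hypothetical and the timings are simply the approximate values listed above, so the result is only indicative.

```java
final class FillCost {
    /** Number of pixels in a width x height buffer. */
    static long pixels(int width, int height) {
        return (long) width * height;
    }

    /** Size in bytes of an RGBA8 texture (4 bytes per pixel). */
    static long rgbaBytes(int width, int height) {
        return 4L * width * height;
    }

    public static void main(String[] args) {
        long small = pixels(512, 512);   // 262144 pixels
        long large = pixels(1680, 1050); // 1764000 pixels
        double pixelRatio = (double) large / small;

        // Extra seconds per 1000 renders, FBO minus normal, from the lists above.
        double overheadSmall = 2.5 - 1.7;
        double overheadLarge = 7.0 - 1.7;
        double overheadRatio = overheadLarge / overheadSmall;

        System.out.println("pixel ratio:    " + pixelRatio);
        System.out.println("overhead ratio: " + overheadRatio);
        System.out.println("large texture bytes: " + rgbaBytes(1680, 1050));
    }
}
```

Both ratios come out between 6.6 and 6.8, so the FBO overhead grows roughly in proportion to the number of pixels. That is consistent with the full-screen passes being limited by fill rate, although these few measurements do not prove it.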

As you can see, this scales really badly. To make it even worse, I'm intending to do a second pass of this type. The machine I tested on is supposed to be high end in terms of my target audience, so I can expect people to have far below 60 fps with this approach, which is hardly acceptable for a game this simple.

What can I do to salvage my performance?

1 Answer

#1


4  

As suggested by Damon and sidewinderguy, I successfully implemented a similar solution using a fragment shader (and vertex shader). My performance is a little bit better than my initial CPU-run, object-based computation, which is MUCH faster than my FBO approach. At the same time it provides visual results much closer to the FBO approach (overlapping objects behave a bit differently).

For anyone interested, the fragment shader basically transforms gl_FragCoord.xy and does a texture lookup. I am not sure this gives the best performance, but with only 1 other texture activated I do not expect performance to increase by omitting the lookup and computing the texture value directly. Also, I no longer have a performance bottleneck, so further optimizations should wait until they are found to be required.
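
The actual shader is not shown here, so the following is only a sketch of what such a fragment shader might look like, held as a Java string ready to be handed to OpenGL. The uniform names (`objectTexture`, `distanceMap`, `windowSize`) and the transform, which divides `gl_FragCoord.xy` by the window size, are assumptions. The sketch also assumes a GLSL 1.20 vertex shader that writes `gl_TexCoord[0]` and `gl_FrontColor`.

```java
final class FadeShader {
    /** Hypothetical GLSL 1.20 fragment shader; all uniform names are illustrative. */
    static final String FRAGMENT_SOURCE =
          "#version 120\n"
        + "uniform sampler2D objectTexture; // the object's own texture\n"
        + "uniform sampler2D distanceMap;   // precomputed distance map\n"
        + "uniform vec2 windowSize;         // window size in pixels\n"
        + "void main() {\n"
        + "    vec4 color = texture2D(objectTexture, gl_TexCoord[0].st) * gl_Color;\n"
        + "    // Map the window-space fragment position to [0, 1] texture coordinates.\n"
        + "    vec2 maskCoord = gl_FragCoord.xy / windowSize;\n"
        + "    // Scale the fragment's opacity by the distance map's alpha.\n"
        + "    color.a *= texture2D(distanceMap, maskCoord).a;\n"
        + "    gl_FragColor = color;\n"
        + "}\n";

    /** CPU reference for the assumed coordinate transform used in the shader. */
    static float[] maskCoord(float fragX, float fragY, float windowWidth, float windowHeight) {
        return new float[] {fragX / windowWidth, fragY / windowHeight};
    }

    public static void main(String[] args) {
        float[] center = maskCoord(840f, 525f, 1680f, 1050f);
        System.out.println(center[0] + ", " + center[1]); // 0.5, 0.5
    }
}
```

With LWJGL, a source string like this would typically be compiled through `GL20.glCreateShader`, `GL20.glShaderSource` and `GL20.glCompileShader`. Because the mask is applied per fragment while each object is drawn, no window-sized intermediate texture or full-screen pass is needed. That also explains why overlapping objects can look slightly different from the FBO version, where the mask was applied once to the composited result.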

Also, I am very grateful for all the help, suggestions and comments I received :-)
