Blending into an RGBA16 FBO wraps to 0 instead of clamping to 1

I’m implementing a basic accumulation buffer using two FBOs, one for rendering and one for accumulation. The rendering FBO is a regular RGBA one, and the accumulation buffer is RGBA16. I’m using additive blending (GL_ONE, GL_ONE) to add each rendered frame into the accumulation FBO.

Everything works until a channel saturates. At that point the value wraps back to 0 instead of being clamped to 1, which is what the spec requires and what happened on an AMD card I tested the same code on.

I’m using a GeForce 9600 GT on Windows 7 64-bit with driver 326.80.

Thanks for the report. The OpenGL driver team was able to reproduce this, and it has now been filed as a bug report. (The problem did not reproduce on Fermi- or Kepler-based GPUs.)

Unfortunately, blending into signed or unsigned RGBA16 buffers won’t be fast on the 9600 GT.
I believe the G94 GPU on that card supports 16-bit floating-point blending. Depending on your precision requirements, you might want to try the GL_RGBA16F format instead, which won’t clamp because it’s floating point.

Thanks a lot! I’m happy to hear that it’s been reproduced. Is there any way for me to find out when it’s fixed?

For now I’ve settled on GL_RGBA32F, which works fine. The application isn’t real-time, and just moving the accumulation from the CPU to the GPU should help performance.

I’ll track it and try to post here once it’s known which driver release includes the fix.

Again, you’re unlikely to see a speedup on the GeForce 9600 GT when blending into signed or unsigned RGBA16 (or signed RGBA8) targets. That operation is not supported in hardware on that 5.5-year-old GPU. If you want GPU acceleration for that operation, I’d recommend upgrading to a more recent GPU architecture.
As for your current workaround: if you can live with the lower precision, I’d also recommend favoring RGBA16F over RGBA32F for blending performance.

The program runs more than 10 times faster with on-GPU blending into RGBA32F than with the glReadPixels approach (which also doesn’t produce correct results), so it’s definitely an improvement: a couple of minutes instead of tens of minutes.

As for hardware choice, newer is clearly better, but the older the hardware I can support, the better. I had a tester using a GeForce 7600 (I think; maybe another GeForce 7-series chip), which likely won’t work with RGBA32F but might with RGBA16 (I remember RGBA16 blending being a selling point for those older GPUs).