UBO Performance

Snuffalufagus · June 10, 2014, 3:44pm

No matter what I do, I can’t get uniform buffer objects to perform better than regular glUniform* calls. Is UBO performance a known issue, or could you provide a “best practice” for using them?

busta78 · June 10, 2014, 6:06pm

…Could you give a little more detail about your use case?
- update frequency
- how many shaders share the UBO
- UBO size
- etc.

droettger · June 11, 2014, 7:06am

You might want to look at this presentation from the last NVIDIA GTC for best practices and performance comparisons of rendering methods and parameter updates:
http://on-demand.gputechconf.com/gtc/2014/presentations/S4379-opengl-44-scene-rendering-techniques.pdf

Related topic from the year before:
http://on-demand.gputechconf.com/gtc/2013/presentations/S3032-Advanced-Scenegraph-Rendering-Pipeline.pdf

There should also be recordings of these presentations at
http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php

Snuffalufagus · June 23, 2014, 2:57pm

I’m doing what the presentations suggest, but the performance is still worse than with regular glUniform calls. Other people have the same problem (google “ubo performance”). I split the uniforms into two uniform blocks. One block contains uniforms that are frequently updated (matrices and whatnot) and the other contains static uniforms updated just once per frame. I use glNamedBufferSubDataEXT to upload the data. I’ve also tried several other variants.
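For illustration only, the two-block split described above might look like this on the CPU side. This is a minimal sketch that assumes the `std140` layout so that offsets are fixed by the specification; with the default shared layout, offsets would instead have to be queried at run time (for example with `glGetActiveUniformsiv` and `GL_UNIFORM_OFFSET`). The block names and members (`PerFrameBlock`, `PerObjectBlock`, `sun_dir`, `tint`, and so on) are hypothetical and not taken from the poster's application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block for data updated at most once per frame.
 * Assumed GLSL counterpart:
 *   layout(std140) uniform PerFrame {
 *       mat4 view; mat4 proj; vec4 sun_dir; float time;
 *   };
 */
typedef struct {
    float view[16];    /* mat4,  std140 offset 0   */
    float proj[16];    /* mat4,  std140 offset 64  */
    float sun_dir[4];  /* vec4,  std140 offset 128 */
    float time;        /* float, std140 offset 144 */
    float pad[3];      /* keep the struct size a multiple of 16 bytes */
} PerFrameBlock;

/* Hypothetical block for data updated per object.
 * Assumed GLSL counterpart:
 *   layout(std140) uniform PerObject { mat4 model; vec4 tint; };
 */
typedef struct {
    float model[16];   /* mat4, std140 offset 0  */
    float tint[4];     /* vec4, std140 offset 64 */
} PerObjectBlock;

/* Compile-time checks that the C structs match the std140 offsets above. */
_Static_assert(offsetof(PerFrameBlock, proj) == 64, "proj offset");
_Static_assert(offsetof(PerFrameBlock, sun_dir) == 128, "sun_dir offset");
_Static_assert(offsetof(PerFrameBlock, time) == 144, "time offset");
_Static_assert(sizeof(PerFrameBlock) % 16 == 0, "PerFrameBlock size");
_Static_assert(offsetof(PerObjectBlock, tint) == 64, "tint offset");
```

Each struct can then be uploaded as one contiguous range, which keeps the per-object update small and leaves the per-frame data untouched between draws.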

The application is a military flight simulator. We’re currently CPU bound. We’re using direct state access, bindless, streaming and interleaved VBOs. For my test scene, there are about 40 unique materials (we use array textures to reduce the material count). This is on Linux, so I’m running driver 331.79 on a 780.

droettger · June 23, 2014, 3:59pm

Note that the performance comparisons in the first linked presentation were made with different, newer(!) drivers than the one you’re using. Please check again with the upcoming driver generations; e.g. beta versions of 340.xx are already available.

If that doesn’t improve the parameter update performance, some more analysis of the bottleneck will be required.

Snuffalufagus · June 27, 2014, 3:49pm

I installed the 340.17 driver. Here are my timings. I’m using the default shared layout. This is with a 780 and a Core i7 960. I’ll have to try this with a more modern CPU at some point.

9.5 ms all uniforms in a uniform block
8.75 ms static uniforms (updated at most once per frame) in uniform block
8.25 ms no uniform blocks

Dutta · September 10, 2014, 12:15pm

You can also try to implement your uniform buffers using coherent and persistent mapping. I had the very same issue: glUniform was way faster than any buffering call. When I switched to persistently mapped coherent buffers and did the syncing myself, I got a big performance boost.

It should also be noted that in order to make uniform buffers as efficient as possible, you should consider buffering them so that you can write to a section of the buffer which isn’t in use, and as such avoid stalling.
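The sectioning idea above can be sketched as a small ring allocator that hands out aligned offsets from one region per in-flight frame. This is only the CPU-side bookkeeping, under stated assumptions: the buffer itself would be created with `glBufferStorage` and mapped once with `glMapBufferRange` using `GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT` (OpenGL 4.4 or `ARB_buffer_storage`), and each region would be guarded by a fence from `glFenceSync` that is waited on with `glClientWaitSync` before reuse. The names `UboRing`, `ring_init`, `ring_begin_frame` and `ring_alloc` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two).
 * For UBO ranges, align would be the value queried with
 * glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

typedef struct {
    size_t region_size;   /* bytes per region, already aligned */
    size_t region_count;  /* e.g. 3 for triple buffering */
    size_t align;         /* required offset alignment */
    size_t frame;         /* index of the region being written */
    size_t cursor;        /* next free offset inside that region */
} UboRing;

static void ring_init(UboRing *r, size_t region_size,
                      size_t region_count, size_t align)
{
    assert(region_count > 0);
    r->region_size = align_up(region_size, align);
    r->region_count = region_count;
    r->align = align;
    r->frame = 0;
    r->cursor = 0;
}

/* Total size to pass to the buffer allocation call. */
static size_t ring_total_size(const UboRing *r)
{
    return r->region_size * r->region_count;
}

/* Move to the next region. A real renderer would first place a fence for
 * the region just finished and wait on the fence of the region it is
 * about to reuse; that part is omitted here. */
static void ring_begin_frame(UboRing *r)
{
    r->frame = (r->frame + 1) % r->region_count;
    r->cursor = 0;
}

/* Reserve size bytes in the current region. Returns the absolute byte
 * offset into the buffer, or (size_t)-1 if the region is full. */
static size_t ring_alloc(UboRing *r, size_t size)
{
    size_t start = r->cursor;  /* always a multiple of align */
    size_t end = start + align_up(size, r->align);
    if (end > r->region_size)
        return (size_t)-1;
    r->cursor = end;
    return r->frame * r->region_size + start;
}
```

The returned offset would be used both for writing through the mapped pointer and as the offset argument of `glBindBufferRange`, so the GPU reads a region the CPU is no longer writing to.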

However, I should note that not everything in my project is using uniform buffers, only the stuff which needs either extremely frequent updating (guaranteed per object stuff like transform matrices) or not so frequent updating (per frame stuff like view matrices and such).

droettger · September 26, 2014, 8:07am

We have now been able to open-source the work that led to the GTC presentation results above.
Please have a look at https://devtalk.nvidia.com/default/topic/777618/scenix/announcing-nvpro-pipeline-a-research-rendering-pipeline/

That should allow you to investigate the different options available to pass parameters to GLSL shader programs and possibly overcome your current UBO bottlenecks.
Please keep in mind that all presented results were measured on Quadro boards and also rely on improvements inside the OpenGL driver itself, so use the newest drivers available when benchmarking.
