Persistent buffer synchronization doesn't work.

SasMaster · October 25, 2018, 1:33pm

My hardware is:

Quadro P2000 (notebook)

Driver version 411.63 Using OpenGL 4.5 (core)

We are developing a graphics engine. Until now we mapped a buffer range to upload data to the GPU.
Now we have decided to try persistent mapping. It is important to note that in our use case the data upload must be synchronized
before each frame is rendered. After initial tests we detected an issue with the sync.
We tried GL_MAP_COHERENT_BIT and also glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT). Neither works…

To verify that this is not a problem with our usage of
the API, we reverted to regular range mapping, and it worked. We also tried forcing the GPU to complete all commands by reading the framebuffer back to the CPU, which also fixed the issue.
Now the only explanation we have left is that the current version of the driver has a bug.
Here is what we do in the code:

    glCreateBuffers(1, &buffer->handle);
    glNamedBufferStorage(buffer->handle, buffer->size, data, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT | GL_MAP_COHERENT_BIT);
    gpuPointer = glMapNamedBufferRange(buffer->handle, offset, buffer->size, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT | GL_MAP_COHERENT_BIT);
    glBindBufferBase(buffer->target, index, buffer->handle);

    // Then, at some point later during execution, we grab gpuPointer and update the data.

This specific buffer holds the MVP matrix array. In the output we see sporadically wrong transformations of different objects, which is another indication that the issue is with the sync.

pdaniell · October 26, 2018, 5:16pm

As an experiment, can you try calling glFlushMappedBufferRange after writing to the persistently mapped buffer and before the GL command that triggers a read from that buffer? You’ll need to add GL_MAP_FLUSH_EXPLICIT_BIT to the glMapBufferRange call. I wonder if the value is stuck in a cache. You shouldn’t need to do this for coherently mapped buffers, but it will help isolate the problem. Thanks.

SasMaster · October 27, 2018, 10:05am

Hi. Yes, I tried that with:

glFlushMappedNamedBufferRange()

In this case I create the buffer with the flags:

GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT

And map the persistent pointer with:

GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_FLUSH_EXPLICIT_BIT

Same thing.

pdaniell · October 29, 2018, 6:09pm

Would you be able to provide a simple repro application showing this issue? We will use it to investigate the issue. Thanks.

SasMaster · November 4, 2018, 3:18pm

THIS IS NOT A DRIVER BUG but a synchronization problem in my code. I will explain with the help of the following pseudo-code:

begin rendering frame
   for each draw batch {
      update transforms UBO for the current batch (persistent pointer)  <---- here is the problem
      call glFlushMappedNamedBufferRange after the update (buffer mapped with GL_MAP_FLUSH_EXPLICIT_BIT)
      for each draw command in the current batch {
         issue draw call
      }
   }
end rendering frame

call to swap buffers, read the frame back to the CPU, or glFinish()...

What happens is this: I have a render loop which renders several groups (batches) of objects
one after another (see the inner loop). I was trying to upload the array of transform data for each batch to the GPU through the persistent buffer, as a step just before that batch gets rendered. I FORGOT that draw calls are pipelined to the GPU asynchronously. So I mistakenly thought that using GL_MAP_COHERENT_BIT, or calling glFlushMappedNamedBufferRange (with GL_MAP_FLUSH_EXPLICIT_BIT) right after the UBO update, would guarantee that every following draw batch sees its freshly updated transform data. That is not the case unless one syncs after each batch of draw commands to make sure all render commands submitted up to that point have been processed. So what happens if we don’t do that? In my use case, given the scenario above, the following happens:

1. First batch UBO update.
2. First batch rendering begins.
3. Second batch UBO update.
----------- somewhere here the first batch rendering commands are done by the GPU
4. Second batch rendering begins.

See? I start updating the second batch's UBO data while the first batch's rendering commands are still being processed, and by doing that I overwrite UBO data that is currently being read by the vertex shader of draw calls still in the pipeline.
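
The overwrite described above can be reproduced without a GPU. Below is a minimal CPU-only sketch in C, not real OpenGL code: FakeGpu, cpu_write_transform, cpu_submit_draw, gpu_drain and render_frame are all hypothetical names made up for illustration. The model is deliberately pessimistic: it assumes the "GPU" executes nothing until it is drained, so every queued draw reads whatever the shared slot holds at that moment.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for the driver/GPU and the persistently mapped UBO. */
typedef struct {
    int    shared_transform;      /* the single region every batch writes to  */
    int    pending[MAX_PENDING];  /* queued draws: the batch id each expects  */
    size_t pending_count;
    int    mismatches;            /* draws that read another batch's data     */
} FakeGpu;

/* CPU write through the "persistent pointer": visible immediately. */
static void cpu_write_transform(FakeGpu *g, int batch_id) {
    g->shared_transform = batch_id;
}

/* Issuing a draw only queues it; nothing is executed yet. */
static void cpu_submit_draw(FakeGpu *g, int batch_id) {
    assert(g->pending_count < MAX_PENDING);
    g->pending[g->pending_count++] = batch_id;
}

/* Models a full sync: all queued draws execute now and read the slot
 * as it is at execution time, not as it was at submission time. */
static void gpu_drain(FakeGpu *g) {
    for (size_t i = 0; i < g->pending_count; ++i) {
        if (g->shared_transform != g->pending[i]) {
            g->mismatches++;
        }
    }
    g->pending_count = 0;
}

/* Renders one frame of batch_count batches and returns the number of
 * draws that saw the wrong transform data. */
int render_frame(int batch_count, int sync_after_each_batch) {
    FakeGpu g = {0};
    for (int batch = 1; batch <= batch_count; ++batch) {
        cpu_write_transform(&g, batch);
        cpu_submit_draw(&g, batch);
        if (sync_after_each_batch) {
            gpu_drain(&g);
        }
    }
    gpu_drain(&g); /* end of frame: swap buffers, glFinish(), ... */
    return g.mismatches;
}
```

In this model render_frame(3, 0) reports two wrong draws, because only the last batch finds its own data in the slot, while render_frame(3, 1) reports none. A real GPU will usually have consumed part of the queue by the time the next write lands, which is why the wrong transforms showed up only sporadically.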

The solution:

1. A full pipeline sync can be done after each batch rendering loop (not good for performance).
2. The render loop can be redesigned so that the transforms for all batches are uploaded into the UBO at the start of every frame. Then a less costly sync is needed only at the end of every frame, which allows better pipelining of the rendering commands within a single frame.
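
For the second option, each batch needs its own region inside the one UBO so that no batch overwrites another within a frame. Here is a rough sketch of the offset arithmetic in C, assuming each batch's region is later bound with glBindBufferRange and that the alignment value comes from querying GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT. The helper names align_up, batch_offset and frame_buffer_size are hypothetical, and the 256-byte alignment and 320-byte batch size used below are just example numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds value up to the next multiple of alignment. */
size_t align_up(size_t value, size_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

/* Byte offset of one batch's transform region inside the shared UBO.
 * Every region is padded so that its start satisfies the UBO offset
 * alignment required for binding a sub-range. */
size_t batch_offset(size_t batch_index, size_t batch_bytes, size_t alignment) {
    return batch_index * align_up(batch_bytes, alignment);
}

/* Total storage needed to hold all batches of one frame. */
size_t frame_buffer_size(size_t batch_count, size_t batch_bytes, size_t alignment) {
    return batch_count * align_up(batch_bytes, alignment);
}
```

With those example numbers, batch_offset(2, 320, 256) is 1024 and three batches need frame_buffer_size(3, 320, 256) = 1536 bytes. All regions are written through the persistent pointer at the start of the frame, and the CPU must still wait for the GPU before reusing them in the next frame. That wait could be glFinish() as above or, as one possible alternative not discussed in this thread, a fence created with glFenceSync and waited on with glClientWaitSync.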

Thanks to @pdaniell for pointing out the mistake.
