Persistent buffer synchronization doesn't work.

My hardware is:

Quadro P2000 (notebook)

Driver version 411.63 Using OpenGL 4.5 (core)

We are developing a graphics engine. Until now we mapped a buffer range to upload data to the GPU.
Now we have decided to try persistent mapping. It is important to note that in our use case we must sync the data upload
before rendering each frame. After initial tests we detected an issue with the sync.
We tried GL_MAP_COHERENT_BIT and also glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT). It doesn’t work…

To verify that this is not a problem with our usage of the API, we reverted to regular range mapping, and it worked. We also tried forcing the GPU to complete all commands by reading the framebuffer back to the CPU, and that fixed the issue too.
At this point our only theory is that the current version of the driver has a bug.
Here is what we do in the code:

glCreateBuffers(1, &buffer->handle);
glNamedBufferStorage(buffer->handle, buffer->size, data, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT | GL_MAP_COHERENT_BIT);
gpuPointer = glMapNamedBufferRange(buffer->handle, offset, buffer->size, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT | GL_MAP_COHERENT_BIT);
glBindBufferBase(buffer->target, index, buffer->handle);

// Then at some point later during execution we grab gpuPointer and update the data.

This specific buffer holds the MVP matrix array. In the output we see different objects sporadically drawn with the wrong transformation, which is another indication that the issue is with the sync.

As an experiment, can you try calling glFlushMappedBufferRange after writing to the persistently mapped buffer and before the GL command that triggers a read from that buffer? You’ll need to add GL_MAP_FLUSH_EXPLICIT_BIT to the glMapNamedBufferRange call. I wonder if the value is stuck in a cache. You shouldn’t need to do this for coherently mapped buffers, but it will help isolate the problem. Thanks.

Hi. Yeah, I tried that one with:


In this case I create the buffer with the flags:


And map the persistent pointer with:


Same thing.

Would you be able to provide a simple repro application showing this issue? We will use it to investigate the issue. Thanks.

THIS IS NOT A DRIVER BUG but a synchronization problem in my code. I will explain with the help of the following pseudo-code:

begin rendering frame
   for each draw batch {
      update transforms UBO for the current batch (persistent pointer)  <---- here is the problem
      call glFlushMappedBufferRange after the update (mapping uses GL_MAP_FLUSH_EXPLICIT_BIT)
      for each draw command in the current batch {
         issue draw call
      }
   }
end rendering frame

call to swap buffers, read the frame back to the CPU, or glFinish()...

What happens is this: I have a render loop that renders several groups (batches) of objects one after another (see the inner loop). I was uploading the array of transform data for each batch to the GPU through the persistent buffer, as a step right before that batch gets rendered. I FORGOT that draw calls are pipelined to the GPU asynchronously. So I mistakenly thought that using GL_MAP_COHERENT_BIT, or calling glFlushMappedBufferRange right after the UBO update, would guarantee that every following draw batch sees its freshly updated transform data. That is not the case unless one performs a sync after each batch of draw commands, to make sure all the render commands submitted up to that point have been processed. So what happens if we don’t do that? In my use case, given the scenario above, the following happens:

1. First batch UBO update.
2. First batch rendering begins.
3. Second batch UBO update.
----------- somewhere here the GPU finishes the first batch's rendering commands
4. Second batch rendering begins.

See? I start updating the second batch's UBO data while the first batch's rendering commands are still being processed, and by doing that I effectively overwrite UBO data that is currently being read by the vertex shader of draw calls still in the pipeline.
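To make the ordering problem concrete, here is a toy model in plain C++ (no OpenGL involved). It is only a sketch: QueuedDraw, executeQueue, renderSharedSlot and renderSlotPerBatch are hypothetical names, and the model assumes the worst case where the "GPU" runs every queued draw only after the CPU has finished all its writes. On real hardware the overlap is partial, which is why the wrong transforms show up sporadically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not OpenGL: a queued draw remembers WHERE its transform lives,
// not a copy of it, much like a pipelined draw call that reads the UBO
// only when the GPU actually executes it.
struct QueuedDraw {
    const float* transform;  // points into the shared "persistent" memory
};

// The "GPU" catches up later: each queued draw reads memory at this moment.
std::vector<float> executeQueue(const std::vector<QueuedDraw>& queue) {
    std::vector<float> seen;
    for (const QueuedDraw& draw : queue) seen.push_back(*draw.transform);
    return seen;
}

// Every batch writes its transform to the same location (the mistake above).
std::vector<float> renderSharedSlot(const std::vector<float>& batchTransforms) {
    float slot = 0.0f;
    std::vector<QueuedDraw> queue;
    for (float t : batchTransforms) {
        slot = t;                  // CPU write through the "mapped pointer"
        queue.push_back({&slot});  // the draw is queued, not executed
    }
    return executeQueue(queue);    // assumed worst case: GPU runs after all writes
}

// Each batch writes to its own slot, so pending draws are never overwritten.
std::vector<float> renderSlotPerBatch(const std::vector<float>& batchTransforms) {
    std::vector<float> slots(batchTransforms.size());
    std::vector<QueuedDraw> queue;
    for (std::size_t i = 0; i < batchTransforms.size(); ++i) {
        slots[i] = batchTransforms[i];
        queue.push_back({&slots[i]});
    }
    return executeQueue(queue);
}
```

With three batches, the shared-slot version makes every draw see the last batch's transform, while the slot-per-batch version lets each draw see its own. Coherent mapping or an explicit flush only controls when a write becomes visible; neither one delays the write until earlier draws have finished reading.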

The solution (two options):

  1. Do a full pipeline sync after each batch's rendering loop is done (not good for performance).
  2. Redesign the render loop so that the transforms for all batches are uploaded into the UBO at the start of every frame. A lighter sync is then needed only at the end of every frame, which allows better pipelining of the rendering commands within a single frame.
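For the second option, each batch needs its own region of the UBO. Below is a minimal sketch of how the per-batch offsets could be computed, assuming the offset alignment is queried from GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT. The names alignUp, BatchSlice, FrameLayout and layoutBatches are hypothetical helpers, not part of my engine or of OpenGL, and the GL calls appear only in comments because they need a live context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds value up to the next multiple of alignment (any non-zero alignment).
std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

struct BatchSlice {
    std::size_t offset;  // byte offset of this batch's transforms in the UBO
    std::size_t size;    // bytes actually used by this batch
};

struct FrameLayout {
    std::vector<BatchSlice> slices;
    std::size_t totalBytes;  // storage needed for one frame's worth of batches
};

// Gives every batch its own aligned slice so no batch overwrites another's data.
// offsetAlignment would come from:
//   glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &alignment);
FrameLayout layoutBatches(const std::vector<std::size_t>& batchBytes,
                          std::size_t offsetAlignment) {
    FrameLayout layout{{}, 0};
    std::size_t cursor = 0;
    for (std::size_t bytes : batchBytes) {
        layout.slices.push_back({cursor, bytes});
        cursor += alignUp(bytes, offsetAlignment);
    }
    layout.totalBytes = cursor;
    return layout;
}

// Per frame, under these assumptions:
//   1. Write all batches through the persistent pointer at their slice offsets.
//   2. For each batch:
//        glBindBufferRange(GL_UNIFORM_BUFFER, index, handle, slice.offset, slice.size);
//        ...issue the batch's draw calls...
//   3. After the last draw:
//        GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
//   4. Before writing the next frame's transforms into the same memory:
//        glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeoutNs);
//        glDeleteSync(fence);
```

For example, batches of 64, 300 and 256 bytes with a 256-byte alignment land at offsets 0, 256 and 768, for 1024 bytes in total. The fence in steps 3 and 4 is one way to do the end-of-frame sync without a full glFinish(); it is a suggestion, not something I have measured.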

Thanks to @pdaniell for pointing out the mistake.