Problem using asynchronous execution with a PBO

I am trying to optimize my raycaster and have run into a couple of difficulties - I hope somebody can give me some good advice.

It seems the driver always stalls when making CUDA calls. I always get framerates like 60, 30, 20, 15, 12, … but nothing in between; it basically snaps.

(It’s not VSync - I already switched that off.)
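The rates listed above are exactly 60/1, 60/2, 60/3, 60/4 and 60/5, which is the pattern you would expect if every frame ends up waiting for the next tick of a 60 Hz clock somewhere in the pipeline. A minimal sketch of that arithmetic, assuming such a 60 Hz tick exists (the helper name `quantized_fps` is made up for illustration and is not part of CUDA or OpenGL):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: effective frame rate when every frame has to wait for
// the next tick of a fixed-rate clock (assumed to be 60 Hz by default).
// frame_ms: time actually needed to produce one frame, in milliseconds.
double quantized_fps(double frame_ms, double tick_hz = 60.0) {
    assert(frame_ms > 0.0 && tick_hz > 0.0);
    const double tick_ms = 1000.0 / tick_hz;
    // A frame always occupies a whole number of ticks, and at least one.
    const double ticks = std::ceil(frame_ms / tick_ms);
    return tick_hz / ticks;
}
```

Under this assumption a 10 ms frame runs at 60 fps, a 20 ms frame at 30 fps and a 40 ms frame at 20 fps; no frame time produces a rate between those steps.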

I guess the problem is that I was not using the asynchronous versions of the calls together with events.

I therefore modified everything and found another problem, which I guess could be a driver bug or an undocumented limitation…

If I want to use a mapped PBO along with the async code, the GPU seems to be doing nothing.

Here is what happens without PBOs:





time spent executing by the GPU: 181.57

time spent by CPU in CUDA calls: 0.11

CPU executed 354 iterations while waiting for GPU to finish


Here is what happens with PBOs:


CUDA_SAFE_CALL(cudaGLMapBufferObject( (void**)&out_data, pbo_out));   


CUDA_SAFE_CALL(cudaGLUnmapBufferObject( pbo_out));




time spent executing by the GPU: 0.00

time spent by CPU in CUDA calls: 2.19

CPU executed 0 iterations while waiting for GPU to finish

Any help is appreciated…


I do know that OpenGL calls always cause an implicit synchronisation when switching context (like when using CUDA), so using async calls together with OpenGL interoperability is only of limited use.
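To illustrate why an implicit synchronisation makes the polling loop pointless, here is a sketch in plain C++ in which `std::async` stands in for the GPU. No CUDA or OpenGL is involved, and `PollResult` and `launch_and_poll` are hypothetical names. It only models the "CPU time in calls" and "iterations while waiting" figures, not the GPU time reported by events.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// One measurement: time spent inside the launching call, and how many loop
// iterations the CPU managed while waiting for the job to finish.
struct PollResult {
    double call_ms;
    unsigned long iterations;
};

// Stand-in for "launch work, then spin until it is done".
// `work` plays the role of the GPU job. If `blocking` is true, the launch
// itself waits for the job, as a call with implicit synchronisation would.
PollResult launch_and_poll(const std::function<void()>& work, bool blocking) {
    assert(static_cast<bool>(work));  // calling an empty std::function would throw
    using clock = std::chrono::steady_clock;

    const auto t0 = clock::now();
    std::future<void> done = std::async(std::launch::async, work);
    if (blocking) done.wait();        // the implicit synchronisation
    const auto t1 = clock::now();

    unsigned long iterations = 0;
    while (done.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
        ++iterations;                 // the CPU could do useful work here

    const std::chrono::duration<double, std::milli> call = t1 - t0;
    return {call.count(), iterations};
}
```

With `blocking` set to false the launch returns quickly and the loop spins many times. With `blocking` set to true all the waiting happens inside the call, so the loop runs zero times.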

Um… does that mean it is impossible?

Or is there any other way to change the driver’s synchronization behavior?

It seems that there is an internal waiting loop synchronized to 60 fps…

I just want to get optimal performance.

Well, I mean that you can do it; it just doesn’t help performance.

But the 60 fps number is strange. CUDA shouldn’t lock it to a certain number of frames per second. I’ve had much higher frame rates with OpenGL programs that use CUDA.

Hm… it’s not just my program. In the SDK examples there is a fluid demo that shows the FPS. In my case it is always 60.8, even if I reduce the number of particles to a fraction of the original amount.

The strange thing is this: once I start dragging the console window across the GL window, the framerate increases from 60.8 up to 97 fps … ???

The only thing I can guess is that the synchronization is event based, and each time an event (of whatever kind) occurs, the synchronization checks whether the GPU is done…

I just found out that in my case simply holding a key down already leads to a higher framerate … It’s really an event problem.

Is it possible to generate fake key events, e.g. in a timer loop?

What event system are you using?

In Windows you usually do these things in the Idle event. I seem to recall there being a return value that controls how long it will be until the next idle event.