Problem using asynchronous execution with PBO

spacerat · March 4, 2008, 11:33am

I am trying to optimize my raycaster and ran into a couple of difficulties - I hope somebody can give me some good advice.

It seems the driver always stalls when making CUDA calls. I always get framerates like 60, 30, 20, 15, 12, … but nothing in between; it basically snaps.

(It's not the VSync - I already switched that off)
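For reference, the values 60, 30, 20, 15, 12 are exactly 60/n for n = 1 to 5. That is the pattern you would see if every frame were rounded up to a whole number of 1/60 s intervals. A minimal C++ sketch of that arithmetic follows; the function name `quantized_fps` and the 60 Hz tick are assumptions made for illustration, and it says nothing about what the driver actually does internally.

```cpp
#include <cmath>

// Hypothetical helper (not part of CUDA or OpenGL): the frame rate observed
// if each frame has to occupy a whole number of fixed-length ticks.
// The 60 Hz default tick rate is an assumption for illustration.
double quantized_fps(double frame_ms, double tick_hz = 60.0) {
    const double tick_ms = 1000.0 / tick_hz;
    // Round the frame time up to the next whole tick; the small epsilon
    // keeps a frame that lasts exactly one tick from being counted as two.
    double ticks = std::ceil(frame_ms / tick_ms - 1e-9);
    if (ticks < 1.0) ticks = 1.0;  // a frame always costs at least one tick
    return tick_hz / ticks;
}
```

With this model, a frame that takes 20 ms of work is shown at 30 fps rather than 50 fps, and anything between 33.4 ms and 50 ms lands on 20 fps, which would match the snapping described above.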

I guess the problem is that I was not using the Asynchronous versions with events.

I therefore modified everything and found another problem which I guess could be a driver bug or undocumented problem…

If I want to use a mapped PBO along with the async code, the GPU seems to be doing nothing.

Here is what happens without PBOs:

...

asyncAPI_test()

...

Result:OK

time spent executing by the GPU: 181.57

time spent by CPU in CUDA calls: 0.11

CPU executed 354 iterations while waiting for GPU to finish

Test PASSED

Here is what happens with PBOs:

...

CUDA_SAFE_CALL(cudaGLMapBufferObject( (void**)&out_data, pbo_out));   

asyncAPI_test()

CUDA_SAFE_CALL(cudaGLUnmapBufferObject( pbo_out));

...

Result:Failed

Test FAILED

time spent executing by the GPU: 0.00

time spent by CPU in CUDA calls: 2.19

CPU executed 0 iterations while waiting for GPU to finish

any help is appreciated…

regards

wumpus · March 4, 2008, 11:54am

I do know that OpenGL calls always cause an implicit synchronisation when switching context (like when using CUDA), so using async calls together with OpenGL interoperability is only of limited use.

spacerat · March 4, 2008, 11:59am

Um… does that mean it is impossible?

Or is there any other possibility to change the driver’s synchronization behavior?

It seems that there is an internal waiting loop synchronized to 60 fps…

I just want to get optimal performance.

wumpus · March 4, 2008, 12:09pm

Well I mean that you can do it, it just doesn’t help performance.

But the 60 fps number is strange; CUDA shouldn’t lock it to a certain number of frames per second. I’ve had much higher frame rates with OpenGL programs that use CUDA.

spacerat · March 4, 2008, 12:43pm

Hm… it's not just my program. The SDK examples already include a fluid demo that shows the FPS. In my case it is always 60.8, even if I reduce the number of particles to a fraction of the original amount.

The strange thing now is: once I start dragging the console window across the GL window, the framerate increases from 60.8 up to 97 fps … ???

The only thing I can guess is that the synchronization is event-based, and each time an event (of whatever kind) occurs, the synchronization checks whether the GPU is done…

spacerat · March 4, 2008, 1:06pm

I just found out that in my case, simply holding a key down already leads to a higher framerate … It's really an event problem.

Is it possible to generate fake key events, e.g. in a timer loop?

MisterAnderson42 · March 4, 2008, 2:58pm

What event system are you using?

In Windows you usually do these things in the idle event. I seem to recall there being a return value that controls how long it will be until the next idle event.
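To illustrate the difference between redrawing only on input events and redrawing from an idle handler, here is a small toy model in plain C++. Everything in it (`ToyLoop`, `frames_rendered`, the callback names) is hypothetical and stands in for a real event system; it is a sketch of the idea, not of any particular windowing API.

```cpp
#include <functional>
#include <queue>

// Toy event loop (hypothetical, not a real windowing API). Each iteration
// dispatches one pending input event if there is one; otherwise it calls
// the idle callback, if one is registered.
struct ToyLoop {
    std::queue<int> events;          // pending input events, e.g. key codes
    std::function<void()> on_event;  // called once per dispatched input event
    std::function<void()> on_idle;   // called when no input event is pending

    void run(int iterations) {
        for (int i = 0; i < iterations; ++i) {
            if (!events.empty()) {
                events.pop();
                if (on_event) on_event();
            } else if (on_idle) {
                on_idle();
            }
            // With no event and no idle handler, a real loop would block here.
        }
    }
};

// Counts how many frames get drawn in `iterations` loop turns when
// `input_events` events are queued, with or without an idle handler.
int frames_rendered(int iterations, int input_events, bool redraw_on_idle) {
    int frames = 0;
    ToyLoop loop;
    for (int i = 0; i < input_events; ++i) loop.events.push(i);
    loop.on_event = [&frames] { ++frames; };  // redraw after each input event
    if (redraw_on_idle) loop.on_idle = [&frames] { ++frames; };
    loop.run(iterations);
    return frames;
}
```

In this model, `frames_rendered(100, 5, false)` draws 5 frames while `frames_rendered(100, 5, true)` draws 100, so the frame count only depends on input when no idle handler is installed. If the program happens to use GLUT, the corresponding calls would be `glutIdleFunc` to register the idle callback and `glutPostRedisplay` inside it to request a redraw.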
