I am trying to optimize my raycaster and have run into a couple of difficulties. I hope somebody can give me some good advice.
It seems the driver always stalls when making CUDA calls. I always get framerates like 60, 30, 20, 15, 12, … with nothing in between; it basically snaps from one value to the next.
(It's not VSync; I already switched that off.)
I guess the problem is that I was not using the asynchronous versions of the calls together with events.
I therefore modified everything to use them and found another problem, which I guess could be a driver bug or an undocumented limitation…
If I use a mapped PBO along with the async code, the GPU seems to do nothing.
Here is what happens without PBOs:
... asyncAPI_test() ...
time spent executing by the GPU: 181.57
time spent by CPU in CUDA calls: 0.11
CPU executed 354 iterations while waiting for GPU to finish
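For context, output like the above typically comes from an event-polling loop. The following is only a minimal sketch of that pattern, assuming asyncAPI_test() queues its work asynchronously and then polls a stop event; increment_kernel, run_async_test and the buffer size are hypothetical placeholders, not the actual raycaster code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: adds a constant to each element.
__global__ void increment_kernel(int *data, int value, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] += value;
}

// Returns 0 on success; reports GPU time (ms) and CPU polling iterations.
int run_async_test(float *gpu_ms, unsigned long *iterations)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);
    int *h_data = NULL, *d_data = NULL;

    // Page-locked host memory lets cudaMemcpyAsync overlap with CPU work.
    if (cudaMallocHost((void **)&h_data, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) h_data[i] = 0;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Queue everything on stream 0; these calls return before the GPU is done.
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<(n + 255) / 256, 256>>>(d_data, 1, n);
    cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);

    // Poll the stop event; the CPU could do useful work inside this loop.
    unsigned long counter = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady) ++counter;

    // Check for errors from the queued work before reading the timing.
    int ok = (cudaGetLastError() == cudaSuccess);
    ok = ok && (cudaEventElapsedTime(gpu_ms, start, stop) == cudaSuccess);
    ok = ok && (h_data[0] == 1) && (h_data[n - 1] == 1);
    *iterations = counter;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return ok ? 0 : 1;
}

int main()
{
    float gpu_ms = 0.0f;
    unsigned long iterations = 0;
    if (run_async_test(&gpu_ms, &iterations) != 0) {
        printf("async test failed\n");
        return 1;
    }
    printf("time spent executing by the GPU: %.2f\n", gpu_ms);
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", iterations);
    return 0;
}
```

In a loop like this, a nonzero iteration count means the calls returned while the GPU was still busy; a count of 0 together with a GPU time near 0 suggests the work had already completed, or never ran, by the time the stop event was first queried.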
Here is what happens with PBOs:
...
CUDA_SAFE_CALL(cudaGLMapBufferObject( (void**)&out_data, pbo_out));
asyncAPI_test()
CUDA_SAFE_CALL(cudaGLUnmapBufferObject( pbo_out));
...
time spent executing by the GPU: 0.00
time spent by CPU in CUDA calls: 2.19
CPU executed 0 iterations while waiting for GPU to finish
Any help is appreciated…