hi cuda users
i have a deferred shading setup with the following steps:
1. render (GLSL) to an FBO with multiple renderbuffers attached
2. transfer some of the renderbuffers to CUDA via PBO/TBO and cudaGraphicsMapResources
3. process pixels with CUDA
4. transfer back to a texture using a PBO, and also map the non-CUDA-processed PBO to a texture
5. render the result (GLSL) using TBOs and ordinary textures
my problem is that cudaGraphicsMapResources in step 2 takes 20 ms (MacBook Pro, NVIDIA 9600M) to map 4 PBOs (2 read / 2 write) as CUDA pointers. i expected this to take about 1 ms or less. is my expectation wrong?
related question: i read in the programming guide for 3.0 that renderbuffers can be mapped directly with CUDA (avoiding the PBO) using cudaGraphicsGLRegisterBuffer, but i cannot get it to work. does anybody have example code for that (RBO → CUDA texture)?
note: i use a CPU timer and call cudaThreadSynchronize before and after each start/stop of the timer. i render at 1024x768 resolution.
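to make the measurement concrete, the timing pattern is roughly the sketch below. this is not my actual code: `time_ms` is a hypothetical helper name, and the `sync` callable is a stand-in for cudaThreadSynchronize so the snippet builds without CUDA. in the real loop, `work` would be the cudaGraphicsMapResources call.

```cpp
#include <chrono>
#include <functional>

// hypothetical helper: measures `work` in milliseconds with a CPU clock.
// `sync` is a placeholder for cudaThreadSynchronize() (assumption: it blocks
// until all previously issued GPU work has finished).
double time_ms(const std::function<void()>& sync,
               const std::function<void()>& work) {
    sync();  // drain earlier GPU work so it is not billed to `work`
    const auto start = std::chrono::steady_clock::now();
    work();  // the call being measured
    sync();  // wait until `work` has really completed
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

the first sync matters: without it, rendering that is still in flight from step 1 could be counted as part of the map time.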
here are the relevant code sections, first initialisation:
btw: my code is inspired by this nice blog post: http://www.rauwendaal.net/blog/howtousecud…tyapiwithopengl
and the render loop:
i appreciate any help!
kind regards,
simon