I’m using CUDA v3.0 because of its better OpenGL interop support, but the interop is very slow. I’m writing a simple voxelizer using OpenGL; the result (3D linear memory) will then be used by CUDA kernels. The problem is the transfer between OpenGL and CUDA.
The main algorithm:
For every slice (z):
    rasterize the slice into a 2D texture
    copy the 2D texture into the CUDA 3D linear memory
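For reference, the loop above could be sketched roughly as follows, using a PBO as the shared resource. This is only a sketch under assumptions, not my exact code: `voxelizeToLinear` and `sliceOffsetBytes` are hypothetical names, GLEW is assumed as the extension loader, each voxel is assumed to be one 8-bit channel, and the enum names are those of the current runtime API. The point is that the resource is registered once outside the loop and only mapped/unmapped per slice.

```cuda
#include <GL/glew.h>            // assumption: GLEW provides the buffer-object entry points
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <cstddef>

// Byte offset of slice z inside a tightly packed volume (pure arithmetic).
inline size_t sliceOffsetBytes(int z, int width, int height, size_t bytesPerVoxel)
{
    return static_cast<size_t>(z) * width * height * bytesPerVoxel;
}

// Hypothetical helper: copies `depth` rasterized slices into d_volume.
// Assumes a current OpenGL context, a bound read framebuffer with one 8-bit
// channel per voxel, and `pbo` already allocated with width * height bytes.
cudaError_t voxelizeToLinear(GLuint pbo, unsigned char* d_volume,
                             int width, int height, int depth)
{
    cudaGraphicsResource_t res = 0;

    // Register once, outside the loop: registration is potentially high-overhead.
    cudaError_t err = cudaGraphicsGLRegisterBuffer(&res, pbo, cudaGraphicsRegisterFlagsNone);
    if (err != cudaSuccess) return err;

    const size_t sliceBytes = sliceOffsetBytes(1, width, height, 1);

    for (int z = 0; z < depth; ++z) {
        // ... rasterize slice z here (application-specific, not shown) ...

        // Pack the rendered slice into the PBO without row padding.
        glPixelStorei(GL_PACK_ALIGNMENT, 1);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
        glReadPixels(0, 0, width, height, GL_RED, GL_UNSIGNED_BYTE, 0);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

        // Map the PBO so CUDA can read it as ordinary device memory.
        err = cudaGraphicsMapResources(1, &res, 0);
        if (err != cudaSuccess) break;

        void*  d_slice     = 0;
        size_t mappedBytes = 0;
        err = cudaGraphicsResourceGetMappedPointer(&d_slice, &mappedBytes, res);
        if (err == cudaSuccess) {
            // Device-to-device copy into slice z of the linear volume.
            err = cudaMemcpy(d_volume + sliceOffsetBytes(z, width, height, 1),
                             d_slice, sliceBytes, cudaMemcpyDeviceToDevice);
        }

        // Always unmap before OpenGL touches the buffer again.
        cudaError_t unmapErr = cudaGraphicsUnmapResources(1, &res, 0);
        if (err == cudaSuccess) err = unmapErr;
        if (err != cudaSuccess) break;
    }

    cudaGraphicsUnregisterResource(res);
    return err;
}
```

With this layout, slice z lands at byte offset z * width * height in the volume, so the kernels can index it as ordinary linear memory.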
The cudaGraphicsMapResources function seems to be very slow (for textures, renderbuffers, and PBOs alike); it is almost as slow as copying the texture to pinned CPU memory!
BTW, I also tried calling cudaGraphicsMapResources on its own, without any other CUDA or OpenGL calls: no kernel, no OpenGL commands. It is still very slow.
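An isolated measurement of this kind could look roughly like the sketch below. It is a minimal sketch under assumptions, not the code I actually ran: `timeMapUnmapMs` and `meanMilliseconds` are hypothetical names, the resource is assumed to be registered already, an OpenGL context is assumed to be current, and wall-clock timing via `std::chrono` is my choice here.

```cuda
#include <cuda_runtime.h>
#include <chrono>

// Mean time per iteration in milliseconds (pure arithmetic).
inline double meanMilliseconds(double totalMs, int iterations)
{
    return iterations > 0 ? totalMs / iterations : 0.0;
}

// Hypothetical helper: times map + unmap alone, with no kernel and no
// OpenGL work in between. Assumes `res` was registered beforehand and
// that an OpenGL context is current. Returns -1.0 on a CUDA error.
double timeMapUnmapMs(cudaGraphicsResource_t res, int iterations)
{
    // Drain pending CUDA work so it is not attributed to the first map.
    cudaDeviceSynchronize();   // cudaThreadSynchronize() on CUDA 3.x

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        if (cudaGraphicsMapResources(1, &res, 0) != cudaSuccess) return -1.0;
        if (cudaGraphicsUnmapResources(1, &res, 0) != cudaSuccess) return -1.0;
    }
    cudaDeviceSynchronize();   // make sure the last unmap has completed
    auto t1 = std::chrono::steady_clock::now();

    double totalMs = std::chrono::duration<double, std::milli>(t1 - t0).count();
    return meanMilliseconds(totalMs, iterations);
}
```

Calling glFinish() before the loop would additionally keep pending OpenGL work out of the measurement, so that only the map/unmap cost itself is timed.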