OpenGL interop very slow!

Hi,

I’m using CUDA v3.0 because of its better OpenGL interop support, but it is very slow. I’m writing a simple voxelizer using OpenGL. The result (3D linear memory) could then be used by some CUDA kernels. The problem is the transfer between OpenGL and CUDA.

The main algorithm:

For every slice (z):

  • rasterize the slice into a 2D texture

  • copy the 2D texture into CUDA 3D linear memory
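The two steps above can be sketched on the host as follows. This is a minimal C stand-in under stated assumptions: one byte per voxel, slices stored back to back with no row padding, and a grid split into `depth` equally thick slabs along z. `Volume`, `slice_bounds` and `copy_slice` are hypothetical names. `slice_bounds` gives the world-space z range to clip to when rasterizing slice z, and `copy_slice` does on the CPU what a pitched 2D copy (for example `cudaMemcpy2DFromArray` from the array returned by `cudaGraphicsSubResourceGetMappedArray`) would do on the GPU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical volume: width x height x depth voxels, one byte each,
   stored slice after slice in one linear buffer (no row padding assumed). */
typedef struct {
    size_t width, height, depth;
    unsigned char *data;
} Volume;

/* World-space z range [z0, z1) covered by slice z, assuming the grid spans
   [zmin, zmax) and is split into `depth` equally thick slabs. */
static void slice_bounds(float zmin, float zmax, size_t depth, size_t z,
                         float *z0, float *z1)
{
    float dz = (zmax - zmin) / (float)depth;
    *z0 = zmin + dz * (float)z;
    *z1 = *z0 + dz;
}

/* Copy one rasterized width x height slice, whose rows are src_pitch bytes
   apart, into slice z of the volume, row by row. */
static void copy_slice(Volume *vol, size_t z,
                       const unsigned char *slice, size_t src_pitch)
{
    unsigned char *dst = vol->data + z * vol->width * vol->height;
    for (size_t y = 0; y < vol->height; ++y)
        memcpy(dst + y * vol->width, slice + y * src_pitch, vol->width);
}
```

In this layout slice z starts at byte offset z * width * height, which is the destination offset the per-slice GPU copy would need.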

The cudaGraphicsMapResources function seems to be very slow (for textures, renderbuffers and PBOs alike); the speed is almost equal to copying the texture to pinned CPU memory!

BTW, I tried calling cudaGraphicsMapResources in a loop with no other CUDA or OpenGL calls:

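[codebox]
// resource: a cudaGraphicsResource_t registered once beforehand (assumed),
// e.g. with cudaGraphicsGLRegisterImage or cudaGraphicsGLRegisterBuffer
while (1)
{
    cudaGraphicsMapResources(1, &resource, 0);    // map on stream 0
    cudaGraphicsUnmapResources(1, &resource, 0);  // unmap immediately
}
[/codebox]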

No kernel, no OpenGL commands. And it is still very slow.

Any idea?

Is your compute card also the display card?

Did you, by chance, enable a second monitor output on the display card? In many cases it makes interop slower.

Yes, there is only one GPU.

I think I found a workaround:

  • create a 3D texture (OpenGL)

  • attach it to an FBO

  • render into the 3D texture one z-slice at a time

  • map the 3D texture to CUDA (as a cudaArray)

  • use tex3D for lookups

Seems to be pretty fast.
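One detail of the tex3D step: with unnormalized coordinates, texel i is centered at i + 0.5 (at (i + 0.5) / n with normalized coordinates), so voxel (x, y, z) is read exactly as `tex3D(tex, x + 0.5f, y + 0.5f, z + 0.5f)`; sampling at integer coordinates with linear filtering blends neighboring voxels. The host-side C sketch below is a hedged illustration that emulates, along a single axis, the linear-filtering rule described in the texture-fetching appendix of the CUDA C Programming Guide. `tex`, `fetch_linear` and `clamp_index` are hypothetical names, clamp addressing is assumed, and the hardware's reduced-precision weights are ignored.

```c
/* Hypothetical helper: clamp an index into [0, n - 1] (clamp addressing). */
static int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i > n - 1) return n - 1;
    return i;
}

/* Host-side emulation of linear filtering along one axis with unnormalized
   coordinates: xb = x - 0.5, i = floor(xb), alpha = xb - i,
   result = (1 - alpha) * T[i] + alpha * T[i + 1].
   The GPU keeps alpha in low-precision fixed point; full float is used here. */
static float fetch_linear(const float *t, int n, float x)
{
    float xb = x - 0.5f;
    int i = (int)xb;             /* truncates toward zero ...          */
    if ((float)i > xb) i -= 1;   /* ... so step down to get floor(xb)  */
    float alpha = xb - (float)i;
    return (1.0f - alpha) * t[clamp_index(i, n)]
         + alpha * t[clamp_index(i + 1, n)];
}
```

tex3D applies the same rule independently along x, y and z. With point filtering the +0.5 offset still selects texel i, so it is a safe default for either filter mode.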

How fast is “pretty”?

I’m doing the same as you, but I’m rendering to a 2D texture.

I noticed that my FPS drops significantly when I map the texture for reading from CUDA.

This isn’t really addressing your problem, but another possible solution would be to just do the voxelization in CUDA too :)