OpenGL & CUDA interop with surfaces slow...

thuczek · July 5, 2018, 11:47am

Hey guys, I have a question regarding the performance of my code.
Initially I created something like this:

1. create GL texture

in a loop:
{
2. cudaMemcpy cached texture data from CPU to GPU
3. process the buffer in the kernel on the GPU
4. cudaMemcpy the pixels back to the host
5. glTexSubImage the buffer to update the texture
}

Then I tried to use surfaces to avoid copying the buffers back and forth:

1. create GL texture
2. use CUDA GL interop to map the texture data and create surfaces

in a loop
{
3. map the buffer to cuda
4. run the kernel
5. unmap
}

I was surprised to see that the second solution is 7 times slower…
Can it really be slower? Or am I doing something wrong?
The kernel is the same, except that instead of writing to the buffers directly I use surf2Dread & surf2Dwrite.
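
For readers following along, the second approach might look roughly like the sketch below. This is not the original code: the names processSurface, registerTexture and processFrame are hypothetical, the texture is assumed to be a GL_TEXTURE_2D with an RGBA8 format, the kernel body (a colour inversion) is a placeholder, and an OpenGL context is assumed to be current on the calling thread. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <GL/gl.h>             // assumption: this header provides GLuint / GL_TEXTURE_2D on your platform
#include <cuda_gl_interop.h>

// Hypothetical kernel: reads each RGBA8 pixel through the surface, inverts it, writes it back.
__global__ void processSurface(cudaSurfaceObject_t surf, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    uchar4 px;
    // Note: the x coordinate of surface accesses is given in bytes, not in elements.
    surf2Dread(&px, surf, x * (int)sizeof(uchar4), y);
    px.x = 255 - px.x;
    px.y = 255 - px.y;
    px.z = 255 - px.z;
    surf2Dwrite(px, surf, x * (int)sizeof(uchar4), y);
}

// Done once (steps 1-2): register the existing GL texture with CUDA for surface load/store.
cudaGraphicsResource_t registerTexture(GLuint tex)
{
    cudaGraphicsResource_t res = nullptr;
    cudaGraphicsGLRegisterImage(&res, tex, GL_TEXTURE_2D,
                                cudaGraphicsRegisterFlagsSurfaceLoadStore);
    return res;
}

// Done every frame (steps 3-5): map, run the kernel, unmap.
void processFrame(cudaGraphicsResource_t res, int width, int height)
{
    cudaGraphicsMapResources(1, &res, 0);

    // The mapped texture is exposed to CUDA as a cudaArray.
    cudaArray_t array = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&array, res, 0, 0);

    // Wrap the array in a surface object so the kernel can read and write it.
    cudaResourceDesc desc = {};
    desc.resType = cudaResourceTypeArray;
    desc.res.array.array = array;
    cudaSurfaceObject_t surf = 0;
    cudaCreateSurfaceObject(&surf, &desc);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    processSurface<<<grid, block>>>(surf, width, height);

    // Conservative choice for this sketch: wait for the kernel before
    // destroying the surface object. This may cost performance.
    cudaStreamSynchronize(0);
    cudaDestroySurfaceObject(surf);

    cudaGraphicsUnmapResources(1, &res, 0);
}
```

In this sketch only the map, kernel launch and unmap are strictly per-frame; whether the surface object can be created once and reused across frames depends on the mapped array staying the same, which is an assumption worth verifying rather than relying on.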

cbuchner1 · July 5, 2018, 1:24pm

Is it really necessary to have the map/unmap steps within the loop? How many loop iterations are being run?

Have you timed the individual CUDA API calls with the visual profiler or similar tools?
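
If the profiler is not at hand, CUDA events are one way to time GPU work from inside the program. The sketch below is only an illustration under assumptions: dummyKernel and timeKernelMs are hypothetical names, and the kernel is a stand-in rather than the one discussed in this thread.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel used only to have something to time.
__global__ void dummyKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Returns the elapsed GPU time in milliseconds for one kernel launch,
// or -1.0f if the device allocation fails.
float timeKernelMs(int n)
{
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Events are recorded into the same stream as the kernel, so they bracket it.
    cudaEventRecord(start, 0);
    dummyKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);   // wait until the stop event has been reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms;
}
```

Calling timeKernelMs(1 << 20) a few times and discarding the first result (which typically includes start-up overhead) gives a rough per-launch figure. Events measure work queued in a stream, so for the map and unmap calls, which may also block on the host side, a wall-clock timer around each call is likely a useful complement.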

thuczek · July 6, 2018, 6:39am

When I remove map/unmap, the textures are not updated.
The kernel is called every frame of an endlessly looping animation (until the app is closed).
I will use the profiler to try to understand more, thanks for the suggestion.
