Concurrent read and write of a device array?

Hello, I have a kernel that will run for about a minute and continuously write updates to an array of integers on the device. I'd like to visualize the kernel's progress, that is, to copy this array (actually a texture) to an OpenGL texture, say every 2 seconds. I use cudaGraphicsMapResources and cudaMemcpyToArray to do the data transfer. Is there a way to achieve this? I'm new to CUDA and can't find a workable solution… I've tried using a separate stream for the kernel, but the copy always happens after the kernel launch finishes. Thanks!
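
For reference, my copy step looks roughly like the sketch below (texResource and d_progress are just placeholders for my actual handles):

```cpp
// Rough sketch of my per-update copy. texResource was registered once with
// cudaGraphicsGLRegisterImage; d_progress is the device array the kernel writes.
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

void copyToGLTexture(cudaGraphicsResource_t texResource,
                     const int *d_progress, size_t bytes)
{
    cudaGraphicsMapResources(1, &texResource, 0);

    cudaArray_t texArray;
    cudaGraphicsSubResourceGetMappedArray(&texArray, texResource, 0, 0);

    // Device-to-device copy into the CUDA array backing the OpenGL texture.
    cudaMemcpyToArray(texArray, 0, 0, d_progress, bytes, cudaMemcpyDeviceToDevice);

    cudaGraphicsUnmapResources(1, &texResource, 0);
}
```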

Are the kernel and the copy both on manually created streams? If either one is on the default stream, that stream synchronizes with all other (blocking) streams.
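
Something like this is what I mean - just a minimal sketch, with updateKernel, d_data and h_snapshot as placeholder names:

```cpp
// Minimal sketch: the long-running kernel and the snapshot copy each get an
// explicitly created stream, so neither is serialized by the default stream.
#include <cuda_runtime.h>

__global__ void updateKernel(int *data, int n)
{
    // stand-in for the real minute-long update
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        data[i] += 1;
}

void launchAndSnapshot(int *d_data, int *h_snapshot, int n)
{
    cudaStream_t computeStream, copyStream;
    cudaStreamCreate(&computeStream);
    cudaStreamCreate(&copyStream);

    // Kernel on its own stream...
    updateKernel<<<256, 256, 0, computeStream>>>(d_data, n);

    // ...and the copy on another. h_snapshot must be pinned memory
    // (cudaMallocHost / cudaHostAlloc) for the copy to be truly asynchronous.
    cudaMemcpyAsync(h_snapshot, d_data, n * sizeof(int),
                    cudaMemcpyDeviceToHost, copyStream);

    cudaStreamSynchronize(copyStream);   // wait only for the snapshot
    cudaStreamDestroy(copyStream);
    cudaStreamDestroy(computeStream);    // kernel may still be running; cleanup is deferred
}
```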

Otherwise, maybe you could break your update into chunks. Instead of UpdateEverything, CopyEverything, you could loop over UpdateTenPercent, CopyTenPercent to get ten preview steps per update.
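
Roughly like this (updateChunk stands in for your kernel restricted to one slice; the GL map/copy/unmap from your post goes where the comment is):

```cpp
// Sketch of the chunked approach: update one slice per iteration, then copy /
// present the partially updated array before starting the next slice.
#include <cuda_runtime.h>

__global__ void updateChunk(int *data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        data[offset + i] += 1;          // placeholder for the real update
}

void updateInChunks(int *d_data, int n, int numChunks)
{
    int chunk = (n + numChunks - 1) / numChunks;

    for (int c = 0; c < numChunks; ++c) {
        int offset = c * chunk;
        int count  = (offset + chunk > n) ? (n - offset) : chunk;

        int blocks = (count + 255) / 256;
        updateChunk<<<blocks, 256>>>(d_data, offset, count);
        cudaDeviceSynchronize();        // this slice is done

        // here: map the GL resource, cudaMemcpyToArray, unmap, present a preview
    }
}
```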

I agree that buffer/image map/unmap does not seem to proceed when a kernel is running, regardless of stream usage. I could speculate as to why this might be - basically that the map/unmap process affects the GPU/CUDA context memory map, and I think modifications to the memory map are typically not done until there are no kernels executing.

Since your desired update rate is fairly low - one frame every 2 seconds - I believe it should be possible to let your kernel run, asynchronously copy the data to a pinned host buffer, and then use glDrawPixels (or your favorite texture-upload method) from that host memory to update the display.
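
A rough sketch of the idea (placeholder names; assumes a GL context is current on this thread and that each int in the array is a packed RGBA8 pixel of a width x height image):

```cpp
// Sketch: leave the long-running kernel alone and, every ~2 seconds, snapshot
// the device array into a pinned host buffer and draw it from host memory.
#include <cuda_runtime.h>
#include <GL/gl.h>
#include <chrono>
#include <thread>

void previewLoop(const int *d_data, int width, int height, int seconds)
{
    size_t bytes = (size_t)width * height * sizeof(int);

    int *h_pinned = nullptr;
    cudaMallocHost((void **)&h_pinned, bytes);        // pinned host buffer

    cudaStream_t copyStream;
    cudaStreamCreate(&copyStream);

    for (int s = 0; s < seconds; s += 2) {
        // Asynchronous snapshot of whatever the kernel has written so far.
        cudaMemcpyAsync(h_pinned, d_data, bytes,
                        cudaMemcpyDeviceToHost, copyStream);
        cudaStreamSynchronize(copyStream);

        // Present from host memory; glTexSubImage2D into a texture works too.
        glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, h_pinned);
        // swap buffers with your windowing toolkit here

        std::this_thread::sleep_for(std::chrono::seconds(2));
    }

    cudaStreamDestroy(copyStream);
    cudaFreeHost(h_pinned);
}
```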

Thanks for your replies! I found a solution using two host threads: one launches the kernel (ten percent at a time) in a loop, and the other runs at 60 fps, checking each frame whether the kernel has just finished (using a mutex and condition variable to synchronize). This way I can keep the UI responsive and update the computed texture whenever it becomes available.
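
In case it's useful to someone else, the structure is roughly this (heavily simplified - computeChunk stands in for my real kernel, and the GL upload is just a comment):

```cpp
// Simplified structure of the two-thread solution: a worker thread runs the
// kernel in chunks, a UI thread polls at display rate and stays responsive.
#include <cuda_runtime.h>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool chunkReady = false;
std::atomic<bool> done{false};

__global__ void computeChunk(int *data, int n, int pass)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += pass;                 // placeholder for the real update
}

void workerThread(int *d_data, int n)    // launches the kernel, ten percent at a time
{
    for (int pass = 0; pass < 10; ++pass) {
        computeChunk<<<(n + 255) / 256, 256>>>(d_data, n, pass);
        cudaDeviceSynchronize();         // this chunk is finished

        {
            std::lock_guard<std::mutex> lock(mtx);
            chunkReady = true;           // tell the UI a fresh result exists
        }
        cv.notify_one();
    }
    done = true;
    cv.notify_one();
}

void uiThread()                          // runs at ~60 fps
{
    while (!done) {
        bool update;
        {
            std::unique_lock<std::mutex> lock(mtx);
            // Wait at most one frame (~16 ms) so rendering never stalls.
            update = cv.wait_for(lock, std::chrono::milliseconds(16),
                                 [] { return chunkReady; });
            chunkReady = false;
        }
        if (update) {
            // map the GL resource, cudaMemcpyToArray the updated data, unmap
        }
        // render the current frame here
    }
}

int main()
{
    const int n = 1 << 20;
    int *d_data = nullptr;
    cudaMalloc((void **)&d_data, n * sizeof(int));

    std::thread worker(workerThread, d_data, n);
    std::thread ui(uiThread);
    worker.join();
    ui.join();

    cudaFree(d_data);
    return 0;
}
```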