cudaGraphicsUnmapResources and concurrent copy and execute

I’m trying to build a program that combines CUDA and OpenGL and uses concurrent copy and execute on kernels.

It mostly works fine, except for the call to cudaGraphicsUnmapResources on a mapped texture: even though it accepts a stream argument, it will not allow a host-to-device copy to run in parallel with it.

Anyone know if there is a way around this?

I’m using CUDA 4 with 280 drivers at the moment.


You’re not using the default stream anywhere, are you?

Not before the pipeline starts, not anywhere in the pipeline, and the profiler confirms this. The only call that doesn’t take an explicit stream (and I don’t see any way to pass it one) is cudaGraphicsSubResourceGetMappedArray.
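For reference, a minimal sketch of the pattern under discussion (resource, kernel, and buffer names are illustrative, and this assumes a current GL context and a texture already registered with cudaGraphicsGLRegisterImage):

```cuda
// Assumed to exist from earlier setup:
//   cudaGraphicsResource_t texResource;  // registered GL texture
//   float *d_buf, *h_pinned;             // device buffer, pinned host buffer
//   size_t nbytes;

cudaStream_t interopStream, copyStream;
cudaStreamCreate(&interopStream);
cudaStreamCreate(&copyStream);

// Map the GL texture for CUDA access on a non-default stream.
cudaGraphicsMapResources(1, &texResource, interopStream);

// Note: cudaGraphicsSubResourceGetMappedArray takes no stream argument.
cudaArray_t texArray;
cudaGraphicsSubResourceGetMappedArray(&texArray, texResource, 0, 0);

// Kernel and host-to-device copy issued on separate streams,
// intending for them to overlap (concurrent copy and execute).
myKernel<<<grid, block, 0, interopStream>>>(d_buf);
cudaMemcpyAsync(d_buf, h_pinned, nbytes,
                cudaMemcpyHostToDevice, copyStream);

// The unmap accepts a stream, but the reported behavior (CUDA 4,
// 280 drivers) is that it still blocks the copy on copyStream
// from overlapping.
cudaGraphicsUnmapResources(1, &texResource, interopStream);
```

This is a sketch of the sequence described in the thread, not a complete program; error checking and the GL/registration setup are omitted.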