cudaCreateTextureObject synchronizing

I am using cudaCreateTextureObject in one host thread of a heavily multi-threaded CUDA application. The application uses explicit CUDA streams for every operation. The call has always behaved as expected, returning immediately. I am now debugging an issue where this call (as seen in Nsight Systems) blocks its thread for many milliseconds. The completion of the call appears to coincide with the completion of kernels or memcpy operations issued from other host threads (often one particular long-running kernel). This resembles the familiar behavior of device-synchronizing operations, such as a memcpy or kernel launch in the default stream.

What are the conditions under which cudaCreateTextureObject might block/synchronize? This behavior is very consistent: it does not intermittently block, it ALWAYS blocks when it occurs in connection with the same kernel.

I am currently still running CUDA 11.4 against driver 495.29.05 (which reports CUDA version 11.5) on an A100.


I don’t have a specific answer for you, and I don’t believe the exact conditions are documented anywhere. The general case is provided for in the documentation, though: using streams does not necessarily prevent an API call from synchronizing. You can find many similar questions on these forums.
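
If it helps to narrow this down, a small standalone reproducer may be easier to reason about than the full application. The following is only a sketch under my own assumptions, not a description of what the driver actually does: the `spin` kernel, the buffer size, and the cycle count are all made up for illustration, and a linear-memory texture is used only because it is the simplest to set up. It launches a long-running kernel in a non-blocking stream and then times cudaCreateTextureObject on the host.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the long-running kernel: busy-waits for
// roughly `cycles` device clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main() {
    // Explicit stream that does not synchronize with the default stream.
    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

    // Linear device buffer backing the texture (size chosen arbitrarily).
    const size_t n = 1024;
    float* buf = nullptr;
    cudaMalloc(&buf, n * sizeof(float));

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = buf;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc tex = {};
    tex.readMode = cudaReadModeElementType;

    // Start the long-running kernel; the launch itself is asynchronous.
    spin<<<1, 1, 0, stream>>>(500000000LL);

    // Time the texture object creation on the host while the kernel runs.
    auto t0 = std::chrono::steady_clock::now();
    cudaTextureObject_t obj = 0;
    cudaError_t err = cudaCreateTextureObject(&obj, &res, &tex, nullptr);
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("cudaCreateTextureObject: %s, host time %.3f ms\n",
           cudaGetErrorString(err), ms);

    // Clean up after the kernel has finished.
    cudaStreamSynchronize(stream);
    cudaDestroyTextureObject(obj);
    cudaFree(buf);
    cudaStreamDestroy(stream);
    return 0;
}
```

If the measured host time tracks the kernel duration when you change the cycle count, the call is waiting on the device in that configuration. If it does not reproduce this way, swapping in the resource type and kernel from the real application would be the next thing to try.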