Kernel executed in non-default CUDA stream waits for other streams to complete cudaMemcpyAsync

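You may be hitting a lazy loading situation. With CUDA lazy loading, a kernel's code is loaded on its first launch, and that load can require a context synchronization, so the first launch may appear to wait for outstanding work in other streams, such as a pending cudaMemcpyAsync.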
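
One way to check whether lazy loading is the cause is to load the kernel before queuing any asynchronous work, for example with a throwaway warm-up launch. The sketch below is only an illustration under assumptions: the kernel `scale`, the buffer names, and the stream names are hypothetical and not taken from the original question. Setting the environment variable `CUDA_MODULE_LOADING=EAGER` before running the program is another way to test the same idea without changing code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical kernel standing in for the one from the question.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_src = nullptr, *d_copy = nullptr, *d_work = nullptr;
    CHECK(cudaMallocHost(&h_src, bytes));  // pinned host memory for the async copy
    CHECK(cudaMalloc(&d_copy, bytes));
    CHECK(cudaMalloc(&d_work, bytes));
    for (int i = 0; i < n; ++i) h_src[i] = 1.0f;
    CHECK(cudaMemset(d_work, 0, bytes));

    cudaStream_t copyStream, kernelStream;
    CHECK(cudaStreamCreateWithFlags(&copyStream, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&kernelStream, cudaStreamNonBlocking));

    // Warm-up launch with n = 0: no element is touched, but the kernel
    // is loaded now instead of at the first "real" launch.
    scale<<<1, 1, 0, kernelStream>>>(d_work, 0, 1.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(kernelStream));

    // The copy and the kernel are now queued in separate streams.
    CHECK(cudaMemcpyAsync(d_copy, h_src, bytes, cudaMemcpyHostToDevice, copyStream));
    scale<<<(n + 255) / 256, 256, 0, kernelStream>>>(d_work, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaStreamDestroy(copyStream));
    CHECK(cudaStreamDestroy(kernelStream));
    CHECK(cudaFree(d_copy));
    CHECK(cudaFree(d_work));
    CHECK(cudaFreeHost(h_src));
    std::printf("done\n");
    return 0;
}
```

If the kernel stops waiting on the copy once the warm-up launch is added (compare timelines in a profiler such as Nsight Systems), lazy loading was the likely cause. Whether the two operations actually overlap still depends on the hardware and on available resources.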
