CUDA code working on <=Volta, not working on RTX Titan

This is an issue I don’t think I’ve ever had in all my years of CUDA computing.

I have CUDA code that I call through a custom Python framework, and it runs fine on every GPU I’ve tried except an RTX Titan. The code runs perfectly on a Titan V and on earlier GPUs like the GTX 1080 Ti, Quadro M1200, etc. On the RTX Titan, however, I see the GPU memory get loaded (so it at least transfers data to the GPU), and then it crashes within seconds. Here’s a summarized traceback of the error output:

From Python’s terminal:

An error ocurred while starting the kernel
GPUassert: 4 unspecified launch failure cu_sync.cu 11
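
For context, the GPUassert line comes from the standard CUDA error-checking macro I wrap calls with; a minimal sketch of the pattern in cu_sync.cu (simplified, not the exact source):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

inline void gpuAssert(cudaError_t code, const char *file, int line)
{
    // Prints the numeric code, its string, and where the check fired,
    // e.g. "GPUassert: 4 unspecified launch failure cu_sync.cu 11".
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %d %s %s %d\n",
                (int)code, cudaGetErrorString(code), file, line);
        exit((int)code);
    }
}

// The failing line is a synchronize-and-check after a kernel launch,
// something like:
// gpuErrchk(cudaDeviceSynchronize());

Since launch errors are asynchronous, the error surfaces at that post-launch synchronize rather than necessarily at the call that caused it.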

When I run CUDA-MEMCHECK on the app, I get the following output…

GPUassert: 13 CUBLAS_STATUS_EXECUTION_FAILED cublas_gemm.cu 233
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaFuncSetAttribute.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuD3D9UnmapVertexBuffer + 0x2e2c85) [0x2f105b]
=========     Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x20b41) [0x460651]
=========     Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x56d5) [0x4451e5]
=========     Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasCreate_v2 + 0x40e) [0x10253e]
=========     Host Frame:C:\Users\User\Desktop\..\pycu_interface\cublas_helpers\lib\cublas.dll (cublas_init + 0x27) [0x161f7]

And so on… it prints about 30 more of these. The last one looks like:

========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaGetLastError.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuD3D9UnmapVertexBuffer + 0x2e2c85) [0x2f105b]
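
For what it’s worth, the CUBLAS_STATUS_EXECUTION_FAILED line at the top of that output comes from my own cuBLAS status-check macro; a simplified sketch of the pattern (helper names hypothetical, not my exact source):

#include <cstdio>
#include <cublas_v2.h>

// Hypothetical helper: map a cuBLAS status to a printable name,
// since CUDA 10 has no built-in status-to-string function.
static const char *cublas_status_str(cublasStatus_t s)
{
    switch (s) {
    case CUBLAS_STATUS_SUCCESS:          return "CUBLAS_STATUS_SUCCESS";
    case CUBLAS_STATUS_NOT_INITIALIZED:  return "CUBLAS_STATUS_NOT_INITIALIZED";
    case CUBLAS_STATUS_EXECUTION_FAILED: return "CUBLAS_STATUS_EXECUTION_FAILED";
    default:                             return "CUBLAS_STATUS_<other>";
    }
}

#define cublasErrchk(ans) { cublasAssert((ans), __FILE__, __LINE__); }

inline void cublasAssert(cublasStatus_t code, const char *file, int line)
{
    // Produces lines like
    // "GPUassert: 13 CUBLAS_STATUS_EXECUTION_FAILED cublas_gemm.cu 233".
    if (code != CUBLAS_STATUS_SUCCESS)
        fprintf(stderr, "GPUassert: %d %s %s %d\n",
                (int)code, cublas_status_str(code), file, line);
}

// cublas_gemm.cu line 233 wraps the batched GEMM call, roughly:
// cublasErrchk(cublasGemmStridedBatchedEx(handle, /* ... */));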

I don’t have any other RTX GPUs to check whether this issue is isolated to that generation of cards, but the Volta, Pascal, and Maxwell generations all work without error.

I’ve only been able to test the RTX Titan hardware itself in some gaming benchmarks, where it worked fine and performed as expected (slightly better than the Titan V). I will test an OpenCL app this weekend.

I’ve tried reinstalling the latest driver (417.35) and the latest CUDA Toolkit (10.0). I’ve also tried recompiling my libraries for every architecture from sm_30 up to sm_75.
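
Concretely, "recompiling from sm_30 to sm_75" meant building fat binaries with one gencode entry per architecture, along these lines (file names illustrative):

nvcc -c kernels.cu -o kernels.obj ^
  -gencode arch=compute_30,code=sm_30 ^
  -gencode arch=compute_52,code=sm_52 ^
  -gencode arch=compute_61,code=sm_61 ^
  -gencode arch=compute_70,code=sm_70 ^
  -gencode arch=compute_75,code=sm_75 ^
  -gencode arch=compute_75,code=compute_75

The final compute_75 entry embeds PTX so the driver can JIT-compile for anything newer, so a missing-SASS mismatch shouldn’t be the problem.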

I am running Win 10 Pro right now, and I’ll get around to testing Linux early next year to see whether this is another random CUDA+Windows issue. It’s unlike the ones I’ve encountered before, though: those errors were consistent across different GPUs, whereas this one only appears on the RTX Titan.

Any help much appreciated, thanks!

Often the best thing to do in these cases is to develop a small, self-contained reproducer code, and then file a bug at developer.nvidia.com.
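
Something as small as this is usually enough: a bare-bones sketch that creates a cuBLAS handle and runs one tiny GEMM with every status checked. Adapt it toward whichever call fails first for you.

#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 4;
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, n * n * sizeof(float));
    cudaMalloc(&d_b, n * n * sizeof(float));
    cudaMalloc(&d_c, n * n * sizeof(float));
    cudaMemset(d_a, 0, n * n * sizeof(float));
    cudaMemset(d_b, 0, n * n * sizeof(float));

    // Handle creation: the first place your backtrace touches cuBLAS.
    cublasHandle_t handle;
    cublasStatus_t st = cublasCreate(&handle);
    printf("cublasCreate: %d\n", (int)st);

    // One small SGEMM; 13 here would match CUBLAS_STATUS_EXECUTION_FAILED.
    const float alpha = 1.0f, beta = 0.0f;
    st = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                     &alpha, d_a, n, d_b, n, &beta, d_c, n);
    printf("cublasSgemm: %d\n", (int)st);

    // Force any asynchronous launch error to surface.
    cudaError_t err = cudaDeviceSynchronize();
    printf("cudaDeviceSynchronize: %d (%s)\n", (int)err, cudaGetErrorString(err));

    cublasDestroy(handle);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}

If that passes on the RTX Titan, grow it toward the cublasGemmStridedBatchedEx call from your backtrace until it fails, and attach the result to the bug report.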

Hi there, is there any follow-up on what’s causing this issue? I am also having trouble running CUDA (Toolkit v10) code on a Titan RTX, even though it runs without any hiccups on an RTX 2080.