This is a strange issue. I have some simple cublasSgemv calls that run fine over hundreds of thousands of iterations on a GTX 1080 Ti, RTX 2080 Ti, etc. However, they crash after only a few calls on an RTX 3090 (compiled with CUDA 11.2). Same data, same code, but Ampere crashes.
I have double-, triple-, and quadruple-checked the sizes of my input arrays and the values of all the other arguments…and the fact that the code runs fine over huge datasets for so many iterations on non-Ampere architectures leads me to believe this is an Ampere-related cuBLAS bug.
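For reference, the failing call is essentially shaped like this (a minimal sketch: the dimensions, pointer names, and per-iteration sync are placeholders standing in for what the real application does, but every return code is checked):

```cpp
// Build: nvcc ... -lcublas
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

#define CUBLAS_CHECK(call)                                         \
    do {                                                           \
        cublasStatus_t s_ = (call);                                \
        if (s_ != CUBLAS_STATUS_SUCCESS)                           \
            std::fprintf(stderr, "cuBLAS error %d at %s:%d\n",     \
                         (int)s_, __FILE__, __LINE__);             \
    } while (0)

// Hypothetical dimensions; the real application has its own sizes.
// A is ROWS x COLS in column-major order, x has COLS entries, y has ROWS.
void run_iterations(cublasHandle_t handle, const float* d_A,
                    const float* d_x, float* d_y, int iterations)
{
    const int ROWS = 1024, COLS = 512;
    const float alpha = 1.0f, beta = 0.0f;
    for (int i = 0; i < iterations; ++i) {
        // y = alpha * A * x + beta * y
        CUBLAS_CHECK(cublasSgemv(handle, CUBLAS_OP_N, ROWS, COLS,
                                 &alpha, d_A, ROWS, d_x, 1,
                                 &beta, d_y, 1));
        // Sync each iteration only to surface asynchronous errors at the
        // offending call instead of at some later API entry point.
        cudaError_t e = cudaDeviceSynchronize();
        if (e != cudaSuccess)
            std::fprintf(stderr, "iteration %d: %s\n", i,
                         cudaGetErrorString(e));
    }
}
```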
Nsight Compute shows the following API Stream results during the crash, with “cudaErrorInvalidResourceHandle” reported as the error generated during the call to cublasSgemv:

[screenshot: Nsight Compute API Stream at the point of the crash]
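Since cudaErrorInvalidResourceHandle generally points at a stale handle (e.g., a stream or event that was destroyed, or a handle used with the wrong device/context), a sanity check like the following, run right before the failing call, should at least rule out a dead stream (a sketch; `handle` stands in for the application’s cuBLAS handle):

```cpp
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Sketch: verify the stream bound to the cuBLAS handle is still alive.
// cudaStreamQuery on a destroyed stream itself returns
// cudaErrorInvalidResourceHandle, which would implicate the stream (or the
// context it belonged to) rather than cuBLAS internals.
void check_handle_stream(cublasHandle_t handle)
{
    cudaStream_t stream = nullptr;
    cublasStatus_t s = cublasGetStream(handle, &stream);
    if (s != CUBLAS_STATUS_SUCCESS) {
        std::fprintf(stderr, "cublasGetStream failed: %d\n", (int)s);
        return;
    }
    // stream == nullptr means the default stream, which is always valid;
    // cudaStreamQuery(nullptr) simply queries the default stream.
    cudaError_t e = cudaStreamQuery(stream);
    if (e != cudaSuccess && e != cudaErrorNotReady)
        std::fprintf(stderr, "stream check: %s\n", cudaGetErrorString(e));
}
```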
Any ideas what’s going on? What’s even stranger is that if I isolate the exact data sent to cublasSgemv (the matrix/vector/result arrays, filled with the same data that causes the crash in the larger application) and compile it into its own application, no crash occurs. So perhaps the cuBLAS library is doing some internal, opaque allocations that cause an issue after a while on Ampere? From the API stack in the screenshot I can see that Ampere-specific functions are being called, so an Ampere-related bug doesn’t seem like a stretch.
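For completeness, the standalone isolation test was essentially this shape (the dimensions and fill values below are placeholders rather than the real dumped data), and it runs clean on the 3090:

```cpp
// Build: nvcc repro.cu -lcublas
#include <cstdio>
#include <vector>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main()
{
    const int m = 1024, n = 512;                                // placeholders
    std::vector<float> A(m * n, 1.0f), x(n, 1.0f), y(m, 0.0f); // placeholder data

    float *dA, *dx, *dy;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dx, x.size() * sizeof(float));
    cudaMalloc(&dy, y.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), y.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Same operation as the crashing call: y = A * x (column-major A).
    const float alpha = 1.0f, beta = 0.0f;
    cublasStatus_t s = cublasSgemv(handle, CUBLAS_OP_N, m, n, &alpha,
                                   dA, m, dx, 1, &beta, dy, 1);
    cudaError_t e = cudaDeviceSynchronize();
    std::printf("status=%d runtime=%s\n", (int)s, cudaGetErrorString(e));

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    return 0;
}
```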