I am using a Tesla P40 with driver 410.72 for real-time matrix multiplication, and I am running into a strange issue. When I launch one of my kernels, everything closes properly, but I get the error “too many resources requested”. The strange part is that the thread count (1024 threads in 4 blocks) and the memory used are identical to those of another kernel that runs successfully every time. At first the only difference was the number of input arguments, which, according to other posts on this forum, can cause this error, but after making the argument lists identical as well I still get the same error. So I now have two kernels with identical thread/block configuration, input arguments, and memory usage, yet one throws this error and the other works every time. The only remaining differences are the names of the two kernels and the amount of shared memory used, and the kernel that uses more shared memory is the one that works.
Please let me know if you can help. Thank you.