I have a fairly large PyCUDA codebase with multiple kernels. It runs fine on Pascal, but some (though not all) of the kernels fail with insufficient-resource errors on Turing.
Specifically the error reads:
pycuda._driver.LaunchError: cuLaunchKernel failed: too many resources requested for launch
In my limited experience, these errors usually point to insufficient registers (I think), or to too much complexity in the kernel.
But Turing has at least as many of these resources as Pascal, no?
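To make concrete what I mean by running out of registers, here is a rough sketch of the per-block check I assume happens at launch. The name `launch_fits` and the default limits (65536 registers, 1024 threads, and 48 KB of shared memory per block) are my own assumptions for illustration, not values queried from either card, and the real allocation is rounded up per warp, so this is only an approximation.

```python
def launch_fits(regs_per_thread, threads_per_block, shared_bytes,
                max_regs_per_block=65536,      # assumed per-block register limit
                max_threads_per_block=1024,    # assumed per-block thread limit
                max_shared_per_block=49152):   # assumed 48 KB shared memory limit
    """Return the names of the limits a launch would exceed (empty list = fits).

    Simplified: ignores the per-warp rounding of register allocation.
    """
    problems = []
    if threads_per_block > max_threads_per_block:
        problems.append("threads")
    # Every thread in the block needs its own copy of the kernel's registers.
    if regs_per_thread * threads_per_block > max_regs_per_block:
        problems.append("registers")
    if shared_bytes > max_shared_per_block:
        problems.append("shared memory")
    return problems


if __name__ == "__main__":
    # Hypothetical numbers: the same block size with two different register counts.
    print(launch_fits(64, 1024, 0))
    print(launch_fits(72, 1024, 0))
```

If this is the right picture, the same kernel could fit on one card and fail on the other simply because the compiler assigns it more registers per thread for the newer architecture, even with identical limits. The inputs could presumably be read from the compiled kernel via PyCUDA's `Function.num_regs` and `Function.shared_size_bytes`.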