Migrating PyCUDA code from Pascal to Turing results in insufficient resources errors

I have some rather large PyCUDA code, featuring multiple kernels, which runs fine on Pascal, but some (though not all) of the kernels produce insufficient resource errors on Turing.

Specifically the error reads:
pycuda._driver.LaunchError: cuLaunchKernel failed: too many resources requested for launch

In my limited experience, these errors usually point to insufficient registers (I think), or to too much complexity in the kernel.
But Turing has the same amount of such resources as Pascal, if not more, no?

You can compare the register resources of the two architectures by checking Table 14 in the CUDA programming guide.

However, the compiler may use differing amounts of registers per thread when compiling code for two different architectures.
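
One way to check this is to ask PyCUDA how many registers each compiled kernel actually uses on each card. The following is a minimal sketch, not a definitive diagnostic: the helper names `fits_register_budget` and `describe_kernel` are made up for illustration, the 65536 default assumes the 64K registers-per-block limit listed for both architectures, and the simple multiplication ignores the hardware's register allocation granularity, so treat a borderline result as a hint only.

```python
def fits_register_budget(regs_per_thread, threads_per_block,
                         regs_per_block_limit=65536):
    """Rough check: does one block's register demand fit the per-block limit?

    The default limit is an assumption (64K 32-bit registers per block);
    query the device for the real value where possible.
    """
    return regs_per_thread * threads_per_block <= regs_per_block_limit


def describe_kernel(kernel, block):
    """Print the resource usage of a compiled pycuda.driver.Function.

    `block` is the (x, y, z) block shape used at launch. Requires an
    active CUDA context, e.g. one created by `import pycuda.autoinit`.
    """
    import pycuda.driver as cuda  # imported lazily so the helper above runs without a GPU

    threads = block[0] * block[1] * block[2]
    device = cuda.Context.get_device()
    regs_limit = device.get_attribute(
        cuda.device_attribute.MAX_REGISTERS_PER_BLOCK)
    # Largest block size this particular kernel can be launched with
    # on this particular device.
    max_threads = kernel.get_attribute(
        cuda.function_attribute.MAX_THREADS_PER_BLOCK)

    print("registers per thread:      ", kernel.num_regs)
    print("threads per block (launch):", threads)
    print("max threads for kernel:    ", max_threads)
    print("fits register budget:      ",
          fits_register_budget(kernel.num_regs, threads, regs_limit))
```

Calling `describe_kernel(mod.get_function("my_kernel"), (1024, 1, 1))` on both machines (where `mod` and `my_kernel` stand in for your own module and kernel) should show whether the register count per thread differs between the two builds. For example, a kernel launched with 1024 threads per block fits at 64 registers per thread but not at 65. If that turns out to be the cause, launching with a smaller block, or capping register use with nvcc's `--maxrregcount` option, are the usual things to try.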