I know that the maximum number of registers per multiprocessor is 8192. My question is this: if I have, say, 400 threads per multiprocessor, each thread can use at most 20 registers (8192 / 400, rounded down). Is there a way to tell the GPU that I only want 3 registers per thread, or does the GPU handle this by itself?
You can use the -maxrregcount option to cap the number of registers per thread, but this usually means that values which no longer fit are stored in local memory instead, which makes the kernel slower (even though you can get a higher occupancy).
Just compile using nvcc with "-maxrregcount=3".
But if your kernel normally uses 20 registers, the compiler probably won't be able to get it down to 3, or it will take a very long time to compile. Maybe you could set it to 10 registers or so.
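If it helps, here is a minimal sketch for checking what a given limit actually does to a kernel. It assumes a toolkit recent enough to provide cudaFuncGetAttributes. The kernel scaleKernel, the helper registerBudgetPerThread, and the file name example.cu are made up for illustration. The helper only repeats the 8192 / 400 arithmetic from the question and ignores any allocation granularity the hardware may apply.

```cuda
// Possible build line (the limit of 10 is just the value suggested above):
//   nvcc -maxrregcount=10 -Xptxas -v example.cu -o example
// -Xptxas -v asks ptxas to print register and spill usage at compile time.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: upper bound on registers per thread if all resident
// threads must share the multiprocessor's register file (rounds down).
inline int registerBudgetPerThread(int regsPerMultiprocessor,
                                   int threadsPerMultiprocessor) {
    return regsPerMultiprocessor / threadsPerMultiprocessor;
}

// Hypothetical kernel, only here so there is something to inspect.
__global__ void scaleKernel(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    // Numbers from the question: 8192 registers shared by 400 threads.
    printf("budget: %d registers per thread\n",
           registerBudgetPerThread(8192, 400));

    // Ask the runtime what the compiled kernel really uses.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("registers per thread: %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

If the local memory figure goes up after lowering the limit, the compiler is spilling registers to local memory, which is the slowdown mentioned above.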