The program is exactly the same for both. How is that possible?
On the other hand, I have used the -maxrregcount flag for this program and limited it to ten registers. I have read that whenever that flag is used, the extra registers are allocated in local memory, which makes it a costly solution. Is it always like that? Or can we assume that the compiler has applied an optimization in order to use fewer registers than before?
Yes, whenever the maxrregcount flag is used, the registers that exceed the limit are spilled to local memory. You can observe this by inspecting the cubin file (the .local directive or something similar defines the amount of local memory used).
In general, from my experience, the register allocation algorithm in CUDA is highly nondeterministic ;)
I have had a number of examples where changing one instruction, even a condition from ‘>=’ to ‘>’, increased or decreased the number of registers…
It would be really nice if NVIDIA could disclose it, at least partially…
You’re right. For instance, whenever I use a parameter, my register count increases, whereas if I use a macro it decreases.
Things like this look nondeterministic, and there has to be an explanation for them.
Is there any guide on register usage?
Tip: I checked my cubin file after compiling with the -maxrregcount flag set to ten. The amazing thing is that it is not using local memory (in the cubin file, lmem=0), so I guess I can assume the compiler optimized the register usage, can’t I?
I am trying to reduce the number of registers to 10 in a matrix multiplication PTX file. Currently it uses 11 registers.
Using decuda I can see where the 11th register is being used, but looking at my PTX file I don't understand what in the PTX file is causing the 11th register to be assigned.
Does anyone have any ideas on how to reduce register usage by looking at the decuda output for the PTX? Compiling the original code gave me 16 registers.
I made a lot of changes in the PTX to bring it down to 11, but I need to bring it down to 10 registers for 100% occupancy.