I ran into an odd problem with register usage in my kernel function.
My kernel currently uses 46 registers per thread, which seems like a lot, since it will keep occupancy low. So I wanted to reduce the register count.
I assumed the registers come from the local variables I define in the kernel. But no matter how many local variables I added or removed, the total register count stayed the same. That is odd. Did I miss something here?
Then I tried a crude method to identify which part of the code uses the registers: I disabled lines one at a time. Surprisingly, once I disabled one line, which assigns the value of a local variable to a global array element (the array resides in page-locked memory), the number of registers used became 0. That is even stranger, isn't it?
I am totally confused by what I see here. Does anyone have a clue? Thanks!
P.S. The compile command I use to report the register count is: nvcc -c -O3 -arch sm_13 --ptxas-options="-v"
You removed the code that “…assigns the value of a local variable to a global array element…”
Let me guess: that was the result of the computation.
There is a compiler optimization called dead code elimination. It concluded that your entire kernel was now
performing unnecessary work (since the result was never stored), so the entire kernel
was optimized away. → 0 registers.
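To illustrate the effect, here is a minimal sketch of two kernels that differ only in the final store. The kernel names, the arithmetic, and the host helpers are made up for this example and are not the original poster's code. The helpers use cudaFuncGetAttributes to read the register count at run time; compiling with --ptxas-options="-v" reports the same kind of information at build time. The exact numbers depend on the compiler version and target architecture.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the arithmetic is kept alive only by the final store.
__global__ void work_with_store(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = in[i];
    for (int k = 0; k < 16; ++k)
        acc = acc * 1.0001f + 0.5f;
    out[i] = acc;  // the only externally visible effect of the kernel
}

// Same computation, but the result is never written anywhere.
// The compiler is free to remove the load and the loop as dead code.
__global__ void work_without_store(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = in[i];
    for (int k = 0; k < 16; ++k)
        acc = acc * 1.0001f + 0.5f;
    (void)acc;     // result discarded
}

// Host helper: registers per thread for a kernel, or -1 if the query fails
// (for example, when no CUDA device is available).
static int query_regs(const void *kernel)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, kernel) != cudaSuccess) return -1;
    return attr.numRegs;
}

int regs_with_store()    { return query_regs((const void *)work_with_store); }
int regs_without_store() { return query_regs((const void *)work_without_store); }
```

Comparing regs_with_store() and regs_without_store() on your own setup should show the second kernel collapsing to a near-empty body, which is why removing lines is not a reliable way to find out which statement "costs" registers.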
I have not found anyone who can explain how the compiler decides to allocate registers. Most of your registers are likely created by the compiler for temporaries, not for variables you explicitly declared in the kernel. Even worse, as you have experienced, when I remove a variable I explicitly created, the compiler often just allocates another register elsewhere, which drives me nuts. I would love to know exactly what goes into each register, when and where. You can cap the register count with a compile-time option, but the excess then seems to spill into local memory, which is horrible for performance.
Bottom line: there must be a way to write code so that the compiler does not need so many registers. But I haven't found anybody who knows how.
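For reference, a minimal sketch of the two usual ways to cap registers and of how to check whether the cap caused spilling. It assumes a toolkit that supports the __launch_bounds__ qualifier; the kernel, the numbers 256 and 2, and the KernelUsage helper are made up for illustration. The per-file alternative is the nvcc option --maxrregcount, for example nvcc --maxrregcount=32 --ptxas-options="-v".

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel. __launch_bounds__(256, 2) promises at most 256 threads
// per block and asks for at least 2 resident blocks per multiprocessor, which
// lets the compiler pick a register budget for this kernel only.
__global__ void __launch_bounds__(256, 2)
bounded_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
    out[i] = x * x + 1.0f;
}

// Illustrative result type: registers per thread and local memory per thread.
struct KernelUsage {
    int         regs;         // registers per thread
    std::size_t local_bytes;  // local memory per thread; spills show up here
    bool        ok;           // false if the query failed (e.g. no device)
};

// Ask the runtime what the compiler actually allocated for the kernel.
KernelUsage query_usage()
{
    KernelUsage u = {0, 0, false};
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, bounded_kernel) == cudaSuccess) {
        u.regs = attr.numRegs;
        u.local_bytes = attr.localSizeBytes;
        u.ok = true;
    }
    return u;
}
```

If local_bytes grows after you tighten the cap, the compiler is probably spilling registers to local memory, and the lower register count may not pay off.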
Thanks all!
Yes, it is driving me crazy that I have no way to control which registers get used.
I want my local variables to reside in registers because they are accessed frequently.
The disabled line that drops the total from 46 registers to 0 is the last line of my code, which stores the local variable's value. It looks like the compiler is smart enough to detect that, without that store, my kernel is just dead code with no observable result, so none of the local variables need a register.
I tried decuda, but I find its output hard to interpret. There are too many tricks to figure out.