Register count? How to tell which part of a kernel function is using registers


My kernel function did not work correctly unless I compiled with -maxrregcount=32, because my block size is 512 (512*32 = 16K registers). This indicates that my kernel function uses more than 32 registers per thread. Since I only define a few integer variables in the kernel function, I would not expect it to need more than 32 registers. So I am wondering whether there is any way to tell which part of the kernel function is using the registers.
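The register budget in the question can be checked with a short sketch. It assumes, as the question does, a 16K-register file per multiprocessor and ignores register allocation granularity, so the real per-thread limit may be somewhat lower; the helper name is hypothetical.

```python
# Per-thread register budget when a whole block must fit on one multiprocessor.
# Assumes a 16K-register file (as stated in the question) and ignores the
# hardware's register allocation granularity, so the real limit can be lower.
REGS_PER_SM = 16 * 1024
BLOCK_SIZE = 512


def max_regs_per_thread(regs_per_sm: int, threads_per_block: int) -> int:
    """Largest per-thread register count that still lets one block fit."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return regs_per_sm // threads_per_block


print(max_regs_per_thread(REGS_PER_SM, BLOCK_SIZE))  # 32
```

So any kernel that needs more than 32 registers per thread cannot be launched with 512-thread blocks on such hardware, which is why -maxrregcount=32 makes the difference.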

I tried decuda, but it does not seem to accept my .cubin file, probably because I am using CUDA toolkit 3.1.

Thanks for any suggestions!



Have you tried the compiler option --ptxas-options=-v? But this only tells you how many registers each thread of a kernel uses in total. To see where all your registers go, you will have to look at the assembly, I think.
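If the verbose output is long, a sketch like the following could pull out the per-thread register counts and compare them with a budget. It assumes the output was captured from a command such as nvcc --ptxas-options=-v -c kernel.cu (the file name is hypothetical) and that each kernel's report contains a fragment of the form "Used N registers"; that pattern is an assumption about the output format, not a documented interface.

```python
import re

# Matches the "Used N registers" fragment of ptxas verbose output.
# The surrounding text may differ between toolkit versions; this pattern
# is an assumption, not a documented format.
USED_REGS = re.compile(r"Used\s+(\d+)\s+registers")


def registers_used(ptxas_output: str) -> list[int]:
    """Return every per-thread register count found in the output."""
    return [int(n) for n in USED_REGS.findall(ptxas_output)]


def over_budget(ptxas_output: str, budget: int) -> bool:
    """True if any reported kernel uses more registers per thread than budget."""
    return any(n > budget for n in registers_used(ptxas_output))


# Hypothetical sample line; real output would come from the compiler's stderr.
sample = "ptxas info    : Used 34 registers, 2084+16 bytes smem"
print(registers_used(sample))   # [34]
print(over_budget(sample, 32))  # True
```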

I think since CUDA 3.0, .cubin files are purely binary, so it's not that easy to read kernel register and shared memory usage out of them. Also, tools like decuda (I have never used it, so I don't know for sure) won't support them right away.
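One way to confirm that a .cubin is in the newer binary (ELF) format rather than the old text format is to check its first four bytes for the ELF magic number, 0x7F followed by "ELF". A minimal sketch; the function names and the file name in the usage comment are hypothetical.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def looks_like_elf(header: bytes) -> bool:
    """True if header begins with the ELF magic number."""
    return header[:4] == ELF_MAGIC


def cubin_is_elf(path: str) -> bool:
    """Read the first bytes of a .cubin and report whether it is ELF-based.

    A text-format cubin starts with printable characters instead,
    so this returns False for it.
    """
    with open(path, "rb") as f:
        return looks_like_elf(f.read(4))


# Usage (hypothetical file name):
# print(cubin_is_elf("kernel.cubin"))
```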


Thanks, I tried --ptxas-options=-v, and that is how I knew that my register count exceeds 32.

To see what the registers are used for, you will have to disassemble the .cubin file using decuda. For CUDA 3.0 and later, you will also need a script to convert the new ELF-based .cubin to the old text-based .cubin format.

Thanks, tera!

The script works for my code!