My kernel does not work correctly unless I compile with -maxrregcount=32, because my block size is 512 threads (512*32 = 16K registers). This suggests that my kernel uses more than 32 registers per thread. Since I only declare a few integer variables in the kernel, I would not expect it to need more than 32 registers. Is there any way to tell which part of the kernel is using the registers?
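The register budget arithmetic can be checked against the device itself. This is a minimal sketch that assumes device 0 and the 512-thread block size from the question. It reads regsPerBlock from cudaGetDeviceProperties and divides it by the block size to get the per-thread limit. maxRegsPerThread is a hypothetical helper name. The sketch ignores register allocation granularity, so the real limit can be slightly lower.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: the largest per-thread register count that still lets
// a block of `blockSize` threads fit into `regsPerBlock` registers.
// Ignores register allocation granularity, so the real limit may be lower.
int maxRegsPerThread(int regsPerBlock, int blockSize)
{
    return blockSize > 0 ? regsPerBlock / blockSize : 0;
}

int main()
{
    const int blockSize = 512;  // block size from the question

    cudaDeviceProp prop;
    // Assumes the kernel runs on device 0.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device available\n");
        return 1;
    }

    std::printf("%s: %d registers per block -> at most %d registers per thread "
                "for %d threads\n",
                prop.name, prop.regsPerBlock,
                maxRegsPerThread(prop.regsPerBlock, blockSize), blockSize);
    return 0;
}
```

On a device that reports 16384 registers per block, the helper gives 32 for a 512-thread block. That matches the -maxrregcount=32 figure in the question.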
I tried decuda, but it fails to read my .cubin file, probably because I am using CUDA toolkit 3.1.
Have you tried the compiler option --ptxas-options=-v? It only tells you how many registers each thread uses, though. To see where all your registers go, I think you will have to look at the assembly.
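Compiling with nvcc --ptxas-options=-v makes ptxas print the per-thread register count at build time. The same figure should also be available at run time through cudaFuncGetAttributes, together with the largest block size the kernel can be launched with. This is a minimal sketch. exampleKernel is a hypothetical stand-in for the kernel in the question, and blockFits is a hypothetical helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's kernel: only a few int variables.
__global__ void exampleKernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int acc = 0;
    for (int k = 0; k < 16; ++k)
        acc += (i ^ k) * k;
    if (i < n)
        out[i] = acc;
}

// Hypothetical helper: true if a launch with `blockSize` threads per block
// stays within the limit the runtime reports for this kernel.
bool blockFits(const cudaFuncAttributes &attr, int blockSize)
{
    return blockSize <= attr.maxThreadsPerBlock;
}

int main()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, exampleKernel) != cudaSuccess) {
        std::printf("cudaFuncGetAttributes failed\n");
        return 1;
    }

    // numRegs is a per-thread count, like the ptxas -v output.
    std::printf("registers per thread:        %d\n", attr.numRegs);
    std::printf("registers for 512 threads:   %d\n", attr.numRegs * 512);
    // Registers spilled because of -maxrregcount end up in local memory.
    std::printf("local memory per thread:     %lu bytes\n",
                (unsigned long)attr.localSizeBytes);
    std::printf("max threads per block:       %d (512 %s)\n",
                attr.maxThreadsPerBlock,
                blockFits(attr, 512) ? "fits" : "does not fit");
    return 0;
}
```

If you build with -maxrregcount=32, a non-zero local memory figure indicates that the compiler had to spill registers to stay under the cap.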
I think that since CUDA 3.0, .cubin files are purely binary, so it is not that easy to read a kernel's register and shared memory usage out of them. Also, tools like decuda (I have never used it, so I don't know for sure) won't support them right away.
To see what the registers are used for, you will have to disassemble the .cubin file with decuda. For CUDA 3.0 and later, you will also need a script to convert the new ELF-based .cubin to the old text-based .cubin format.
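Before reaching for decuda or a conversion script, you can check which format a given .cubin is in by looking at its first bytes, because every ELF file begins with 0x7F 'E' 'L' 'F'. This is a minimal host-only sketch. The file name comes from the command line, and isElfCubin is a hypothetical helper name. Treating any non-ELF file as the old text format is an assumption.

```cuda
#include <cstddef>
#include <cstdio>

// ELF files start with the four magic bytes 0x7F 'E' 'L' 'F'.
bool isElfCubin(const unsigned char *buf, std::size_t len)
{
    return len >= 4 && buf[0] == 0x7F && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s file.cubin\n", argv[0]);
        return 1;
    }

    std::FILE *f = std::fopen(argv[1], "rb");
    if (!f) {
        std::perror(argv[1]);
        return 1;
    }

    // Read only the header bytes needed for the magic-number check.
    unsigned char head[4];
    std::size_t got = std::fread(head, 1, sizeof head, f);
    std::fclose(f);

    std::printf("%s: %s\n", argv[1],
                isElfCubin(head, got)
                    ? "ELF .cubin (convert before using decuda)"
                    : "not ELF (assumed to be the old text .cubin)");
    return 0;
}
```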