How to calculate the number of registers per thread

Hello everyone,
How can I calculate the number of registers used per thread?

The following piece of code runs on each thread:

lTemp1 = ((16 << 14) * vHeight) / newHeight;
lTemp2 = ((16 << 14) * vWidth) / newWidth;

lTemp = (lTemp1 * yindex + 1) >> 14;

y = (lTemp1 * yindex + 1) >> 18;

yPos = lTemp & 0x0F;

lTemp = (lTemp2 * xindex + 1) >> 14;

x = (lTemp2 * xindex + 1) >> 18;

int index_in = y * vopWidth + x;

Pos = lTemp & 0x0F;

int WeightA = (16 - yPos) * (16 - Pos);
int WeightB = (16 - yPos) * Pos;
int WeightC = (16 - Pos) * yPos;
int WeightD = yPos * Pos;

Thanks in advance
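For reference, the snippet above appears to compute a bilinear-scaling source position in 4-bit fixed point: lTemp1 and lTemp2 hold the source step per destination pixel scaled by 2^18, shifting the product right by 14 gives the position in sixteenths of a pixel, shifting right by 18 gives the integer pixel, and the low four bits are the fractional part. The four weights then always sum to 256. The plain-C sketch below mirrors that arithmetic so it can be checked on the host. The names map_coord, bilinear_weights, FixedCoord and Weights are hypothetical, and the 64-bit types are an assumption because the original declarations are not shown; with 32-bit int, lTemp1 * yindex overflows once it reaches 2^31 (for example at index 8192 with a 1:1 scale).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side mirror of the per-thread arithmetic in the question.
 * 64-bit types are an assumption; the original declarations are not shown. */
typedef struct {
    int64_t coord; /* integer source coordinate (y or x in the question) */
    int64_t frac;  /* fractional part in 1/16 steps (yPos or Pos) */
} FixedCoord;

/* Map a destination index onto one source axis, as lTemp1/lTemp2 do. */
static FixedCoord map_coord(int64_t src_size, int64_t dst_size, int64_t dst_index)
{
    int64_t step = (((int64_t)16 << 14) * src_size) / dst_size; /* step in 2^-18 pixel units */
    int64_t pos16 = (step * dst_index + 1) >> 14;               /* position in 1/16 pixel */
    FixedCoord c;
    c.coord = (step * dst_index + 1) >> 18; /* integer pixel, same as pos16 >> 4 */
    c.frac = pos16 & 0x0F;                  /* low four bits: fractional sixteenths */
    return c;
}

typedef struct { int a, b, c, d; } Weights;

/* Bilinear weights, computed exactly like WeightA..WeightD in the question. */
static Weights bilinear_weights(int yPos, int Pos)
{
    Weights w;
    w.a = (16 - yPos) * (16 - Pos);
    w.b = (16 - yPos) * Pos;
    w.c = (16 - Pos) * yPos;
    w.d = yPos * Pos;
    /* (16-y)(16-p) + (16-y)p + (16-p)y + yp = 16(16-y) + 16y = 256 */
    assert(w.a + w.b + w.c + w.d == 256);
    return w;
}
```

For example, map_coord(100, 200, 3) yields coord 1 and frac 8 (1.5 source pixels for a 2x upscale), and bilinear_weights(8, 8) gives 64 for each of the four weights.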

Hello, have you checked the cubin file? If you generate it, you will see the register usage per thread.

Use the --cubin flag with nvcc.

Alternatively, you can pass --ptxas-options=-v to nvcc, which makes ptxas print the number of registers each kernel uses at compile time.

Does anyone know of a way to get the number of registers used by a particular kernel at runtime?

That’d be a nice thing to have, wouldn’t it? Unfortunately, I don’t think there is one, other than keeping your cubin files alongside your executable and reading through their text contents. But perhaps there’s a function in the Driver API? (It would need to read the cubins anyway.)

I was in fact considering the exact same thing for the library we are developing. But once your kernels are stable, you always know the register count, so you can pass it to the kernel at run time via an argument.

Better still:

You just need to compile your .cu file to a cubin first, then write a parser script (CMake's CUDA support has one…) that extracts these values and generates a custom #define, and finally compile the sources again with a “-D” option that defines the register count.
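The parser step can be sketched in C. Rather than reading the cubin itself (newer toolkits emit cubins as ELF binaries, so they no longer have readable text contents), this sketch assumes its input is one line of nvcc --ptxas-options=-v output of the form "ptxas info    : Used 10 registers, ..." and turns it into a #define. The function name reg_define_from_ptxas_line and the KERNEL_REGS macro name are hypothetical, and the exact ptxas message wording may differ between toolkit versions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser: finds "Used N registers" in a ptxas -v output line and
 * writes "#define <macro> N" into out. Returns N, or -1 if the line has no count. */
int reg_define_from_ptxas_line(const char *line, const char *macro,
                               char *out, size_t out_size)
{
    const char *p = strstr(line, "Used ");
    int regs = 0;
    int consumed = 0; /* set by %n only if " registers" was matched too */

    if (p == NULL || sscanf(p, "Used %d registers%n", &regs, &consumed) != 1 ||
        consumed == 0)
        return -1;
    /* Emit a line suitable for a generated header, e.g. "#define KERNEL_REGS 10". */
    snprintf(out, out_size, "#define %s %d", macro, regs);
    return regs;
}
```

A build script could run this over the ptxas output for each kernel and write the results to a generated header, or pass the value on the command line as -DKERNEL_REGS=10, which is the same idea as the -D option described above.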