I compiled the following code using CUDA 1.0.
As far as I can tell, this code does not use shared memory.
I also expected it to need 20 bytes of registers, because it uses the variables x, y, tmp, row, and col.
However, the cubin file reports smem = 28 and reg = 8.
I don't understand why.
Could someone please explain the reason?
__global__ void mul(float *a, float b, float c)
int x=threadIdx.x, y=threadIdx.y;
for(int i=0; i<N; i++)
name = mul;
lmem = 0
smem = 28
reg = 8
bar = 0
segname = const
segnum = 1
offset = 0
bytes = 4