Can someone tell me what sm_10 and sm_20 mean in the ptxas info output?
When I compile my kernel, it reports two different register counts, as follows:
ptxas info : Compiling entry function 'Z13testKernelNewPiS_iiiiP12GPUstageInfoP14testClassifieriiPbS_S3' for 'sm_10'
1>ptxas info : Used 14 registers, 13248+16 bytes smem, 24 bytes cmem
1>ptxas info : Compiling entry function 'Z13testKernelNewPiS_iiiiP12GPUstageInfoP14testClassifieriiPbS_S3' for 'sm_20'
1>ptxas info : Used 20 registers, 13184+0 bytes smem, 84 bytes cmem, 8 bytes cmem
Can someone tell me how many registers per thread will actually be used in this case?
I am facing a weird problem: if the number of registers reported for sm_20 is less than 20, all the data in my shared memory is correct, but if it is 20 or more, my shared memory data becomes zero. My GPU has 16kB of registers and each of my blocks uses 256 threads. By my count that should be enough registers, so what could be the reason for this behaviour?
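For reference, the estimate behind "enough registers" can be sketched as below. This is only a rough check under stated assumptions: it reads "16kB" as 16384 32-bit registers per multiprocessor (it may really mean something else on my card), it ignores the hardware's register allocation granularity, and the function names are made up for illustration.

```cpp
#include <cassert>

// Registers needed by one thread block: each thread holds its own copy.
constexpr int registersPerBlock(int threadsPerBlock, int regsPerThread) {
    return threadsPerBlock * regsPerThread;
}

// True if at least one block fits into the multiprocessor's register file.
// Simplification: real hardware rounds allocations up to a granularity.
constexpr bool blockFits(int threadsPerBlock, int regsPerThread, int regsPerSM) {
    return registersPerBlock(threadsPerBlock, regsPerThread) <= regsPerSM;
}

// Assumption: "16kB of registers" means 16 * 1024 32-bit registers.
constexpr int kAssumedRegsPerSM = 16 * 1024;

// 256 threads at the two register counts reported by ptxas.
static_assert(registersPerBlock(256, 14) == 3584, "sm_10 build");
static_assert(registersPerBlock(256, 20) == 5120, "sm_20 build");
static_assert(blockFits(256, 20, kAssumedRegsPerSM), "fits under the assumption");
```

Under that assumption, 256 threads at 20 registers each need 5120 registers, well below 16384, which is why the register count alone does not seem to explain the zeros.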