Hi all,
Can someone tell me what sm_10 and sm_20 mean in the ptxas info output?
When I compile my kernel, ptxas reports two different register counts:
1>ptxas info : Compiling entry function '_Z13testKernelNewPiS_iiiiP12GPUstageInfoP14testClassifieriiPbS_S3_' for 'sm_10'
1>ptxas info : Used 14 registers, 13248+16 bytes smem, 24 bytes cmem[1]
1>ptxas info : Compiling entry function '_Z13testKernelNewPiS_iiiiP12GPUstageInfoP14testClassifieriiPbS_S3_' for 'sm_20'
1>ptxas info : Used 20 registers, 13184+0 bytes smem, 84 bytes cmem[0], 8 bytes cmem[16]
Can someone tell me how many registers per thread will actually be used in this case?
I am facing a weird problem: if the sm_20 register count is below 20, all the data in my shared memory is correct, but if it is 20 or more, my shared memory data becomes zero. My GPU has 16K (16384) registers and each of my blocks uses 256 threads, so as far as I can tell there are enough registers. What could be the reason for this behaviour?
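As a rough check of the arithmetic in the question, this sketch multiplies registers per thread by threads per block and compares that, and the shared memory figure, with the per-multiprocessor limits. It assumes a compute capability 1.2 device (16384 32-bit registers and 16 KB of shared memory per multiprocessor). `BlockUsage`, `registersPerBlock`, `blockFits` and `maxResidentBlocks` are hypothetical names, and real hardware rounds register allocation up and has further limits (threads and blocks per multiprocessor), so treat the results as optimistic bounds.

```cpp
#include <cassert>

// Assumed per-multiprocessor limits for a compute capability 1.2 GPU.
constexpr int kRegistersPerSM = 16384;     // 32-bit registers
constexpr int kSharedBytesPerSM = 16 * 1024;

// Hypothetical description of one block, filled from the ptxas line
// "Used N registers, X+Y bytes smem" of the target that actually runs.
struct BlockUsage {
    int registersPerThread;
    int threadsPerBlock;
    int sharedBytes;  // X + Y from the smem field
};

// Registers needed by one block, ignoring allocation granularity.
int registersPerBlock(const BlockUsage& u) {
    assert(u.registersPerThread > 0 && u.threadsPerBlock > 0);
    return u.registersPerThread * u.threadsPerBlock;
}

// True if a single block fits on a multiprocessor under both limits.
bool blockFits(const BlockUsage& u) {
    return registersPerBlock(u) <= kRegistersPerSM &&
           u.sharedBytes <= kSharedBytesPerSM;
}

// Blocks per multiprocessor allowed by the register and shared memory
// limits alone (other hardware limits are not modelled here).
int maxResidentBlocks(const BlockUsage& u) {
    if (!blockFits(u)) return 0;
    int byRegs = kRegistersPerSM / registersPerBlock(u);
    int byShared = u.sharedBytes > 0 ? kSharedBytesPerSM / u.sharedBytes
                                     : byRegs;  // no smem: registers decide
    return byRegs < byShared ? byRegs : byShared;
}
```

With the sm_10 figures (14 registers, 13248+16 bytes smem), one 256-thread block needs 3584 registers; even at 20 registers per thread it needs 5120, well under 16384. Under these assumptions it is shared memory (13264 of 16384 bytes), not registers, that limits the kernel to one resident block per multiprocessor.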
It is probably just a coincidence. In a different topic you wrote that your GPU is a GeForce 335M, which has compute capability 1.2. So the sm_20 code will never be executed on it and should not influence your results at all.
To make optimal use of your GPU, you could set the architecture to sm_12 instead of sm_10.
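The reasoning above follows the binary-compatibility rule for precompiled GPU code: a cubin built for sm_XY runs only on devices of compute capability X.Z with Z >= Y. This sketch models that rule; it assumes the loader prefers the compatible binary with the highest minor version and it ignores PTX just-in-time compilation. `Arch` and `pickBinary` are hypothetical names, not part of the CUDA runtime API.

```cpp
#include <cassert>
#include <vector>

// A compute capability or sm_XY target, e.g. {1, 2} for sm_12.
struct Arch {
    int major;
    int minor;
};

// Returns the index of the precompiled binary that would run on `device`,
// or -1 if none is compatible. Compatible means same major version and a
// target minor version not newer than the device's; among those, the
// newest minor is assumed to be preferred.
int pickBinary(const std::vector<Arch>& built, Arch device) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(built.size()); ++i) {
        const Arch& a = built[i];
        if (a.major != device.major || a.minor > device.minor) continue;
        if (best < 0 || a.minor > built[best].minor) best = i;
    }
    return best;
}
```

For the build in the question (sm_10 and sm_20) on a compute capability 1.2 device, only the sm_10 code qualifies, so only its 14-register figure applies. Adding an sm_12 target (for example with nvcc's `-arch=sm_12`) would make that binary the one chosen.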
In reply to tera (23 November 2010 - 08:51 AM):
Oh, I see. So the change in the number of registers for sm_20 will not affect my results. Thanks for your reply; I will try to debug it.
Is there a debugger that can run with only a single GPU?
Regards
Mohit