I am running the same kernels compiled for sm_1.x and sm_2.x, and profiling them with the command-line profiler. I have set up a configuration file to record the following attributes:
gridsize3d threadblocksize dynsmemperblock stasmemperblock regperthread
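For context, a setup like this is typically enabled through environment variables before launching the application. The following is only a sketch, assuming the legacy CUDA_PROFILE, CUDA_PROFILE_CONFIG and CUDA_PROFILE_LOG variables (newer toolkits use COMPUTE_PROFILE equivalents); the file names and the application name are hypothetical placeholders, not my exact paths:

```shell
# Enable the legacy command-line profiler (assumed variable names, see above).
export CUDA_PROFILE=1

# Hypothetical path to the configuration file listing the attributes to record.
export CUDA_PROFILE_CONFIG=./profile_config.txt

# Hypothetical path for the profiler output log.
export CUDA_PROFILE_LOG=./cuda_profile.log

# Then run the application as usual (placeholder name):
# ./my_app
```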
When compiling the sm_1.x version, ptxas reports:
ptxas info : Used 52 registers, 144+16 bytes smem, 48 bytes cmem
But after running the code and inspecting the cuda_profile log file, I see:
dynsmemperblock stasmemperblock regperthread occupancy 45552 0 63 0.17
I expect regperthread to be 52 and the static shared memory to be 160 bytes (144+16). Why does the profiler report different values?