I’ve been fiddling around with the occupancy calculator, and it suggests running the nvcc compiler with this option:
--ptxas-options=-v
to get the number of registers and the amount of shared memory used.
So I did, and I get something like this:
1>CUDA.cu
1>ptxas info : Compiling entry function '_Z9cu_CipherPhPiS0_S_S_' for 'sm_10'
1>ptxas info : Used 17 registers, 468+16 bytes smem, 16 bytes cmem[1]
1>CUDA.cu
1>ptxas info : Compiling entry function '_Z9cu_CipherPhPiS0_S_S_' for 'sm_20'
1>ptxas info : Used 22 registers, 448+0 bytes smem, 52 bytes cmem[0], 4 bytes cmem[16]
So, which values should I use: 17 and 468+16, or 22 and 448+0?
The first set is for architecture sm_10 and the second set is for architecture sm_20, so it depends on which architecture your GPU is. I expect it will be sm_20.
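If you are not sure which architecture your GPU is, you can ask the CUDA runtime. The following is only a minimal sketch, not something from the original post: it assumes the CUDA toolkit is installed and at least one CUDA-capable device is visible, and the file name query_sm.cu is made up for illustration. It prints the compute capability of each device in the same sm_XY form that ptxas uses, so you can match it against the ptxas output.

```cuda
// query_sm.cu (hypothetical file name): print each device's compute capability.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        // major/minor form the compute capability, e.g. 2 and 0 for sm_20.
        std::printf("Device %d: %s -> sm_%d%d\n",
                    dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

Build it with nvcc query_sm.cu -o query_sm and run it. Whichever sm_XY it reports tells you which block of ptxas output to feed into the occupancy calculator.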