Occupancy question

I’ve been fiddling around with the occupancy calculator, and it suggests that I run the nvcc compiler with this argument

--ptxas-options=-v

to get the number of registers and shared memory.

OK, so I did that and I get something like this:

1>CUDA.cu

1>ptxas info	: Compiling entry function '_Z9cu_CipherPhPiS0_S_S_' for 'sm_10'

1>ptxas info	: Used 17 registers, 468+16 bytes smem, 16 bytes cmem[1]

1>CUDA.cu

1>ptxas info	: Compiling entry function '_Z9cu_CipherPhPiS0_S_S_' for 'sm_20'

1>ptxas info	: Used 22 registers, 448+0 bytes smem, 52 bytes cmem[0], 4 bytes cmem[16]

So… which values should I use? 17 and 468+16, or 22 and 448+0?

Thanks

The first set is for architecture sm_10 and the second set is for architecture sm_20, so it depends on which architecture your GPU is. I expect it will be sm_20.
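
If you build for several architectures at once, it can help to pull the numbers out of the build log per architecture and then read off the entry that matches your card (for example, the compute capability that the SDK's deviceQuery sample reports). Below is a rough sketch of that in plain host-side C++. The struct `KernelUsage` and the function `parse_ptxas` are hypothetical names made up for this post, not part of any CUDA tool, and adding the two numbers in "468+16" together to get shared memory per block is my assumption about what to feed the calculator.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Resource usage reported by ptxas for one target architecture.
struct KernelUsage {
    int registers = 0;   // registers per thread
    int smem_bytes = 0;  // shared memory per block; both terms of "A+B" summed (assumption)
};

// Hypothetical helper: scan "--ptxas-options=-v" output and return the
// usage figures keyed by architecture name, e.g. "sm_10" or "sm_20".
// It assumes each "Used ..." line follows the "Compiling entry function"
// line it belongs to, as in the log above, and that only one kernel is
// compiled per architecture (a later kernel would overwrite an earlier one).
std::map<std::string, KernelUsage> parse_ptxas(const std::string& text) {
    static const std::regex arch_re("for '(sm_[0-9]+)'");
    static const std::regex used_re(
        "Used ([0-9]+) registers, ([0-9]+)\\+([0-9]+) bytes smem");

    std::map<std::string, KernelUsage> result;
    std::string current_arch;
    std::istringstream lines(text);
    std::string line;

    while (std::getline(lines, line)) {
        std::smatch m;
        if (std::regex_search(line, m, arch_re)) {
            // Remember which architecture the next "Used ..." line refers to.
            current_arch = m[1].str();
        } else if (!current_arch.empty() &&
                   std::regex_search(line, m, used_re)) {
            KernelUsage usage;
            usage.registers = std::stoi(m[1].str());
            usage.smem_bytes = std::stoi(m[2].str()) + std::stoi(m[3].str());
            result[current_arch] = usage;
        }
    }
    return result;
}
```

Feeding it the log from the question would give one entry for sm_10 and one for sm_20, and you would then type the registers and shared memory of the matching entry into the occupancy calculator.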

Cheers