Well, I'm trying to understand all the things about blocks, threads, registers, occupancy, etc., but first things first. Here is my card's information, followed by my first questions:
[code]
Device 0: "GeForce 320M"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         2
  Total amount of global memory:                 265027584 bytes
  Number of multiprocessors:                     6
  Number of cores:                               48
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    0.95 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    Yes
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
[/code]
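The listing above looks like the output of the CUDA SDK's deviceQuery sample. As a rough sketch (assuming device 0 and a working CUDA runtime; the helper name `printDeviceLimits` is made up for illustration), the same limits can be read in your own code through `cudaGetDeviceProperties`, which fills a `cudaDeviceProp` struct:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the per-device limits relevant to blocks,
// threads and registers. Returns 0 on success, 1 on error.
static int printDeviceLimits(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Device %d: \"%s\" (compute capability %d.%d)\n",
                dev, prop.name, prop.major, prop.minor);
    std::printf("  Multiprocessors:         %d\n", prop.multiProcessorCount);
    std::printf("  Warp size:               %d\n", prop.warpSize);
    std::printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);
    std::printf("  Registers per block:     %d\n", prop.regsPerBlock);
    std::printf("  Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    std::printf("  Max block dimensions:    %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("  Max grid dimensions:     %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return 0;
}

int main()
{
    return printDeviceLimits(0);
}
```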
For this gfx card the maximum number of THREADS PER BLOCK is 512 and the number of REGISTERS per block is 16384. Am I correct if I say that:
[b]For each block in the grid I have 512 threads, AND
For each thread I ideally have 32 registers (16384 / 512 = 32)?[/b]
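The 32-register figure is just the block's register count divided evenly among its threads. A minimal sketch of that arithmetic, assuming a block launched with the maximum of 512 threads (the function name `regsPerThreadBudget` is hypothetical), also shows that smaller blocks leave more registers per thread. It is a naive upper bound: it ignores register allocation granularity and any per-thread cap imposed by the hardware or compiler.

```cuda
#include <cstdio>

// Hypothetical helper: naive per-thread register budget, i.e. the block's
// register pool split evenly across its threads (integer division rounds down).
// Ignores allocation granularity and per-thread hardware/compiler limits.
__host__ __device__ inline int regsPerThreadBudget(int regsPerBlock, int threadsPerBlock)
{
    return regsPerBlock / threadsPerBlock;
}

int main()
{
    // "Total number of registers available per block" from the listing above.
    const int regsPerBlock = 16384;
    const int blockSizes[] = {512, 384, 256};

    for (int i = 0; i < 3; ++i) {
        std::printf("%3d threads/block -> at most %d registers/thread\n",
                    blockSizes[i], regsPerThreadBudget(regsPerBlock, blockSizes[i]));
    }
    return 0;
}
```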
How can I find out how many registers each thread uses in my CUDA program?
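Two common ways to see per-thread register usage: at compile time, passing `-Xptxas -v` (equivalently `--ptxas-options=-v`) to nvcc makes ptxas print the registers, shared memory and local memory used by each kernel; at run time, `cudaFuncGetAttributes` fills a `cudaFuncAttributes` struct whose `numRegs` field holds the same count. A sketch of the runtime approach, where `scaleKernel` is a hypothetical stand-in for your own kernel:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel; substitute your own.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Query the compiled kernel's resource usage.
// Returns registers per thread, or -1 on error.
static int kernelRegisterCount()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return -1;
    }
    std::printf("registers per thread:  %d\n", attr.numRegs);
    std::printf("static shared memory:  %zu bytes\n", attr.sharedSizeBytes);
    std::printf("local memory / thread: %zu bytes\n", attr.localSizeBytes);
    std::printf("max threads per block: %d\n", attr.maxThreadsPerBlock);
    return attr.numRegs;
}

int main()
{
    return kernelRegisterCount() >= 0 ? 0 : 1;
}
```

Note that `attr.maxThreadsPerBlock` reports the largest block this particular kernel can be launched with given its resource usage, which can be lower than the device-wide maximum of 512.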