Hi all,
I have a question about the output of the --ptxas-options=-v flag.
When I use the flag on my program, I get this output for the ‘equation’ kernel:
ptxas info : Compiling entry function ‘__globfunc__Z8equationPfS_S_ii’
ptxas info : Used 3 registers, 2084+1060 bytes smem, 40 bytes cmem[0], 12 bytes cmem[1]
It uses 3 registers, but how much shared memory is used? Is it 2084+1060, 2084, or 1060 bytes? I have declared 2 arrays of 256 floats, i.e. 2048 bytes, in my kernel.
2084+1060 means you’ve explicitly allocated 2048 bytes of smem and the compiler allocated another 1060 bytes of smem for its own purposes (parameter passing, for example). cmem stands for constant memory.
I’m new at this and actually had the same question, so here is what I found.
nvcc_2.0.pdf page 28:
[indent]"A summary on the amount of used registers and the amount of memory needed per
compiled device function can be printed by passing option -v to ptxas:
nvcc -Xptxas -v acos.cu
ptxas info : Compiling entry function ‘acos_main’
ptxas info : Used 4 registers, 60+56 bytes lmem, 44+40 bytes smem, 20 bytes cmem[1], 12 bytes cmem[14]
As shown in the above example, the amounts of local and shared memory are listed by two numbers each. The first number represents the total size of all variables declared in local or shared memory, respectively. The second number represents the amount of system-allocated data in these memory segments: device function parameter block (in shared memory) and thread/grid index information (in local memory).
Used constant memory is partitioned in constant program ‘variables’ (bank 1), plus compiler generated constants (bank 14)."[/indent]
So going by that description, “ptxas info : Used 3 registers, 2084+1060 bytes smem, 40 bytes cmem[0], 12 bytes cmem[1]” would mean 2084 bytes of shared memory for the variables you declared plus 1060 bytes allocated by the system. Hmm, this is weird, because neither number matches the 2048 bytes you declared. If the 2084 were instead a total that already includes the 1060, then only 1024 bytes (one array) would be yours and 1060 would be the system's, so are you actually using both arrays? The compiler might have optimized one of them away. Or it’s reporting bad numbers and it should say 2048+36.
Yeah, I think that is correct. The total amount of shared memory used per block would be 2K, since I declare arrays totalling 2K in size.
Actually, this is just the output of a dummy kernel I used to understand what exactly the output represents.
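For anyone who wants to reproduce this, a dummy kernel along the following lines would fit the description. It is only a sketch: the signature is what the mangled name `_Z8equationPfS_S_ii` demangles to, i.e. `equation(float*, float*, float*, int, int)`, but the parameter names, the array names, and the whole body are made up for illustration. The numbers ptxas prints will also depend on the toolkit version and the target architecture, so they may not match the output above.

```cuda
// Hypothetical reconstruction of the dummy kernel. Only the parameter types
// come from the mangled name; every identifier and the body are assumptions.
#define N 256

// Assumes a launch with exactly N threads per block.
__global__ void equation(float *a, float *b, float *out, int n, int stride)
{
    // Two statically sized shared arrays: 2 * 256 * sizeof(float) = 2048 bytes.
    __shared__ float sa[N];
    __shared__ float sb[N];

    int tid = threadIdx.x;
    int idx = blockIdx.x * stride + tid;

    // Each thread loads one element into each array, padding with zero
    // when the global index runs past the end of the input.
    sa[tid] = (idx < n) ? a[idx] : 0.0f;
    sb[tid] = (idx < n) ? b[idx] : 0.0f;
    __syncthreads();

    // Both arrays feed the result, so the compiler cannot drop either one.
    if (idx < n)
        out[idx] = sa[tid] + sb[N - 1 - tid];
}
```

Saved as, say, equation.cu, it can be compiled with `nvcc -c --ptxas-options=-v equation.cu` to get the ptxas summary without needing a host-side main. Reading both arrays in the final store is deliberate: if one array were never read, the compiler would be free to remove it and the reported shared memory could shrink.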
Thanks a lot