How to calculate Global Memory used by kernel

I am trying to use the CUDA_Occupancy_Calculator. Its Help tab says the following:

ptxas info : Compiling entry function '_Z8my_kernelPf' for 'sm_10'
ptxas info : Used 5 registers, 8+16 bytes smem
Let's say "my_kernel" contains an external shared memory array which is allocated to be 2048 bytes at run time. Then our total shared memory usage per block is 2048+8+16 = 2072 bytes.
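For illustration, here is a minimal sketch of what such a kernel and its launch might look like. Only the mangled name and `float*` parameter come from the ptxas output above; the kernel body is an assumption:

```cuda
#include <cstdio>

// Dynamically sized shared array; its size is supplied at launch time
// as the third <<< >>> argument and is NOT counted in the "8+16 bytes smem"
// that ptxas reports (that figure covers only static shared memory).
extern __shared__ float s_data[];

__global__ void my_kernel(float *out)
{
    s_data[threadIdx.x] = out[threadIdx.x];  // stage through shared memory
    __syncthreads();
    out[threadIdx.x] = s_data[threadIdx.x] * 2.0f;
}

// Launch with 2048 bytes of dynamic shared memory:
//   my_kernel<<<grid, block, 2048>>>(d_out);
// Total shared memory per block = 2048 (dynamic) + 8 + 16 (static) = 2072 bytes.
```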

In my case, the compiler output is shown below.

ptxas info : Compiling entry function '_Z14float_to_colorP6uchar4PKf' for 'sm_10'
ptxas info : Used 8 registers, 16+16 bytes smem, 44 bytes cmem[1]
ptxas info : Compiling entry function '_Z13PRINT_POLYGONPhPiiiii' for 'sm_10'
ptxas info : Used 16 registers, 32+16 bytes smem, 20 bytes cmem[1]
ptxas info : Compiling entry function '_Z14float_to_colorPhPKf' for 'sm_10'
ptxas info : Used 8 registers, 16+16 bytes smem, 44 bytes cmem[1]

PRINT_POLYGON is a kernel name.

So, how do I work out the shared memory usage? Is the total amount of shared memory 32+16 = 48 bytes?
Or do I need to add global memory on top of that?
I allocated device memory as follows.

HANDLE_ERROR( cudaMalloc( (void **)&dev_IMAGE, sizeof(unsigned char)*512*512*3) );
PRINT_POLYGON<<<grid,block>>>( dev_IMAGE, dev_MEM, data->deviceID, 0, 1, 2);
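As a side note, the global memory actually consumed by such an allocation can be measured with `cudaMemGetInfo`. A minimal sketch, reusing the same `dev_IMAGE` buffer size as above (the surrounding setup is assumed):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t free_before, free_after, total;
    cudaMemGetInfo(&free_before, &total);

    unsigned char *dev_IMAGE;
    cudaMalloc((void **)&dev_IMAGE, sizeof(unsigned char) * 512 * 512 * 3);

    cudaMemGetInfo(&free_after, &total);
    // The difference is at least 512*512*3 = 786432 bytes; it may be larger
    // because the driver rounds allocations up to its granularity.
    printf("Allocation consumed %zu bytes of global memory\n",
           free_before - free_after);

    cudaFree(dev_IMAGE);
    return 0;
}
```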

You allocate shared memory for the `extern __shared__` array by giving its size in bytes as the third argument between <<< >>>. See appendix B.18 of the CUDA C Programming Guide.
You do not need to allocate memory for anything that is shown in the ptxas info. The compiler takes care of this already since the sizes of these objects are known at compile time.

You mean the total amount of shared memory used by my kernel is 48 bytes, right?
kernel<<<grid, block, Ns>>>(…);
Ns is just optional.

The total amount of shared memory used by your kernel is 48 bytes, plus whatever you give as third argument of the launch configuration (if any).
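Put concretely, using the PRINT_POLYGON kernel and launch from above (the value 1024 is just an illustrative choice of dynamic shared memory size):

```cuda
// Static shared memory (from ptxas): 32 + 16 = 48 bytes.

// No third argument: total shared memory per block = 48 bytes.
PRINT_POLYGON<<<grid, block>>>(dev_IMAGE, dev_MEM, data->deviceID, 0, 1, 2);

// With Ns = 1024 bytes of dynamic shared memory:
// total shared memory per block = 48 + 1024 = 1072 bytes.
PRINT_POLYGON<<<grid, block, 1024>>>(dev_IMAGE, dev_MEM, data->deviceID, 0, 1, 2);
```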

Thank you so much, tera. :)