My ptxas info is shown below:
1>ptxas info : Used 18 registers, 9924+8516 bytes smem, 28 bytes cmem[1], 4 bytes cmem[14]
I used 9924 bytes of smem, but the compiler reports another 8516 bytes of smem. I also find that when transferring data between smem and gmem, the compiler takes a lot of smem to do it. Is there any way to reduce this?
The +8516 is an internal value that NVIDIA uses. Your kernel actually uses 9924 bytes of shared memory, NOT 9924+8516…
eyal
9924+8516 is 18440, so it wouldn’t even fit in 16 KB of shared memory.
Thank you for your replies~
Does that mean I don’t need to care about the smem used by the compiler?
It’s not used by the compiler. Your total kernel usage of smem is 9924 bytes. (The 8516 is just an internal figure which is part of the 9924.)
eyal
Oh… I see, thank you very much~
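If you want to cross-check the ptxas line from inside your program, one option is to ask the runtime what it has recorded for the kernel with cudaFuncGetAttributes. This is only a minimal sketch: demoKernel, its 2481-float tile (2481 * 4 = 9924 bytes) and reportKernelAttributes are hypothetical names made up for illustration, not the original poster's code, and the value reported for your own kernel may differ by toolkit and target architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the thread): declares 2481 floats,
// i.e. 2481 * 4 = 9924 bytes of static shared memory.
// Assumes it is launched with blockDim.x <= 2481.
__global__ void demoKernel(float *out)
{
    __shared__ float tile[2481];

    // Fill the tile cooperatively, then read back a different element
    // so the shared array is actually used.
    for (int i = threadIdx.x; i < 2481; i += blockDim.x) {
        tile[i] = (float)i;
    }
    __syncthreads();
    out[threadIdx.x] = tile[2480 - threadIdx.x];
}

// Hypothetical helper: queries and prints the per-kernel resource usage
// that the CUDA runtime reports.
cudaError_t reportKernelAttributes(cudaFuncAttributes *attr)
{
    cudaError_t err = cudaFuncGetAttributes(attr, demoKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return err;
    }
    printf("registers per thread : %d\n", attr->numRegs);
    printf("static shared memory : %zu bytes\n", attr->sharedSizeBytes);
    printf("constant memory      : %zu bytes\n", attr->constSizeBytes);
    printf("local memory/thread  : %zu bytes\n", attr->localSizeBytes);
    return err;
}

int main()
{
    cudaFuncAttributes attr;
    return reportKernelAttributes(&attr) == cudaSuccess ? 0 : 1;
}
```

Building with nvcc -Xptxas -v prints the same kind of "ptxas info" line as in the first post, so the two can be compared side by side.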