Resource usage & optimization read a cubin file...

Haarsh · August 4, 2008, 1:44pm

Hi,

I need some clarification about the values reported in my cubin file:

lmem = 32
smem = 36
reg = 28
bar = 0

As there is no declaration of shared variables in my kernel, the SMEM value should only correspond to my parameters:

__global__ void
FooKernel ( unsigned w,
            unsigned h,
            float s,
            float sq,
            float* o );

So, at 4 bytes per float and unsigned, and 1 byte for the float*, I get 17 bytes. That is a big difference from the reported 36 :glare:

MisterAnderson42 · August 4, 2008, 4:12pm

blockDim and gridDim are also stored in shared memory. So, add an additional 20 bytes (x,y,z for blockDim and x,y for gridDim). Then, if you allow for 4 bytes for your unsigned (perhaps this is needed for packing reasons…), you get 36 bytes total.

Haarsh · August 4, 2008, 4:39pm

Thx a lot, I forgot blockDim and gridDim.

But I have 3 more questions:

(1)

2 unsigneds (2 x 4 bytes) + 2 floats (2 x 4 bytes) + gridDim & blockDim (20 bytes) + float* (which I suppose is stored in 1 byte) = 37 bytes, not 36. Do you have any idea where this small difference comes from?

(2)

Are you suggesting in your post that an unsigned can be stored in fewer than 4 bytes? :dry:

(3)

The LMEM value suggests that my two float arrays

float e[4], d[4];

(declared in my kernel) are stored in local memory. As I need to keep the array format, is there a way to make sure they will be stored in registers? (Shared memory would induce a lot of bank conflicts.)

Regards

H.

MisterAnderson42 · August 4, 2008, 5:17pm

Oh, I missed the first unsigned.
2 unsigneds - 8 bytes
2 floats - 8 bytes
blockDim/gridDim - 20 bytes
float* - 4 bytes / 8 bytes on 64-bit platform
= 40 bytes / 44 bytes on 64-bit platform.

I don’t know why that doesn’t add up exactly to 36. As I said before, there may be some packing going on, i.e. blockDim doesn’t need 4 bytes for each value: the values could be stored in 16-bit regions of memory. I don’t know the full details of how these are addressed. To find out, either read the PTX ISA manual and the generated PTX, or use wumpus’s decuda tool.
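The tally above can be reproduced with a small host-only sketch. The byte sizes are the ones assumed in this post (4 bytes per dimension value, 4 or 8 bytes per pointer), not values read from a real cubin, and the helper names are made up for illustration. It also prints how many bytes would be left for blockDim/gridDim if the reported 36 bytes is taken as given with a 32-bit pointer.

```cuda
#include <cstdio>

// Sizes assumed in the post above; none of these are measured values.
constexpr int kUnsignedBytes = 4;
constexpr int kFloatBytes    = 4;
constexpr int kDimsBytes     = 20;  // blockDim.x/y/z + gridDim.x/y, 4 bytes each
constexpr int kReportedSmem  = 36;  // smem value from the cubin in the question

// Hypothetical helper: bytes taken by FooKernel's five parameters.
constexpr int paramBytes(int pointerBytes) {
    return 2 * kUnsignedBytes + 2 * kFloatBytes + pointerBytes;
}

// Hypothetical helper: parameters plus unpacked blockDim/gridDim.
constexpr int unpackedTotal(int pointerBytes) {
    return paramBytes(pointerBytes) + kDimsBytes;
}

int main() {
    printf("32-bit pointers: %d bytes\n", unpackedTotal(4));
    printf("64-bit pointers: %d bytes\n", unpackedTotal(8));
    // Bytes that remain for blockDim/gridDim if smem = 36 and the pointer
    // is 4 bytes wide. This is plain arithmetic, not a statement about
    // how the hardware actually lays these values out.
    printf("left for dims:   %d bytes\n", kReportedSmem - paramBytes(4));
    return 0;
}
```

With a 32-bit pointer the parameters alone take 20 bytes, so the reported 36 leaves 16 bytes for the built-in dimensions rather than 20, which is consistent with the guess that some of them are packed.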

Local memory has been discussed many times on the forums. The summary is that as long as you index into the arrays with compile-time evaluated constants then the arrays will be stored in registers.
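As a minimal sketch of that rule, the kernel below only touches its array through loops with a fixed trip count, so once the loops are unrolled every index is a compile-time constant. The names sumUnrolled and unrolledKernel are hypothetical and the body is not the kernel from the question. Whether the array really ends up in registers depends on the compiler, so it is worth checking the lmem figure, for example by compiling with --ptxas-options=-v.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: fills a small per-thread array and sums it.
// After unrolling, the accesses are e[0], e[1], e[2], e[3], all constant.
__host__ __device__ inline float sumUnrolled(float s) {
    float e[4];
    #pragma unroll
    for (int i = 0; i < 4; ++i) {
        e[i] = s * (i + 1);
    }
    float sum = 0.0f;
    #pragma unroll
    for (int i = 0; i < 4; ++i) {
        sum += e[i];
    }
    // By contrast, an index only known at run time, such as
    // e[threadIdx.x & 3], would typically push the array into local memory.
    return sum;
}

// Hypothetical kernel: each thread writes one result.
__global__ void unrolledKernel(float s, float* o) {
    o[blockIdx.x * blockDim.x + threadIdx.x] = sumUnrolled(s);
}

int main() {
    const int n = 64;
    float* d_o = nullptr;
    if (cudaMalloc(&d_o, n * sizeof(float)) != cudaSuccess) {
        // No usable device: fall back to the host version of the helper.
        printf("no CUDA device; host result: %f\n", sumUnrolled(1.5f));
        return 0;
    }
    unrolledKernel<<<2, 32>>>(1.5f, d_o);
    float h_o[n];
    cudaError_t err = cudaMemcpy(h_o, d_o, sizeof(h_o), cudaMemcpyDeviceToHost);
    cudaFree(d_o);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("o[0] = %f\n", h_o[0]);
    return 0;
}
```

The loops keep the array notation the question asks for while still giving the compiler constant indices to work with.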

Haarsh · August 4, 2008, 9:26pm

Thx, I’m going to read the PTX manual.
