Some experience using decuda and cuobjdump

penguinl · November 9, 2010, 9:51am

Using decuda and cuobjdump, I tried to work out how the launch configuration and kernel parameters are stored on sm_1x.
Knowing the layout makes the disassembler's output much easier to read. Hope it helps.

Shared memory layout (assuming 32-bit parameters):

0x00:
0x02: blockDim.x    (g[01].U16)
0x04: blockDim.y    (g[02].U16)
0x06: blockDim.z    (g[03].U16)
0x08: gridDim.x     (g[04].U16)
0x0a: gridDim.y     (g[05].U16)
0x0c: blockIdx.x    (g[06].U16)
0x0e: blockIdx.y    (g[07].U16)
0x10-0x13: first param    (g[04])
0x14-0x17: second param   (g[05])
0x18-0x1b: third param    (g[06])
0x1c-0x1f: fourth param   (g[07])
......

The thread index is packed into register R0 (R0L/R0H = its low/high 16 bits):

R0L -> threadIdx.x
R0H & 0x03ff -> threadIdx.y
R0H >> 0x0a -> threadIdx.z

murtazam · May 23, 2012, 9:07pm

Hi,

Can anyone please tell me what the global-memory operands in parentheses next to each shared memory location represent? Is each parameter stored in both shared and global memory, as shown above?

Thanks.

njuffa · May 23, 2012, 11:28pm

Global memory is not involved. On sm_1x, the launch configuration and kernel arguments are passed in shared memory. The layout shown above looks about right, but I have not verified it. On newer architectures, the launch configuration is passed in special registers and kernel arguments are passed via constant memory, leaving all of the shared memory available to the kernel. If you dump the machine code with cuobjdump, you can see exactly what is going on.