Passing variables as parameters: in which memory are they stored?

Hi guys,

I am wondering where variables are stored when I pass them as parameters from host to device.

__global__ void g_CUDARTKernel(uchar4 *pixelBuffer, Core *core, int width, int height, World *world)
{
	// access thread id
	unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
	unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
	const unsigned int tid = y * width + x;

	// calculate raytracing pixels with CUDA
	if ((x < width) && (y < height))
		core->render_scene(x, y, tid, pixelBuffer, 0, world);
}

The pointers pixelBuffer, core, and world point to global memory. But how does every thread access width and height?

Are width and height stored in each thread's local memory, i.e. effectively in global memory (slow), or is there some clever implicit storage in shared memory (fast on-chip memory)?

Thanks a lot!

Kernel parameters are stored in shared memory for compute 1.x devices and in constant memory for compute 2.x devices. It is all fast and automatic, so there is nothing to worry about.
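If you want to check which case applies to a given card, a minimal sketch using the runtime API (device index 0 is just an example):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
	cudaDeviceProp prop;
	cudaGetDeviceProperties(&prop, 0);   // query device 0

	// prop.major/prop.minor give the compute capability:
	// 1.x -> kernel parameters go through shared memory,
	// 2.x -> kernel parameters go through constant memory
	printf("Compute capability %d.%d\n", prop.major, prop.minor);
	return 0;
}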

Ah, perfect! That’s great - thanks a lot!

A thought nags me a bit:

Since, as far as I know, constant memory cannot be allocated dynamically, using it explicitly in an application (say, for matrices that a kernel uses again and again) might conflict with this use of constant memory for passing parameters to kernels, even if the constant memory holding the parameters were swapped out and back (the kernel might rely on the original values).

So this use of constant memory seems wrong to me.

Am I missing something?
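To make the scenario concrete, this is roughly the kind of explicit constant-memory use I have in mind (just a sketch; the matrix name and size are made up):

#include <cuda_runtime.h>

// statically allocated constant memory, reused across many launches
__constant__ float d_matrix[16];

__global__ void k_transform(float4 *points, int n)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n)
	{
		float4 p = points[i];
		// all threads read the same entries -> broadcast via the constant cache
		points[i].x = d_matrix[0]  * p.x + d_matrix[1]  * p.y + d_matrix[2]  * p.z + d_matrix[3]  * p.w;
		points[i].y = d_matrix[4]  * p.x + d_matrix[5]  * p.y + d_matrix[6]  * p.z + d_matrix[7]  * p.w;
		points[i].z = d_matrix[8]  * p.x + d_matrix[9]  * p.y + d_matrix[10] * p.z + d_matrix[11] * p.w;
		points[i].w = d_matrix[12] * p.x + d_matrix[13] * p.y + d_matrix[14] * p.z + d_matrix[15] * p.w;
	}
}

// host side: upload once with cudaMemcpyToSymbol(d_matrix, hostMatrix,
// sizeof(float) * 16), then launch the kernel again and again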

What do you mean by “cannot be properly allocated”? Are you referring to the fact that all allocations are static and no dynamic allocation is possible?

That’s it. But the compiler does appear to keep track of the memory used for declared variables, so it ought to know whether constant memory can be used for parameter passing.

So probably there is no fundamental problem, as long as the compiler can fall back on other methods of parameter passing. Making compilers is just a very complicated job, I guess.
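In other words, the size of a __constant__ variable is fixed at compile time, but its contents can still be updated between launches, something like this (sketch, names made up):

#include <cuda_runtime.h>

__constant__ float d_coeffs[256];   // size fixed at compile time

// contents can still change from launch to launch; only the
// allocation itself is static
void updateCoeffs(const float *hostCoeffs)
{
	cudaMemcpyToSymbol(d_coeffs, hostCoeffs, 256 * sizeof(float));
}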

An interesting question, though…

If shared memory is used for compute 1.x and constant memory for compute 2.0, what is cudaFuncGetAttributes() supposed to report for shared memory use? How can it compute it if it doesn’t yet know which device the kernel will run on, given that cudaFuncGetAttributes() is context independent?

I suspect I’ve been hit by this myself in systems containing both a GTX 295 and a GTX 480. I try to give the blocks as much dynamic shared memory as possible and use cudaFuncGetAttributes() to determine how much is available. I suspect that cudaFuncGetAttributes() may underreport shared memory use for the GTX 295 in these cases, causing the kernel launch to fail due to too few resources (I have to ask for less shared memory). I should really make a test case and report it.
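The pattern I use looks roughly like this (a sketch with a made-up kernel; the interesting part is the dynamicSmem computation):

#include <cuda_runtime.h>

// hypothetical kernel using dynamically allocated shared memory
__global__ void k_myKernel(float *data)
{
	extern __shared__ float smem[];
	smem[threadIdx.x] = data[threadIdx.x];
}

void launchWithMaxDynamicSmem(float *d_data, int blocks, int threads)
{
	cudaDeviceProp prop;
	cudaGetDeviceProperties(&prop, 0);          // device 0, adjust as needed

	cudaFuncAttributes attr;
	cudaFuncGetAttributes(&attr, k_myKernel);   // static smem, registers, ...

	// request whatever the kernel's static use leaves over; if
	// attr.sharedSizeBytes misses the parameter block on compute 1.x,
	// this is too large and the launch fails with "too few resources"
	size_t dynamicSmem = prop.sharedMemPerBlock - attr.sharedSizeBytes;

	k_myKernel<<<blocks, threads, dynamicSmem>>>(d_data);
}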

Why should the compiler ever fall back to a different method for parameter passing? Constant memory consists of multiple banks of 64 KB each, and constant variables and parameters use different banks, so they cannot interfere or cause the constant memory to overflow.
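As a sanity check (a sketch; the array size is just chosen to fill one 64 KB bank), a kernel that takes parameters should compile and run even when user-declared __constant__ data occupies the full bank:

#include <cuda_runtime.h>

// 16384 floats * 4 bytes = 64 KB, the whole user constant bank
__constant__ float d_big[16384];

// on compute 2.x the parameters below go through a *separate* constant
// bank, so they do not count against the 64 KB used above
__global__ void k_scale(float *out, int n, float factor)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n)
		out[i] = d_big[i % 16384] * factor;
}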
