Passing variables as parameters: in which memory are they stored?

Hi guys,

I am wondering where variables are stored when I pass them as parameters from host to device.

__global__ void g_CUDARTKernel(uchar4 *pixelBuffer, Core *core, int width, int height, World *world)
{
	// access thread id
	unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
	unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
	const unsigned int tid = y * width + x;

	// calculate raytracing pixels with CUDA
	if ((x < width) && (y < height))
		core->render_scene(x, y, tid, pixelBuffer, 0, world);
}

The pointers pixelBuffer, core, and world point to global memory. But how does every thread access width and height?

Are width and height stored in each thread's local memory, i.e. effectively in global memory (slow), or is there some clever implicit storage in shared memory (fast on-chip memory)?

Thanks a lot!

Kernel parameters are stored in shared memory for compute 1.x devices and in constant memory for compute 2.x devices. It is all fast and automatic, so there is nothing to worry about.
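If you want to check which case applies to a given card, a minimal sketch using the runtime API (device index 0 is just an example):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
	cudaDeviceProp prop;
	cudaGetDeviceProperties(&prop, 0);   // query device 0

	// prop.major/prop.minor give the compute capability:
	// 1.x -> kernel parameters go through shared memory,
	// 2.x -> kernel parameters go through constant memory
	printf("Compute capability %d.%d\n", prop.major, prop.minor);
	return 0;
}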

Ah, perfect! That’s great - thanks a lot!

A thought nags me a bit:

Since, as far as I know, constant memory cannot be allocated dynamically, using it explicitly in an application (say, for matrices that a kernel uses again and again) might conflict with this use of constant memory for passing parameters to kernels, even if the constant memory holding the parameters were swapped out and back (the kernel might rely on the original values).

So this use of constant memory seems wrong to me.

Am I missing something?
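To make the scenario concrete, this is roughly the kind of explicit constant-memory use I have in mind (just a sketch; the matrix name and size are made up):

#include <cuda_runtime.h>

// statically allocated constant memory, reused across many launches
__constant__ float d_matrix[16];

__global__ void k_transform(float4 *points, int n)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n)
	{
		float4 p = points[i];
		// all threads read the same entries -> broadcast via the constant cache
		points[i].x = d_matrix[0]  * p.x + d_matrix[1]  * p.y + d_matrix[2]  * p.z + d_matrix[3]  * p.w;
		points[i].y = d_matrix[4]  * p.x + d_matrix[5]  * p.y + d_matrix[6]  * p.z + d_matrix[7]  * p.w;
		points[i].z = d_matrix[8]  * p.x + d_matrix[9]  * p.y + d_matrix[10] * p.z + d_matrix[11] * p.w;
		points[i].w = d_matrix[12] * p.x + d_matrix[13] * p.y + d_matrix[14] * p.z + d_matrix[15] * p.w;
	}
}

// host side: upload once with cudaMemcpyToSymbol(d_matrix, hostMatrix,
// sizeof(float) * 16), then launch the kernel again and again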

What do you mean by “cannot be properly allocated”? Are you referring to the fact that all allocations are static and no dynamic allocation is possible?

That’s it. But the compiler does appear to keep track of the memory used for declared variables, so it ought to know whether constant memory can be used for parameter passing.

So probably there is no fundamental problem, as long as the compiler can fall back on other methods of parameter passing. Making compilers is just a very complicated job, I guess.
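In other words, the size of a __constant__ variable is fixed at compile time, but its contents can still be updated between launches, something like this (sketch, names made up):

#include <cuda_runtime.h>

__constant__ float d_coeffs[256];   // size fixed at compile time

// contents can still change from launch to launch; only the
// allocation itself is static
void updateCoeffs(const float *hostCoeffs)
{
	cudaMemcpyToSymbol(d_coeffs, hostCoeffs, 256 * sizeof(float));
}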

An interesting question, though…

If shared memory is used for compute 1.x and constant memory for compute 2.0, what is cudaFuncGetAttributes() supposed to report for shared memory use? How can it compute it if it doesn’t yet know which device the kernel will run on, given that cudaFuncGetAttributes() is context independent?

I suspect I’ve been hit by this myself in systems containing both a GTX 295 and a GTX 480. I try to give the blocks as much dynamic shared memory as possible and use cudaFuncGetAttributes() to determine how much is available. I suspect that cudaFuncGetAttributes() may underreport shared memory use for the GTX 295 in these cases, causing the kernel launch to fail due to too few resources (I have to ask for less shared memory). I should really make a test case and report it.
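The pattern I use looks roughly like this (a sketch with a made-up kernel; the interesting part is the dynamicSmem computation):

#include <cuda_runtime.h>

// hypothetical kernel using dynamically allocated shared memory
__global__ void k_myKernel(float *data)
{
	extern __shared__ float smem[];
	smem[threadIdx.x] = data[threadIdx.x];
}

void launchWithMaxDynamicSmem(float *d_data, int blocks, int threads)
{
	cudaDeviceProp prop;
	cudaGetDeviceProperties(&prop, 0);          // device 0, adjust as needed

	cudaFuncAttributes attr;
	cudaFuncGetAttributes(&attr, k_myKernel);   // static smem, registers, ...

	// request whatever the kernel's static use leaves over; if
	// attr.sharedSizeBytes misses the parameter block on compute 1.x,
	// this is too large and the launch fails with "too few resources"
	size_t dynamicSmem = prop.sharedMemPerBlock - attr.sharedSizeBytes;

	k_myKernel<<<blocks, threads, dynamicSmem>>>(d_data);
}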

Why should the compiler ever fall back to a different method for parameter passing? Constant memory consists of multiple banks of 64 KB each, and constant variables and parameters use different banks, so they cannot interfere or cause the constant memory to overflow.
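As a sanity check (a sketch; the array size is just chosen to fill one 64 KB bank), a kernel that takes parameters should compile and run even when user-declared __constant__ data occupies the full bank:

#include <cuda_runtime.h>

// 16384 floats * 4 bytes = 64 KB, the whole user constant bank
__constant__ float d_big[16384];

// on compute 2.x the parameters below go through a *separate* constant
// bank, so they do not count against the 64 KB used above
__global__ void k_scale(float *out, int n, float factor)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n)
		out[i] = d_big[i % 16384] * factor;
}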
