In registers or not: where are the parameters defined in a kernel stored?

hi,

I am always confused about where the parameters defined in a kernel are stored.

For example, there is a kernel to transpose the array a to the array b:
__global__ void example1(float *a, float *b)
{
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int position = ty * blockDim.x + tx;

    float c = a[position];
    b[position] = c;
}

Then where is c stored?
Will this kernel be slower than
__global__ void example2(float *a, float *b)
{
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int position = ty * blockDim.x + tx;

    __shared__ float c[256];
    c[position] = a[position];
    __syncthreads();
    b[position] = c[position];
}

Thank you

Arguments you pass to a kernel are stored in shared memory.
Variables you define inside a kernel are usually placed in registers, unless you define an array which a) is too large (from the compiler's point of view) to reside in registers, or b) is accessed with an index not known at compile time. In both cases the variable will be placed in local memory.
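To make that rule concrete, here is a hedged sketch (a hypothetical kernel, not from this thread): a scalar and a constant-indexed array typically stay in registers, while an array indexed with a runtime value is spilled to local memory.

```cuda
// Sketch: where local variables are likely to land (exact placement
// is the compiler's decision and may vary between nvcc versions).
__global__ void placement_sketch(float *out)
{
    float s = 1.0f;                 // scalar: almost certainly a register

    float small[4];                 // indices known at compile time:
    small[0] = s;                   // can be promoted to registers
    small[1] = s;
    small[2] = s;
    small[3] = s;

    float spilled[4];               // runtime index: the compiler cannot
    spilled[threadIdx.x & 3] = s;   // address registers, so this goes to
                                    // local memory
    out[threadIdx.x] = small[0] + spilled[threadIdx.x & 3];
}
```

You can check where things actually landed by compiling with `nvcc --ptxas-options=-v`, which reports per-kernel register and local-memory (lmem) usage.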

Does that mean that, for speed alone, registers are no different from shared memory, but because register space is always smaller than shared memory, we should use shared memory as much as possible?

By the way, I have another question.

Usually, shared memory is defined in a kernel like

__global__ void example()
{
    __shared__ int shardmem[256];
}

I know this is correct, but I am confused: every thread runs this code, so I would think shardmem gets defined many times.

How does CUDA avoid this situation?

Thank you

Does that mean that, for speed alone, registers are no different from shared memory, but because register space is always smaller than shared memory, we should use shared memory as much as possible?
– Transferring "shared memory variables" may be slower due to bank conflicts; a register, by definition, is ready to use.
As far as I know, there's no need to deliberately rewrite distinct-per-thread variables into a __shared__ array. If a variable is shared by all threads in a block, of course it's better to declare it as __shared__ to save register usage.
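To illustrate the bank-conflict point, here is a hypothetical kernel (a sketch, assuming the 16-bank shared memory of G80-era hardware): threads in a half-warp reading consecutive 32-bit words hit distinct banks, while a stride-2 pattern makes pairs of threads hit the same bank and serializes the loads.

```cuda
// Sketch: conflict-free vs. bank-conflicting shared memory access.
__global__ void bank_sketch(float *out)
{
    __shared__ float buf[64];

    buf[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();

    // Stride 1: each thread in a half-warp reads a different bank.
    float fast = buf[threadIdx.x];

    // Stride 2: threads i and i+8 map to the same bank,
    // giving a 2-way bank conflict (loads are serialized).
    float slow = buf[(threadIdx.x * 2) % 64];

    out[threadIdx.x] = fast + slow;
}
```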

How does CUDA avoid this situation?
– That's the compiler's business; don't worry about it.

Thank you for your answer, it is very helpful.

How can I be sure that an array is stored in registers and not in GMEM?

EDIT: Is it possible to make nvcc store an array in registers?

H.

The array must be relatively small, and the array indices must be known at compile time; then your array ends up in registers. The reason: one cannot address registers through pointers.

Something like the following cannot go into registers and is moved to local memory, so it would be better to declare the flags array with __shared__.

int flags[8];
flags[threadIdx.x & 7] = 1;

This however is likely to go into registers:

int flags[4];
flags[0] = 1;
flags[1] = 1;
flags[2] = 1;
flags[3] = 1;

Thx,

Where did you read that, please? The user guide is not so accurate about it.

http://www.nvidia.com/object/cuda_develop.html

Most of my knowledge was gained by listening through the University of Illinois Cudacast linked from the above URL. This is actually an excellent way to get familiar with the G80/G92 chip architecture and CUDA. You get the big picture, whereas the CUDA programming manual is probably not a good way to get started at all (way too many details to confuse you…). Overall, this lecture gives you a bit of an understanding of the chip architecture and the reasoning behind the design choices. But it takes about 10 hours of your time to listen through all of it.

Somewhere in one of the early lessons Prof Hwu mentions that you cannot take the address of a register. This basically means that the compiler has no means to place an array in registers, because normally (at the assembly level) arrays would be addressed through a base pointer and an offset.
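A related trick worth noting: a loop whose indices become compile-time constants after unrolling can still leave its array in registers. This is a hedged sketch (the #pragma unroll directive is standard nvcc; whether the array actually stays in registers is still the compiler's decision, which you can verify with --ptxas-options=-v):

```cuda
// Sketch: full unrolling turns loop indices into constants,
// which may let the compiler keep acc[] in registers.
__global__ void unroll_sketch(float *out)
{
    float acc[4];

    #pragma unroll
    for (int i = 0; i < 4; ++i)   // fully unrolled: i is a constant
        acc[i] = (float)(i + 1);  // in each unrolled iteration

    float sum = 0.0f;
    #pragma unroll
    for (int i = 0; i < 4; ++i)
        sum += acc[i];

    out[threadIdx.x] = sum;
}
```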

Christian