In registers or not: where are the parameters defined in a kernel stored?

hi,

I am always confused about where the parameters defined in a kernel are stored.

For example, there is a kernel to transpose the array a to the array b:
__global__ void example1(float *a, float *b)
{
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int position = ty * blockDim.x + tx;

    float c = a[position];
    b[position] = c;
}

Then where is c stored?
Will this kernel be slower than
__global__ void example2(float *a, float *b)
{
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int position = ty * blockDim.x + tx;

    __shared__ float c[256];
    c[position] = a[position];
    __syncthreads();
    b[position] = c[position];
}

Thank you

Arguments you pass to a kernel are stored in shared memory.
Variables you define inside a kernel are usually placed in registers, unless you define an array which a) is too large (from the compiler's point of view) to reside in registers, or b) is accessed with an index not known at compile time. In both cases the variable will be placed in local memory.
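To make that rule concrete, here is a hedged sketch (a hypothetical kernel, not from this thread): a scalar and a constant-indexed array typically stay in registers, while an array indexed with a runtime value is spilled to local memory.

```cuda
// Sketch: where local variables are likely to land (exact placement
// is the compiler's decision and may vary between nvcc versions).
__global__ void placement_sketch(float *out)
{
    float s = 1.0f;                 // scalar: almost certainly a register

    float small[4];                 // indices known at compile time:
    small[0] = s;                   // can be promoted to registers
    small[1] = s;
    small[2] = s;
    small[3] = s;

    float spilled[4];               // runtime index: the compiler cannot
    spilled[threadIdx.x & 3] = s;   // address registers, so this goes to
                                    // local memory
    out[threadIdx.x] = small[0] + spilled[threadIdx.x & 3];
}
```

You can check where things actually landed by compiling with `nvcc --ptxas-options=-v`, which reports per-kernel register and local-memory (lmem) usage.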

Does that mean that, for speed alone, registers are no different from shared memory, but because register space is always smaller than shared memory, we should use shared memory as much as possible?

By the way, I have another question.

Usually, shared memory is defined in a kernel like

__global__ void example()
{
    __shared__ int shardmem[256];
}

I know this is correct, but I am confused: every thread runs this code, so I would think shardmem gets defined many times.

How does CUDA avoid this situation?

Thank you

Does that mean that, for speed alone, registers are no different from shared memory, but because register space is always smaller than shared memory, we should use shared memory as much as possible?
– Transferring "shared memory variables" may be slower due to bank conflicts; a register, by definition, is ready to use.
As far as I know, there's no need to deliberately rewrite distinct-per-thread variables into a __shared__ array. If a variable is shared by all threads in a block, of course it's better to declare it as __shared__ to save register usage.
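To illustrate the bank-conflict point, here is a hypothetical kernel (a sketch, assuming the 16-bank shared memory of G80-era hardware): threads in a half-warp reading consecutive 32-bit words hit distinct banks, while a stride-2 pattern makes pairs of threads hit the same bank and serializes the loads.

```cuda
// Sketch: conflict-free vs. bank-conflicting shared memory access.
__global__ void bank_sketch(float *out)
{
    __shared__ float buf[64];

    buf[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();

    // Stride 1: each thread in a half-warp reads a different bank.
    float fast = buf[threadIdx.x];

    // Stride 2: threads i and i+8 map to the same bank,
    // giving a 2-way bank conflict (loads are serialized).
    float slow = buf[(threadIdx.x * 2) % 64];

    out[threadIdx.x] = fast + slow;
}
```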

How does CUDA avoid this situation?
– That's the compiler's business; don't worry about it.

Thank you for your answer, it is very helpful.

How can I be sure that an array is stored in registers and not in GMEM?

EDIT: Is it possible to make nvcc store an array in registers?

H.

The array must be relatively small, and the array indices must be known at compile time; then your array ends up in registers. The reason: one cannot address registers through pointers.

Something like the following cannot go into registers and is moved to local memory, so it would be better to declare the flags array with __shared__.

int flags[8];
flags[threadIdx.x & 7] = 1;

This however is likely to go into registers:

int flags[4];
flags[0] = 1;
flags[1] = 1;
flags[2] = 1;
flags[3] = 1;

Thx,

Where did you read that, please? The user guide is not so accurate about it.

http://www.nvidia.com/object/cuda_develop.html

Most of my knowledge was gained by listening through the University of Illinois Cudacast linked from the above URL. This is actually an excellent way to get familiar with the G80/G92 chip architecture and CUDA. You get the big picture, whereas the CUDA programming manual is probably not a good way to get started at all (way too many details to confuse you…). Overall, this lecture gives you a bit of an understanding of the chip architecture and the reasoning behind the design choices. But it takes about 10 hours of your time to listen through all of it.

Somewhere in one of the early lessons Prof Hwu mentions that you cannot take the address of a register. This basically means that the compiler has no means to place an array in registers, because normally (at the assembly level) arrays would be addressed through a base pointer and an offset.
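A related trick worth noting: a loop whose indices become compile-time constants after unrolling can still leave its array in registers. This is a hedged sketch (the #pragma unroll directive is standard nvcc; whether the array actually stays in registers is still the compiler's decision, which you can verify with --ptxas-options=-v):

```cuda
// Sketch: full unrolling turns loop indices into constants,
// which may let the compiler keep acc[] in registers.
__global__ void unroll_sketch(float *out)
{
    float acc[4];

    #pragma unroll
    for (int i = 0; i < 4; ++i)   // fully unrolled: i is a constant
        acc[i] = (float)(i + 1);  // in each unrolled iteration

    float sum = 0.0f;
    #pragma unroll
    for (int i = 0; i < 4; ++i)
        sum += acc[i];

    out[threadIdx.x] = sum;
}
```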

Christian