Local vs global memory: is local memory access always coalesced?

nitin.life · June 28, 2009, 10:46am

I have a matrix like this, one per thread:

double mat[42][13];

Currently I declare it on a per-thread basis, so it gets stored in local memory. I did that after reading in the programming guide that local memory access is always coalesced :) (page 86 of the 2.2 programming guide).

Another way I can do this is to first allocate the total space for all the threads in GPU global memory (from the host side) and then use that allocation in the kernel. Then I would have to deal with the malloc myself and with writing to it in a coalesced way (which can be done).
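A minimal sketch of that second option, assuming an interleaved layout in which the same (row, col) element of every thread's matrix is stored side by side. The names `perThreadBytes`, `bufferBytes`, and `interleavedIndex` are hypothetical, and this is plain host-side C++ so the index arithmetic can be checked without a GPU. In a kernel, the same formula would presumably be applied to a pointer obtained from `cudaMalloc`.

```cpp
#include <cassert>
#include <cstddef>

// Matrix dimensions from the post: each thread owns a 42 x 13 matrix of doubles.
const std::size_t ROWS = 42;
const std::size_t COLS = 13;

// Bytes needed for one thread's matrix: 42 * 13 * sizeof(double).
inline std::size_t perThreadBytes() {
    return ROWS * COLS * sizeof(double);
}

// Bytes needed for one buffer that holds the matrices of all threads.
inline std::size_t bufferBytes(std::size_t numThreads) {
    return perThreadBytes() * numThreads;
}

// Hypothetical interleaved layout (an assumption, not something from the guide):
// element (row, col) of thread `tid` sits next to the same element of
// thread `tid + 1`, so neighbouring threads touch neighbouring addresses.
inline std::size_t interleavedIndex(std::size_t row, std::size_t col,
                                    std::size_t tid, std::size_t numThreads) {
    assert(row < ROWS && col < COLS && tid < numThreads);
    return (row * COLS + col) * numThreads + tid;
}
```

With this layout, threads `tid` and `tid + 1` reading the same `(row, col)` touch adjacent doubles, which is the pattern coalescing needs. Giving each thread its own contiguous chunk of 546 doubles instead would put neighbouring threads 4368 bytes apart.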

My question is this: I have seen threads on this forum where people who are much more experienced and smarter than me in CUDA often say to avoid local memory usage as much as possible.

Hence I am not sure which is the best way to proceed.

The programming guide doesn’t say enough about local memory to make this clear for programmers…

Thanks all for your time…

PS: I have already used up my shared memory, so I can’t use it for the above matrix :(

Nico · June 28, 2009, 1:03pm

The problem with local memory is that it is not cached, so accesses to local memory are as expensive as accesses to global memory.
Furthermore, local memory is not shared between threads in a block like shared memory is, so you can look at local memory as a very slow register.

N.

nitin.life · June 28, 2009, 4:55pm

Yes, I understand, thanks. But it’s better than dealing with global memory, as I don’t need inter-thread data communication anyway. Each thread has its own matrix, which gets formed during computation and is used up for updating another global memory variable, so its scope is per thread only.

I think I will go with local memory only. Let’s see how it works, as there are a lot of flops in the kernel.

Thanks for your input, Nico :)

Sarnath · June 30, 2009, 10:57am

The programming guide 2.2 says local memory accesses are always coalesced.

nitin.life · June 30, 2009, 11:02am

Yes, I read that too… good to know :)

Thanks a ton, Sarnath.
