Global memory broadcasting?

StickGuy · October 1, 2008, 1:11pm

What happens if every thread in a warp reads from the same address in global memory? Is this read in one memory transaction and broadcast to all threads or is this counted as non-coalesced, requiring 32 separate memory transactions? I have been using constant memory for this access pattern, but I am running out of it.

Reimar · October 1, 2008, 1:35pm

Unless you are using a 2xx-class card, there will be one separate (i.e. uncoalesced) read for each thread. Put differently: don’t even think about doing it like that.

You could use tex1Dfetch, but IMO the performance is not really good, and I think textures are likely to cause pain if you use threads (though I think CUDA + multiple threads probably always does).

Just load it into shared memory once (either once per block followed by __syncthreads(), or once per warp) and use it from there.
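For illustration, here is a minimal sketch of the "once per block + __syncthreads()" variant. The kernel name scaleByBroadcast, the helper runBroadcastDemo, and the sizes are made up for this example and are not from the thread; it only assumes a single float that every thread needs.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread needs the same value, *param.
__global__ void scaleByBroadcast(const float* param, const float* in,
                                 float* out, int n)
{
    __shared__ float sParam;           // one copy per block

    if (threadIdx.x == 0)
        sParam = *param;               // a single global read per block
    __syncthreads();                   // make the value visible to the whole block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * sParam;       // all threads read the same shared address
}

// Hypothetical host helper: runs the kernel and checks the result on the host.
bool runBroadcastDemo()
{
    const int n = 1024;                // assumed size, a multiple of the block size
    const float scale = 3.0f;
    std::vector<float> hIn(n), hOut(n, 0.0f);
    for (int i = 0; i < n; ++i) hIn[i] = static_cast<float>(i);

    float *dParam = nullptr, *dIn = nullptr, *dOut = nullptr;
    bool ok = cudaMalloc(&dParam, sizeof(float)) == cudaSuccess
           && cudaMalloc(&dIn, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dOut, n * sizeof(float)) == cudaSuccess
           && cudaMemcpy(dParam, &scale, sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dIn, hIn.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scaleByBroadcast<<<n / 256, 256>>>(dParam, dIn, dOut, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hOut.data(), dOut, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < n; ++i)
        ok = (hOut[i] == hIn[i] * scale);

    cudaFree(dParam); cudaFree(dIn); cudaFree(dOut);
    return ok;
}

int main()
{
    printf("%s\n", runBroadcastDemo() ? "ok" : "failed");
    return 0;
}
```

Shared memory handles the case where all threads read the same address as a broadcast, so after the single global load the remaining reads do not cost extra global transactions.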

On 2xx hardware, reading the same global address directly probably has similar (maybe somewhat better) performance to the “load into shared memory once per warp” approach, but if you are memory-bandwidth bound you can still be faster by caching the value manually in shared memory.

sleon · October 2, 2008, 9:25am

Can you show how you are using it?

How do you put data into constant memory?

Linny · October 2, 2008, 9:31am

From the Programming Guide:

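__constant__ float constData[256];  // device constant memory, declared at file scope

float data[256];  // host buffer holding the values to upload

cudaMemcpyToSymbol(constData, data, sizeof(data));  // copy host data into the constant symbol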
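To show how those three lines fit together with a kernel, here is a small sketch. The kernel addConstant, the helper runConstantDemo, and the index 7 are invented for this example; only constData, data and the cudaMemcpyToSymbol call come from the snippet above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__constant__ float constData[256];     // as in the Programming Guide snippet

// Hypothetical kernel: every thread reads the same constant element.
__global__ void addConstant(float* out, int n, int idx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = constData[idx] + static_cast<float>(i);
}

// Hypothetical host helper: uploads the table, runs the kernel, checks the result.
bool runConstantDemo()
{
    const int n = 512;                 // assumed size, a multiple of the block size
    const int idx = 7;                 // arbitrary element that all threads read
    float data[256];
    for (int k = 0; k < 256; ++k) data[k] = static_cast<float>(2 * k);

    std::vector<float> hOut(n, 0.0f);
    float* dOut = nullptr;
    bool ok = cudaMemcpyToSymbol(constData, data, sizeof(data)) == cudaSuccess
           && cudaMalloc(&dOut, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        addConstant<<<n / 128, 128>>>(dOut, n, idx);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hOut.data(), dOut, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < n; ++i)
        ok = (hOut[i] == data[idx] + static_cast<float>(i));

    cudaFree(dOut);
    return ok;
}

int main()
{
    printf("%s\n", runConstantDemo() ? "ok" : "failed");
    return 0;
}
```

Because all threads index constData with the same idx, this is the access pattern the original question describes.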

sleon · October 2, 2008, 10:38am

Thank you.
