If multiple threads access the same memory location in constant memory, and that location resides in the local constant memory cache, will the accesses be serialized? Like they would if multiple threads access the same shared memory location?
I don't think so. I am getting excellent performance from constant memory with all threads hammering the same location in the constant cache. In my project it's polygon edge coordinates: all screen pixels get tested against these, with each thread in a block testing a different screen pixel against the same edges.
If those accesses were serialized, I couldn't possibly be seeing the performance I'm seeing.
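A minimal sketch of that access pattern (the names `d_edges`, `NUM_EDGES`, and the half-plane edge test are illustrative assumptions, not the poster's actual code):

```cuda
// Hypothetical sketch: on each loop iteration every thread in the warp
// reads the SAME __constant__ entry, so the value is broadcast from the
// constant cache instead of being serialized.

#define NUM_EDGES 64  // illustrative; constant memory size is fixed at compile time

// Edge stored as line coefficients (a, b, c): a*x + b*y + c >= 0 means "inside".
__constant__ float3 d_edges[NUM_EDGES];

__global__ void testPixels(int width, unsigned char *inside)
{
    // Each thread handles one screen pixel.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    bool in = true;
    for (int e = 0; e < NUM_EDGES; ++e) {
        // Same address across the whole warp -> broadcast read.
        float3 edge = d_edges[e];
        in = in && (edge.x * x + edge.y * y + edge.z >= 0.0f);
    }
    inside[y * width + x] = in ? 1 : 0;
}
```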
In fact, this is the ideal way to access constant memory. It is fastest when the value is broadcast to all threads in a warp.
Warps are serialized when threads in the warp read different values in the constant memory, making textures and/or shared memory potentially more attractive for this memory pattern.
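To illustrate the contrast, here is a hypothetical kernel with both access patterns side by side (names and sizes are made up):

```cuda
__constant__ float table[256];

__global__ void constReads(float *out, const int *idx)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Broadcast: every thread in the warp reads the same address,
    // so this is effectively a single constant-cache access.
    float fast = table[0];

    // Divergent: threads read different addresses, so the warp's
    // constant-cache accesses get serialized.
    float slow = table[idx[tid] & 255];

    out[tid] = fast + slow;
}
```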
OK… so when all reads are to the same address, it’s best to use constant memory. And if reads are to different addresses, it’s best to use shared memory (or textures).
A few more questions:
- Does the texture cache have a certain number of banks, like the shared memory does?
- Do registers have similar restrictions? (E.g., if multiple threads read the same register, do they get serialized?)
The thing is that I'm going to make semi-random read accesses, and I can't guarantee that the reads will go to different banks. I'm also wondering why there is such a restriction on reading from the same address; a read-after-read sequence shouldn't pose any synchronization problem.
Not that I’m aware of.
Registers are allocated per-thread, and you can’t read one thread’s registers from another.
The documentation doesn’t really say, but I would guess that there is only one constant memory read unit per warp in the hardware. Hence, when a warp accesses multiple values from constant memory, it must serialize access to that hardware unit. But that is just a guess. Consider that constant memory is probably the underlying hardware that graphics shaders use for constant parameters to the shader. In that case, every thread in the shader is reading the same parameter simultaneously, so the hardware would be optimized for this use case to give the best graphics performance.
If you are going to make semi-random read accesses and are lucky enough that your data fits in shared memory, that will probably be your best bet. But it never hurts to try out the various ways to see which is faster in your circumstance.
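For example, staging the table into shared memory first might look like this (a sketch; `TABLE_SIZE` and the names are assumptions):

```cuda
#define TABLE_SIZE 1024  // assumed small enough to fit in shared memory

__global__ void randomReads(const float *g_table, const int *idx, float *out)
{
    __shared__ float s_table[TABLE_SIZE];

    // Cooperatively copy the table from global to shared memory.
    for (int i = threadIdx.x; i < TABLE_SIZE; i += blockDim.x)
        s_table[i] = g_table[i];
    __syncthreads();

    // The semi-random reads now hit shared memory. Bank conflicts can
    // still serialize accesses to different words in the same bank, but
    // threads reading the SAME word are served by a broadcast.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = s_table[idx[tid] % TABLE_SIZE];
}
```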