This is from the CUDA Programming Guide, section G.3.2.2 (CUDA version 3.1):
“Threads can access any words in any order, including the same words, and a single memory transaction for each segment addressed by the half-warp is issued”
Firstly, instead of “warp_id = tid / 32;” it would be faster to do “warp_id = tid >> 5;” (though if tid is unsigned the compiler will most likely do that for you). If each block is a single warp, “warp_id = blockIdx.x” would be even better. Speaking of which, if you have 32 threads per block, warp_id will always be 0 in the current code.
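To make that concrete, here is a minimal sketch of the shift version. The kernel and the host helper (`warpIdKernel`, `countWarpIdMismatches`) are made-up names for illustration, not from the original code, and I'm assuming `tid` is `threadIdx.x` held in an unsigned int.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its warp index within its block.
__global__ void warpIdKernel(unsigned int *out)
{
    unsigned int tid = threadIdx.x;      // unsigned, so tid >> 5 equals tid / 32
    unsigned int warp_id = tid >> 5;     // warp index inside the block
    out[blockIdx.x * blockDim.x + tid] = warp_id;
}

// Hypothetical host helper: launches the kernel and counts entries that
// differ from (thread index within block) / 32. Returns -1 on a CUDA error.
int countWarpIdMismatches(int blocks, int threads)
{
    int n = blocks * threads;
    unsigned int *d_out = 0;
    if (cudaMalloc(&d_out, n * sizeof(unsigned int)) != cudaSuccess) return -1;

    warpIdKernel<<<blocks, threads>>>(d_out);

    std::vector<unsigned int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != (unsigned int)(i % threads) / 32u) ++bad;
    return bad;
}
```

With 32 threads per block every stored value is 0, which is the problem mentioned above; with 64 threads per block the values are 0 and 1.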
But anyway…
Constant memory is no faster than shared memory; at best they are about the same speed. The real difference is scope and access. Constant memory is visible to the entire GPU and is read-only from kernels. Shared memory is visible only within a block and is read/write. So decide which one to use based on that. I generally use constant memory instead of shared whenever I can, because I find shared memory is always in short supply and I can’t find much to put in constant memory anyway.
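A rough sketch of the scope difference, in case it helps. Everything here (`scale`, `reverseAndScale`, `runReverseAndScale`, the block size of 64) is invented for the example: the constant value is set from the host and read by every block, while the shared tile is written and read only by the threads of one block.

```cuda
#include <cuda_runtime.h>

#define BLOCK 64   // assumed block size for this sketch

// Read-only inside kernels, visible to every block; written from the host.
__constant__ float scale;

// Reverses each BLOCK-sized chunk of `in` and multiplies it by `scale`.
__global__ void reverseAndScale(const float *in, float *out)
{
    __shared__ float tile[BLOCK];            // read/write, private to this block
    unsigned int base = blockIdx.x * BLOCK;

    tile[threadIdx.x] = in[base + threadIdx.x];
    __syncthreads();                         // tile must be fully written before it is read

    out[base + threadIdx.x] = scale * tile[BLOCK - 1 - threadIdx.x];
}

// Hypothetical host helper: processes blocks * BLOCK floats.
// Returns 0 on success, -1 on a CUDA error.
int runReverseAndScale(const float *h_in, float *h_out, int blocks, float s)
{
    size_t bytes = (size_t)blocks * BLOCK * sizeof(float);
    float *d_in = 0, *d_out = 0;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }

    cudaMemcpyToSymbol(scale, &s, sizeof(float));   // host writes the constant
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    reverseAndScale<<<blocks, BLOCK>>>(d_in, d_out);

    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

Note that the constant costs no shared memory at all, which is the main reason to prefer it when the data is read-only and the same for all blocks.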
The constant cache is still backed by a region of device memory (“global mem”)… no?
So the first access will result in a gmem read…
Only repeated, broadcast-like access (all threads reading the same address) will give you the benefit.
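Here is a small sketch of what I mean by broadcast-like access. The names (`coeffs`, `evalPoly`, `runEvalPoly`) and the cubic polynomial are just assumptions for the example. In each loop iteration every thread reads the same `coeffs[k]`, and the same four values are reused by all threads, so after the first fetch the reads should be served from the constant cache.

```cuda
#include <cuda_runtime.h>

#define DEGREE 3

// Coefficients of c0 + c1*x + c2*x^2 + c3*x^3, set from the host.
__constant__ float coeffs[DEGREE + 1];

// Evaluates the polynomial at x[i] with Horner's rule.
__global__ void evalPoly(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = coeffs[DEGREE];
    for (int k = DEGREE - 1; k >= 0; --k)
        acc = acc * x[i] + coeffs[k];   // index k is the same for every thread
    y[i] = acc;
}

// Hypothetical host helper. Returns 0 on success, -1 on a CUDA error.
int runEvalPoly(const float *h_c, const float *h_x, float *h_y, int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *d_x = 0, *d_y = 0;
    if (cudaMalloc(&d_x, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_y, bytes) != cudaSuccess) { cudaFree(d_x); return -1; }

    cudaMemcpyToSymbol(coeffs, h_c, (DEGREE + 1) * sizeof(float));
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);

    int threads = 64;
    int blocks = (n + threads - 1) / threads;
    evalPoly<<<blocks, threads>>>(d_x, d_y, n);

    cudaError_t err = cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return err == cudaSuccess ? 0 : -1;
}
```

If the index depended on the thread instead (something like `coeffs[threadIdx.x % 4]`), the threads of a half-warp would read different addresses and, as far as I understand the guide, those reads get serialized, so the benefit goes away.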