Coalescing with no performance improvement

_DK · May 13, 2009, 8:02pm

Hi,

I was modifying the marching cubes example for my own purposes and came across some strange behaviour.
I’m writing my results to a volume array (bound to a 1D texture that is later fetched from to generate the mesh). In the first version each thread wrote out a uchar, which resulted in uncoalesced memory stores, since G80 doesn’t support coalescing of data types narrower than 32 bits. I changed the element type to float to enable coalescing, and the profiler confirmed that the stores are now coalesced.
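The store patterns under discussion can be sketched roughly as follows. All names here (`computeValue`, `storeUchar`, `storeFloat`, `storePacked`, `checkPacked`) are hypothetical and are not taken from the marching cubes sample. The third kernel is an assumption about one possible alternative, not something tried in this thread: each thread computes four neighbouring voxels and writes them as a single 32-bit `uchar4`, so the volume stays at one byte per voxel while every store is 32 bits wide.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the per-voxel computation (not from the sample).
__host__ __device__ inline unsigned char computeValue(unsigned int i)
{
    return (unsigned char)((i * 7u + 3u) & 0xFFu);
}

// Variant 1: one byte per thread (the original, uncoalesced case on G80).
__global__ void storeUchar(unsigned char *out, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = computeValue(i);
}

// Variant 2: one 32-bit float per thread. Coalesced, but 4x the bytes written.
__global__ void storeFloat(float *out, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (float)computeValue(i);
}

// Variant 3 (assumption): four voxels per thread, packed into one uchar4 store.
__global__ void storePacked(uchar4 *out, unsigned int nQuads)
{
    unsigned int q = blockIdx.x * blockDim.x + threadIdx.x;
    if (q < nQuads) {
        unsigned int base = 4u * q;
        out[q] = make_uchar4(computeValue(base),      computeValue(base + 1u),
                             computeValue(base + 2u), computeValue(base + 3u));
    }
}

// Runs variant 3 and compares every byte with the CPU result.
// Returns the number of mismatching voxels, or -1 on a CUDA error.
int checkPacked(unsigned int nQuads)
{
    uchar4 *dOut = 0;
    if (cudaMalloc((void **)&dOut, nQuads * sizeof(uchar4)) != cudaSuccess) return -1;

    const unsigned int threads = 256;
    const unsigned int blocks = (nQuads + threads - 1) / threads;
    storePacked<<<blocks, threads>>>(dOut, nQuads);

    std::vector<unsigned char> host(4u * nQuads);
    cudaError_t err = cudaMemcpy(&host[0], dOut, host.size(), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (unsigned int i = 0; i < 4u * nQuads; ++i)
        if (host[i] != computeValue(i)) ++mismatches;
    return mismatches;
}

int main()
{
    printf("mismatching voxels: %d\n", checkPacked(1u << 18));
    return 0;
}
```

Whether the packed variant helps depends on whether the kernel is limited by its stores at all, which is exactly the open question in this thread.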

So the stores are now coalesced, but there is no performance improvement.

Does this mean that I’m ALU bound in this case, and that in the first version the latency of the uncoalesced stores was hidden by the computationally expensive kernel (the calculation of the value that gets stored to global memory)?

What could be done to improve performance in this case?

Thanks!

Smokey · May 13, 2009, 10:47pm

You’re either compute bound, using a lot of texture lookups, or crippled by warp serialization / divergent threads.

_DK · May 14, 2009, 6:59pm

Sorry, I’m not sure I got your point regarding texture lookups. The kernel that now has only coalesced stores doesn’t do any texture fetches (those happen only in the kernels that run afterwards), and it still runs at the same speed.

How can I find out whether I have problems with warp serialization or am really compute bound?
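One common way to approach this question, sketched here under assumptions, is to time three variants of the kernel: the full version, a math-only version, and a store-only version. Everything below is hypothetical (`expensiveValue`, the kernel names, `timeVariant`, and the `sentinel` trick are illustrative and not from the marching cubes sample). In the math-only variant the store is guarded by a comparison against a runtime value that never matches, so the compiler cannot discard the computation even though nothing is written.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the expensive per-voxel calculation.
// The result is always >= 0, so it never equals a negative sentinel.
__device__ float expensiveValue(unsigned int i, int iters)
{
    float v = (float)i * 1e-6f;
    for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;  // dependent chain
    return v;
}

// Full kernel: computation plus coalesced 32-bit store.
__global__ void fullKernel(float *out, unsigned int n, int iters)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = expensiveValue(i, iters);
}

// Math only: the store depends on the computed value, but never executes
// when sentinel is negative.
__global__ void mathOnlyKernel(float *out, unsigned int n, int iters, float sentinel)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = expensiveValue(i, iters);
        if (v == sentinel) out[i] = v;
    }
}

// Store only: same store pattern, no expensive computation.
__global__ void storeOnlyKernel(float *out, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (float)i;
}

enum Variant { FULL, MATH_ONLY, STORE_ONLY };

// Returns the elapsed time in milliseconds for one launch of the chosen variant.
float timeVariant(Variant which, float *dOut, unsigned int n, int iters)
{
    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    switch (which) {
    case FULL:       fullKernel<<<blocks, threads>>>(dOut, n, iters); break;
    case MATH_ONLY:  mathOnlyKernel<<<blocks, threads>>>(dOut, n, iters, -1.0f); break;
    case STORE_ONLY: storeOnlyKernel<<<blocks, threads>>>(dOut, n); break;
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const unsigned int n = 1u << 22;
    const int iters = 256;
    float *dOut = 0;
    if (cudaMalloc((void **)&dOut, n * sizeof(float)) != cudaSuccess) return 1;

    timeVariant(FULL, dOut, n, iters);  // warm-up launch, result ignored
    printf("full:       %.3f ms\n", timeVariant(FULL, dOut, n, iters));
    printf("math only:  %.3f ms\n", timeVariant(MATH_ONLY, dOut, n, iters));
    printf("store only: %.3f ms\n", timeVariant(STORE_ONLY, dOut, n, iters));

    cudaFree(dOut);
    return 0;
}
```

If the full time is close to the math-only time, the kernel is likely limited by computation (or by latency inside the computation), which would be consistent with coalescing making no difference. If it is close to the store-only time, memory traffic dominates. Warp serialization would show up inside the math-only time, so this comparison alone does not separate it from plain arithmetic cost.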

Thanks!
