Help with coalescing

zajo · March 12, 2008, 1:26am

I have a very simple kernel:

__global__ void kernel( ushort4 * output, size_t output_pitch )
{
    unsigned int u = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int v = blockIdx.y*blockDim.y + threadIdx.y;
    ushort4 p = sample_map(u,v);
    ushort4 * out = (ushort4 *)((char *)output+v*output_pitch) + u;
    *out = p;
}

The profiler shows that none of the memory writes are coalesced, and I can't work out why.

My block size is 8x8, and the output pointer comes from cudaMallocPitch, so according to the documentation each row should be correctly aligned.

I’m sure I’m missing something stupid but I can’t seem to find it…
