Weird coalescing results in CUDA 3.0 profiler

war_head · July 3, 2010, 2:07pm

I’m trying to write a simple program that reads pixels from an image, and I decided to use the char4 type to satisfy the 4-byte alignment requirement for memory coalescing.

The image size is 640x480, 1 byte per pixel (grayscale).

Allocating the memory with cudaMallocPitch returns a pitch value of 768.

In the kernel, I simply copy from a char4 source to a char4 destination. Both are allocated to the same size.

[codebox]int offset = threadIdx.x + blockIdx.y * pitch;

output[ offset ] = input[ offset ];[/codebox]

BlockDim = 160 x 1 (basically 1 block reads 1 row of the image)

GridDim = 1 x 480
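One thing worth double-checking in the indexing above: cudaMallocPitch reports the pitch in bytes, while indexing a char4 pointer counts in 4-byte elements. The sketch below is host-side C++ only and assumes (this is not stated in the post) that the 768-byte pitch is passed to the kernel unchanged as `pitch`. The helper names `byteOffset`, `inBounds` and `firstOutOfBoundsRow` are made up for illustration. It compares the byte offsets produced by a stride of 768 elements against a stride of 768 / 4 elements.

```cpp
#include <cstddef>

// Values taken from the post; the helpers themselves are hypothetical.
const std::size_t kPitchBytes    = 768;  // pitch reported by cudaMallocPitch
const std::size_t kRows          = 480;  // image height
const std::size_t kThreadsPerRow = 160;  // 640 bytes per row / 4 bytes per char4
const std::size_t kElemSize      = 4;    // sizeof(char4)

// Byte offset touched by thread x of row y when the row stride is given
// in char4 elements, which is how pointer indexing interprets the value.
inline std::size_t byteOffset(std::size_t x, std::size_t y, std::size_t strideElems) {
    return (x + y * strideElems) * kElemSize;
}

// True if the 4-byte access lies inside the pitch * rows allocation.
inline bool inBounds(std::size_t x, std::size_t y, std::size_t strideElems) {
    return byteOffset(x, y, strideElems) + kElemSize <= kPitchBytes * kRows;
}

// First row whose last thread falls outside the allocation, or kRows if none.
inline std::size_t firstOutOfBoundsRow(std::size_t strideElems) {
    for (std::size_t y = 0; y < kRows; ++y) {
        if (!inBounds(kThreadsPerRow - 1, y, strideElems)) return y;
    }
    return kRows;
}
```

Under that assumption, `firstOutOfBoundsRow(768)` gives 120, meaning rows 120 and up would land past the end of the buffer, while `firstOutOfBoundsRow(768 / 4)` gives 480, meaning every row stays in bounds. If the kernel already receives the pitch divided by sizeof(char4), this does not apply.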

In theory this should allow coalesced reads within each thread block (which is what cudaMallocPitch is supposed to guarantee), but the profiler reports that this isn’t the case! Note: for some weird reason my profiler reports the number of uncoalesced reads, but the number of coalesced reads is always zero.
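For reference, the expectation can be checked on paper. On compute capability 1.0/1.1, a half-warp of 16 threads reading 4-byte words coalesces only when thread k accesses word k of a 64-byte-aligned segment. The host-side C++ sketch below applies a simplified version of that rule to the launch configuration described here. It assumes the allocation base is at least 64-byte aligned and that all threads are active. The function names are hypothetical, and it models only the documented rule, not what the profiler counts.

```cpp
#include <cstddef>

// Simplified rule for 4-byte words on compute capability 1.0/1.1:
// thread k of a half-warp must access word k of a 64-byte-aligned segment.
inline bool halfWarpCoalesced(std::size_t firstThreadByteAddr,
                              std::size_t threadStrideBytes) {
    const std::size_t kWord = 4;
    const std::size_t kHalfWarp = 16;
    return threadStrideBytes == kWord &&
           firstThreadByteAddr % (kWord * kHalfWarp) == 0;
}

// Counts the half-warps of one block (one image row) that break the rule.
// Byte addresses are relative to the allocation base, which is assumed
// to be at least 64-byte aligned.
inline int uncoalescedHalfWarps(std::size_t row, std::size_t rowStrideBytes,
                                std::size_t threads) {
    int bad = 0;
    for (std::size_t first = 0; first < threads; first += 16) {
        const std::size_t addr = row * rowStrideBytes + first * 4;
        if (!halfWarpCoalesced(addr, 4)) ++bad;
    }
    return bad;
}
```

With 160 threads per block and a row stride of 768 bytes, `uncoalescedHalfWarps(row, 768, 160)` is 0 for every row, which matches the expectation stated above. A row stride that is not a multiple of 64 bytes, for example 644, would make all 10 half-warps of a row uncoalesced.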

If I set gridDim to 1 x 3, the profiler reports no uncoalesced reads/writes.

If I set gridDim to 1 x 4, the profiler says there are 640 uncoalesced reads and 1280 uncoalesced writes. Another strange thing: how can there be more writes than reads when it’s doing a 1:1 copy?

PS. The CUDA device is sm_11 (compute capability 1.1).
