Deleting columns doesn't seem to work... please help


I am stumped…

I have a 2D matrix with 512 rows and 6400 columns, and I want to extract just some of the columns in a systematic way. I am trying to get the first 750 columns out of every 800 columns, making the final number of columns equal to 750 x 8 = 6000.

In other words, I want to take columns 1-750, 801-1550, 1601-2350, 2401-3150, 3201-3950, 4001-4750, 4801-5550, 5601-6350.
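The column selection reduces to one expression: an output column belongs to window `outCol / 750` at offset `outCol % 750`. Below is a minimal sketch of that mapping. `srcColumn` is a hypothetical helper, not part of the original code, and it uses 0-based column numbers.

```cuda
// Hypothetical helper (not from the original post): maps a 0-based output
// column to the 0-based input column it is copied from.
//   keep   = columns kept from each window (750 here)
//   window = columns per window (800 here)
__host__ __device__ inline unsigned int srcColumn(unsigned int outCol,
                                                  unsigned int keep,
                                                  unsigned int window)
{
    unsigned int w = outCol / keep;   // which window the output column belongs to
    unsigned int c = outCol % keep;   // offset inside that window
    return w * window + c;
}
```

With `keep = 750` and `window = 800`, output column 750 maps to input column 800, and the last output column (5999) maps to input column 6349. In 1-based numbering that is column 6350, so the last range is 5601-6350.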

Below is my kernel… it works for parts of the matrix, but the weirdest pattern showed up in the result and I can’t seem to work out where the problem might be…

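[codebox]#define XSTEP 8  // number of 800-column windows (6400 / 800); assumed, the definition isn't shown

// In-place extraction. Layout is column-major: each column is blockDim.x (= 512)
// contiguous floats, so block b copies column b of every window.
__global__ void extract(float* data, unsigned int windowSize)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;  // destination index
    unsigned int j = i;                                       // source index
    unsigned int gridSize  = gridDim.x * blockDim.x;          // elements kept per window
    unsigned int frameSize = windowSize * blockDim.x;         // elements per full window
    unsigned int n = gridSize * XSTEP;                        // total elements kept

    while (i < n)
    {
        data[i] = data[j];  // reads and writes the same buffer
        i += gridSize;
        j += frameSize;
    }
}
[/codebox]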


The kernel is launched like this:

[codebox]extract<<<750, 512>>>((float*) d_idata, 800);[/codebox]


The weird pattern is shown in the attached image. It’s an error plot, and the spikes are where the mismatches occur.
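An error plot like this needs a reference result to compare against. Below is a minimal CPU sketch. It assumes column-major storage, where each column is 512 contiguous floats, because that is what the kernel's indexing implies. `extractReference` and `mismatchedColumns` are hypothetical names, not from the original post.

```cuda
#include <cstddef>
#include <vector>

// CPU reference: keep the first `keep` columns of every `window` columns.
// Column-major: column c occupies in[c*rows .. c*rows + rows - 1].
std::vector<float> extractReference(const std::vector<float>& in, std::size_t rows,
                                    std::size_t cols, std::size_t keep,
                                    std::size_t window)
{
    std::size_t windows = cols / window;
    std::vector<float> out(rows * keep * windows);
    for (std::size_t w = 0; w < windows; ++w)
        for (std::size_t c = 0; c < keep; ++c)
            for (std::size_t r = 0; r < rows; ++r)
                out[(w * keep + c) * rows + r] = in[(w * window + c) * rows + r];
    return out;
}

// Returns every output column that has at least one mismatching element.
// Assumes expected.size() is a multiple of rows (true for extractReference output).
std::vector<std::size_t> mismatchedColumns(const std::vector<float>& expected,
                                           const float* actual, std::size_t rows)
{
    std::vector<std::size_t> bad;
    for (std::size_t c = 0; c * rows < expected.size(); ++c)
        for (std::size_t r = 0; r < rows; ++r)
            if (expected[c * rows + r] != actual[c * rows + r]) {
                bad.push_back(c);  // record the column once, then move on
                break;
            }
    return bad;
}
```

Copy the GPU result back to the host and pass it to `mismatchedColumns`. The function returns the output columns where the spikes would appear.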

I used 786 instead of 750 to generate this plot.

It shows that the first set of columns matched completely. The mismatches start near the end of the 2nd set of columns. The beginning of the 3rd set is fine again, but near the end of the 3rd set there are mismatches again. The error regions grow as you go to later sets of columns. I don’t know why it is doing that…
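The location and size of these error regions follow from index arithmetic alone. The sketch below assumes the column-major layout implied by the kernel. `clobberedTailColumns` is a hypothetical helper that counts how many source columns of window `k` (0-based, so set `k + 1`) are also destination columns of window `k + 1` in the in-place kernel. With 786 columns kept per 800, that gives 0, 14, 28, and so on, at the end of the 1st, 2nd, 3rd, ... sets. Those columns are only corrupted when the block that writes the next window gets there before the block that reads the current one. CUDA neither guarantees nor prevents that ordering.

```cuda
// Hypothetical helper: number of source columns of window k (0-based) that are
// also destination columns of window k + 1 when extracting in place.
//   source of window k:          [window*k,   window*k + keep)
//   destination of window k + 1: [keep*(k+1), keep*(k+2))
// Any overlap lies at the END of window k's source range.
__host__ __device__ inline unsigned int clobberedTailColumns(unsigned int k,
                                                             unsigned int keep,
                                                             unsigned int window,
                                                             unsigned int numWindows)
{
    if (k + 1 >= numWindows) return 0;          // the last window has no successor
    unsigned int srcBegin = window * k;
    unsigned int srcEnd   = window * k + keep;
    unsigned int dstBegin = keep * (k + 1);
    unsigned int dstEnd   = keep * (k + 2);
    unsigned int lo = srcBegin > dstBegin ? srcBegin : dstBegin;
    unsigned int hi = srcEnd < dstEnd ? srcEnd : dstEnd;
    return hi > lo ? hi - lo : 0;               // size of the interval overlap
}
```

With `keep = 750`, the overlap grows by `800 - 750 = 50` columns per window instead of 14.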

Any help, suggestions, comments would be greatly appreciated!!

Thanks a million,


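I don’t think it’s a good idea to write to the same buffer you’re reading from, because that creates race conditions here. CUDA makes no guarantee about the order in which blocks run, so one block can overwrite a column before another block has read it. In this kernel, the destination of each set of columns overlaps the end of the previous set’s source columns, and the overlap grows by (800 - 750) = 50 columns per set (14 per set with 786). Lower-numbered blocks typically run ahead of higher-numbered ones. They can therefore overwrite those tail columns before the higher-numbered blocks have copied them. That is consistent with your plot: the first set is clean, and the mismatches sit at the end of later sets and grow from set to set. Writing the result into a separate output buffer avoids the problem.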
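Here is a minimal out-of-place sketch of that fix. It assumes the same column-major layout as the original kernel. `extractColumns` and `extractOnDevice` are hypothetical names, and the launch configuration inside the wrapper is arbitrary. Each thread reads only from `in` and writes only to `out`, so no thread can overwrite data that another thread still needs.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Out-of-place extraction: keep the first `keep` of every `window` columns.
// Column-major: each column is `rows` contiguous floats.
__global__ void extractColumns(const float* __restrict__ in, float* __restrict__ out,
                               unsigned int rows, unsigned int keep,
                               unsigned int window, unsigned int numWindows)
{
    unsigned int total = rows * keep * numWindows;    // elements in the output
    for (unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < total;
         idx += gridDim.x * blockDim.x)               // grid-stride loop
    {
        unsigned int r      = idx % rows;             // row within the column
        unsigned int outCol = idx / rows;             // output column
        unsigned int inCol  = (outCol / keep) * window + outCol % keep;
        out[idx] = in[inCol * rows + r];
    }
}

// Host wrapper: copy `in` to the GPU, run the kernel, copy the result back.
// Returns an empty vector if any CUDA call fails.
std::vector<float> extractOnDevice(const std::vector<float>& in, unsigned int rows,
                                   unsigned int keep, unsigned int window,
                                   unsigned int numWindows)
{
    std::vector<float> out(static_cast<size_t>(rows) * keep * numWindows);
    float* dIn = nullptr;
    float* dOut = nullptr;
    bool ok = cudaMalloc(&dIn, in.size() * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dOut, out.size() * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(dIn, in.data(), in.size() * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        extractColumns<<<256, 256>>>(dIn, dOut, rows, keep, window, numWindows);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), dOut, out.size() * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;  // waits for the kernel
    }
    cudaFree(dIn);   // cudaFree(nullptr) is a no-op
    cudaFree(dOut);
    return ok ? out : std::vector<float>();
}
```

For the matrix in the question, a direct launch could look like `extractColumns<<<(512 * 750 * 8 + 255) / 256, 256>>>(d_idata, d_odata, 512, 750, 800, 8);`. Here `d_odata` is a hypothetical, separately allocated buffer of 512 x 6000 floats. Consecutive threads handle consecutive rows of the same column, so reads and writes stay contiguous under the column-major layout.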