Problem with RotateLeft / RotateRight functions: they work fine unless I use the output of one as input to the other

I am having a bit of trouble with a couple of kernels. I am writing a program that involves a lot of Fourier transforms, and I need to rearrange their output so that the lower frequencies sit in the center instead of the corners, as this makes upsampling easier. I was planning on doing this with a mixture of RotateRight and transposing, then doing it again.

The problem I am having is that each function seems to operate fine when I pass it a normal array. But if I pass the output of RotateRight into RotateLeft, which should get me back to my original array, I get the error "cudalink experienced an internal error". This is from using CUDA within Mathematica, by the way…

Does anyone have any idea what could be going wrong?

__global__ void RotateRight(float * In, float * Out, int width, int height){
	int xIndex = threadIdx.x + blockIdx.x * blockDim.x;
	int yIndex = threadIdx.y + blockIdx.y * blockDim.y;

	if (xIndex >= width || yIndex >= height)
		return ;

	int Index = (yIndex * width + xIndex);
	int mid = floor(float(width)/2);
	int xIndex2(0);

	if (xIndex >= mid){
		xIndex2 = xIndex-mid;
	}
	else {
		xIndex2 = xIndex+mid+1;
	}

	int Index2 = (yIndex * width + xIndex2);

	Out[2*Index] = In[2*Index2];
	Out[2*Index+1] = In[2*Index2+1];
}

	

__global__ void RotateLeft(float * In, float * Out, int width, int height){
	int xIndex = threadIdx.x + blockIdx.x * blockDim.x;
	int yIndex = threadIdx.y + blockIdx.y * blockDim.y;

	if (xIndex >= width || yIndex >= height)
		return ;

	int Index = (yIndex * width + xIndex);
	int mid = floor(float(width)/2);
	int xIndex2(0);

	if (xIndex <= mid){
		xIndex2 = xIndex+mid+1;
	}
	else {
		xIndex2 = xIndex-mid;
	}

	int Index2 = (yIndex * width + xIndex2);

	Out[2*Index] = In[2*Index2];
	Out[2*Index+1] = In[2*Index2+1];
}
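One thing that stands out when tracing the index arithmetic by hand: in RotateLeft, the thread with xIndex == mid computes xIndex2 = 2*mid+1, which is at least width for every width, and in RotateRight with an even width the thread with xIndex == mid-1 computes xIndex2 == width. Both read past the end of the row, and on the last row past the end of the buffer, which may be what triggers the internal error. The sketch below is one possible way to write the two mappings as modular shifts that stay in range and undo each other. The names ROT_HD, wrapColumn, rotateRightSrc, rotateLeftSrc and RotateRightFixed are hypothetical, and the choice of shift for even widths is an assumption (it follows the usual fftshift convention, since the original kernels do not define it). For odd widths rotateRightSrc reproduces the original RotateRight mapping.

```cpp
// Lets the index helpers compile under nvcc and as plain C++ on the host.
#ifdef __CUDACC__
#define ROT_HD __host__ __device__
#else
#define ROT_HD
#endif

// Output column x reads input column (x + shift) mod width.
// Assumes 0 <= x < width and 0 <= shift < width.
ROT_HD inline int wrapColumn(int x, int shift, int width) {
    return (x + shift) % width;
}

// Hypothetical replacement for the branch in RotateRight: shift by ceil(width / 2).
// For odd widths this equals mid + 1, the shift the original kernel uses.
ROT_HD inline int rotateRightSrc(int x, int width) {
    return wrapColumn(x, (width + 1) / 2, width);
}

// Hypothetical replacement for the branch in RotateLeft: shift by floor(width / 2).
// The two shifts add up to width, so applying both restores the input.
ROT_HD inline int rotateLeftSrc(int x, int width) {
    return wrapColumn(x, width / 2, width);
}

#ifdef __CUDACC__
// Same layout as the original kernels: interleaved (re, im) floats, row-major.
// In and Out are assumed to be different buffers.
__global__ void RotateRightFixed(const float *In, float *Out, int width, int height) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x >= width || y >= height)
        return;

    int dst = y * width + x;
    int src = y * width + rotateRightSrc(x, width);

    Out[2 * dst]     = In[2 * src];
    Out[2 * dst + 1] = In[2 * src + 1];
}
#endif
```

A RotateLeft counterpart would be identical apart from calling rotateLeftSrc. Keeping the index math in small host-callable functions means the mapping can be checked on the CPU for a range of widths before it ever runs on the GPU.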

EDIT: Apparently I have fixed it. I’m pretty sure I didn’t actually change anything, though.

EDIT2: Nope, still broken; it did work once, though. It seems to work fine if the output is an array I haven’t used before. It’s only when I try to store the result in an array I have already been using, to save space, that I have problems.
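The symptom in EDIT2 would be consistent with the input and output sharing memory, although that is only a guess from the description. A rotation reads an element from one column and writes it to another, so if In and Out are the same buffer, some elements get overwritten before they have been read. On the GPU the thread order is not fixed, which could explain why it appeared to work once. The host-side sketch below is a sequential analogy rather than the kernel itself: rotateRow and rotateRowReusingBuffer are hypothetical names, and the loop stands in for one particular thread ordering.

```cpp
#include <vector>

// Circular shift of one row of interleaved complex values (re, im pairs).
// Correct only when `in` and `out` do not overlap.
// `shift` is assumed to satisfy 0 <= shift < width.
inline void rotateRow(const float *in, float *out, int width, int shift) {
    for (int x = 0; x < width; ++x) {
        int src = (x + shift) % width;
        out[2 * x]     = in[2 * src];
        out[2 * x + 1] = in[2 * src + 1];
    }
}

// One way to keep reusing the same array: rotate into a scratch buffer,
// then swap it in, so no element is read after it has been overwritten.
inline void rotateRowReusingBuffer(std::vector<float> &buf, int width, int shift) {
    std::vector<float> scratch(buf.size());
    rotateRow(buf.data(), scratch.data(), width, shift);
    buf.swap(scratch);
}
```

Calling rotateRow with the same pointer for both arguments corrupts the row, while the scratch version gives the expected rotation. The GPU equivalent would be to keep two device buffers and alternate between them as source and destination, at the cost of one extra array.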