Error using FFT to implement an image's cyclic shift

I ran into a problem when trying to use the FFT to implement a cyclic shift of an image. To move an 8192x8192 image right by 1000 pixels, I apply an FFT to each row of the matrix and then multiply each element by exp(2*PI/8192*y*1000*i), with y being the column index within the row. Finally, I apply an IFFT to each row to get the result. However, the result has a different shape from the original image.
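For reference, the identity this approach relies on is the DFT shift theorem. The sketch below assumes cuFFT's documented conventions (CUFFT_FORWARD uses a negative exponent, and neither direction is scaled); the symbols N (row length, here 8192), s (shift, here 1000), k (frequency bin) and n (column index) are introduced only for illustration.

```latex
% Forward transform of one row (cuFFT forward convention, unnormalized)
X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}

% A cyclic shift to the right by s samples ...
y_n = x_{(n - s) \bmod N}

% ... corresponds to a phase ramp with a NEGATIVE exponent in the spectrum
Y_k = X_k \, e^{-2\pi i k s / N}

% The inverse must be divided by N to recover the shifted row
y_n = \frac{1}{N} \sum_{k=0}^{N-1} Y_k \, e^{+2\pi i k n / N}
```

Under these conventions, multiplying by exp(+2*PI*i*k*s/N) instead shifts the row left by s, and skipping the division leaves every value scaled by N.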

Here are the parameter definitions:

#define ROW 8192
#define COLUMN 8192
#define DATA_SIZE (ROW*COLUMN) //The length and width information of processing image
#define BLOCK_NUM (DATA_SIZE/THREAD_NUM)
#define THREAD_NUM 256

code in main function:

cufftHandle plan;
cufftPlan1d(&plan, COLUMN, CUFFT_C2C, ROW);
cufftExecC2C(plan, data, temp, CUFFT_FORWARD);
Matrix_Trf<<<BLOCK_NUM, THREAD_NUM, 0>>>(temp, temp);
cufftExecC2C(plan, temp, temp, CUFFT_INVERSE);
cudaMemcpy(source_result,temp, sizeof(float2)*DATA_SIZE,cudaMemcpyDeviceToHost);

kernel:
__global__ static void Matrix_Trf(float2 *result, float2 *source)
{
    const int tid = threadIdx.x;
    const int bid = blockIdx.x * THREAD_NUM;
    result[bid+tid].x = source[bid+tid].x * cos(float(2*PI/COLUMN*((bid+tid)%COLUMN)*1000)) - source[bid+tid].y * sin(float(2*PI/COLUMN*((bid+tid)%COLUMN)*1000));
    result[bid+tid].y = source[bid+tid].x * sin(float(2*PI/COLUMN*((bid+tid)%COLUMN)*1000)) + source[bid+tid].y * cos(float(2*PI/COLUMN*((bid+tid)%COLUMN)*1000));
}

I have been working on this for two weeks, and all I have gained from it is depression…

Please help!! All the masters!!

I would use 2D blocks/grid.

The kernel will look something like this:

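#include <cufft.h>           // cufftComplex
#include <math_constants.h>  // CUDART_PI_F

__global__ void cshift(cufftComplex *in, cufftComplex *out, int N)
{
    // tidx is the column (frequency bin within a row), tidy is the row
    int tidx = threadIdx.x + blockIdx.x * blockDim.x;
    int tidy = threadIdx.y + blockIdx.y * blockDim.y;
    float cosv, sinv;
    if (tidx < N && tidy < N)
    {
        // row-major layout, matching the batched row FFT
        int index = tidy * N + tidx;
        // a positive angle shifts left under cuFFT's forward convention; negate it to shift right
        float angle = 2 * CUDART_PI_F / N * 1000 * tidx;
        sincosf(angle, &sinv, &cosv);  // single-precision variant for float arguments
        out[index].x = in[index].x * cosv - in[index].y * sinv;
        out[index].y = in[index].x * sinv + in[index].y * cosv;
    }
}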

You could also apply the shift in place; just save in[index] to a temporary variable first, so that the real part is not overwritten before the imaginary part is computed.
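One way to check the GPU output is to compare it against a small CPU reference. The sketch below is plain host-side C++ (it should also compile as CUDA host code) and does not call cuFFT: it uses a naive O(N^2) DFT that assumes the same conventions as cuFFT (negative exponent for the forward transform, no scaling in either direction), so it is only practical for short rows. The helper names dft_row, apply_phase_shift, shift_row_via_dft and shift_row_direct are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cfloat = std::complex<float>;

// Naive O(N^2) DFT of one row. sign = -1 is the forward transform,
// sign = +1 the inverse; neither direction is scaled (assumed cuFFT convention).
std::vector<cfloat> dft_row(const std::vector<cfloat>& x, int sign) {
    const int n = static_cast<int>(x.size());
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cfloat> out(n);
    for (int k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (int j = 0; j < n; ++j) {
            // Reduce k*j modulo n in integers so the angle stays small.
            const double angle = sign * two_pi * ((static_cast<long long>(k) * j) % n) / n;
            acc += std::complex<double>(x[j]) * std::polar(1.0, angle);
        }
        out[k] = cfloat(acc);
    }
    return out;
}

// Multiply bin k by exp(-2*pi*i*k*shift/n), which shifts the row right by
// `shift` samples (assumes shift >= 0). Works in place: the input element is
// copied to a temporary before either output component is written.
void apply_phase_shift(cfloat* data, int n, int shift) {
    const float two_pi = 2.0f * std::acos(-1.0f);
    for (int k = 0; k < n; ++k) {
        // Integer reduction keeps the float angle accurate for large k*shift.
        const int reduced = static_cast<int>((static_cast<long long>(k) * shift) % n);
        const float angle = -two_pi * reduced / n;
        const cfloat v = data[k];  // temporary copy of the input
        const float c = std::cos(angle), s = std::sin(angle);
        data[k] = cfloat(v.real() * c - v.imag() * s,
                         v.real() * s + v.imag() * c);
    }
}

// FFT-style cyclic right shift of one row: forward DFT, phase ramp,
// inverse DFT, then divide by n because the inverse is unnormalized.
std::vector<cfloat> shift_row_via_dft(const std::vector<cfloat>& row, int shift) {
    const int n = static_cast<int>(row.size());
    std::vector<cfloat> spec = dft_row(row, -1);
    apply_phase_shift(spec.data(), n, shift);
    std::vector<cfloat> out = dft_row(spec, +1);
    for (cfloat& v : out) v /= static_cast<float>(n);
    return out;
}

// Direct cyclic right shift (shift >= 0), used as the reference answer.
std::vector<cfloat> shift_row_direct(const std::vector<cfloat>& row, int shift) {
    const int n = static_cast<int>(row.size());
    std::vector<cfloat> out(n);
    for (int j = 0; j < n; ++j) out[(j + shift) % n] = row[j];
    return out;
}
```

For a short test row, shift_row_via_dft and shift_row_direct should agree up to rounding error. If the GPU result differs from such a reference, the usual suspects are the sign of the angle, a missing division by the row length after the inverse transform, and overwriting the real part in place before the imaginary part has been computed.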