Step size problem in kernel code: how can I reach every second element?


I have the following code…

I want to choose my step size so that the “opt” index reaches every second value in the vector arrays.

At first I thought that replacing THREAD_N with THREAD_N*2 would solve my problem, but it doesn’t. What am I doing wrong?

__global__ void ComplexProduct(float* vector1, float* vector2, float* result, int size){
	const int tid = blockDim.x * blockIdx.x + threadIdx.x;
	const int THREAD_N = blockDim.x * gridDim.x;

	for(int opt = tid; opt < size; opt += THREAD_N){
		result[opt]   = (vector1[opt]*vector2[opt]) - (vector1[opt+1]*vector2[opt+1]);
		result[opt+1] = (vector1[opt+1]*vector2[opt]) + (vector2[opt+1]*vector1[opt]);
	}
}

Thanks for your help.
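A likely reason why doubling only THREAD_N does not help: the loop still starts at opt = tid, so thread 0 handles elements 0 and 1 while thread 1 handles elements 1 and 2, and the pairs overlap. Both the start index and the stride need to be doubled, and the loop bound has to leave room for opt+1. The following is a minimal sketch under the assumption that the floats are interleaved (real part at even indices, imaginary part at odd indices) and that size counts floats. The names ComplexProductStrided and complexProductHost and the <<<2, 64>>> launch configuration are made up for illustration, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed layout: v[2*k] is the real part, v[2*k+1] the imaginary part.
// size is the number of floats, i.e. twice the number of complex values.
__global__ void ComplexProductStrided(const float* vector1, const float* vector2,
                                      float* result, int size)
{
    const int tid      = blockDim.x * blockIdx.x + threadIdx.x;
    const int THREAD_N = blockDim.x * gridDim.x;

    // Start at 2*tid and step by 2*THREAD_N so every thread owns whole pairs.
    // The bound opt + 1 < size keeps the access to opt + 1 inside the array.
    for (int opt = 2 * tid; opt + 1 < size; opt += 2 * THREAD_N) {
        const float a_re = vector1[opt], a_im = vector1[opt + 1];
        const float b_re = vector2[opt], b_im = vector2[opt + 1];
        result[opt]     = a_re * b_re - a_im * b_im;  // real part
        result[opt + 1] = a_im * b_re + a_re * b_im;  // imaginary part
    }
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel, and copies the result back. Error checking is left out.
std::vector<float> complexProductHost(const std::vector<float>& a,
                                      const std::vector<float>& b)
{
    const int    size  = static_cast<int>(a.size());
    const size_t bytes = a.size() * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dR = nullptr;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dR, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    ComplexProductStrided<<<2, 64>>>(dA, dB, dR, size);

    std::vector<float> r(a.size());
    cudaMemcpy(r.data(), dR, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dR);
    return r;
}
```

Reading the four inputs into locals before writing also keeps the kernel correct if result happens to alias one of the input arrays.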

My solution was:

// IMUL is not defined in this post; it is assumed to be an integer-multiply macro defined elsewhere.
__global__ void Muld(float2* devMemImg2, float2* devMemMulImg, long * config)
{
    // Linear thread index within the block, plus the offset of the block within the grid.
    unsigned long blockPos = threadIdx.x + IMUL(threadIdx.y, blockDim.x) + IMUL(threadIdx.z, IMUL(blockDim.x, blockDim.y));
    unsigned long gridPosX = IMUL(IMUL(IMUL(blockDim.x, blockDim.y), blockDim.z), blockIdx.x);
    unsigned long gridPosY = IMUL(IMUL(IMUL(IMUL(blockDim.x, blockDim.y), blockDim.z), gridDim.x), blockIdx.y);
    unsigned long tid = blockPos + gridPosX + gridPosY;

    float x1, x2, y1, y2;

    x1 = devMemMulImg[tid].x * devMemImg2[tid].x;
    y1 = devMemMulImg[tid].y * devMemImg2[tid].y;
    x2 = devMemMulImg[tid].x * devMemImg2[tid].y;
    y2 = devMemMulImg[tid].y * devMemImg2[tid].x;

    // Complex product written back in place: .x is the real part, .y the imaginary part.
    devMemMulImg[tid].x = x1 - y1;
    devMemMulImg[tid].y = x2 + y2;
}

Or something like that…


But my problem is that I don’t use the float2 type; I only have float…

Or does anybody know a way to cast from float* to float2*?

Is your data stored in row-major style in memory, with each real/imaginary pair in adjacent floats? Then you can simply cast: float2 * x = (float2*) y; (where y is a float*).
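As a rough sketch of that cast in context, assuming the float buffers hold interleaved real/imaginary pairs and were allocated with cudaMalloc (which returns pointers aligned well beyond the 8 bytes that float2 needs): the kernel below works on float2, and the host code only reinterprets the existing float* device pointers. The names ComplexProductFloat2 and complexProductViaFloat2 and the launch configuration are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// .x holds the real part, .y the imaginary part.
// count is the number of complex values (half the number of floats).
__global__ void ComplexProductFloat2(const float2* a, const float2* b,
                                     float2* result, int count)
{
    const int tid      = blockDim.x * blockIdx.x + threadIdx.x;
    const int THREAD_N = blockDim.x * gridDim.x;

    // One float2 per step, so the plain stride THREAD_N is enough here.
    for (int i = tid; i < count; i += THREAD_N) {
        const float2 p = a[i];
        const float2 q = b[i];
        result[i] = make_float2(p.x * q.x - p.y * q.y,
                                p.x * q.y + p.y * q.x);
    }
}

// Hypothetical host helper: the data stays plain float on the host and in
// device memory; only the pointer type changes at the kernel launch.
std::vector<float> complexProductViaFloat2(const std::vector<float>& a,
                                           const std::vector<float>& b)
{
    const int    size  = static_cast<int>(a.size());  // number of floats
    const size_t bytes = a.size() * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dR = nullptr;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dR, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    // The cast from the reply: treat each pair of floats as one float2.
    ComplexProductFloat2<<<2, 64>>>((const float2*)dA, (const float2*)dB,
                                    (float2*)dR, size / 2);

    std::vector<float> r(a.size());
    cudaMemcpy(r.data(), dR, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dR);
    return r;
}
```

The cast is only safe when the float* points at an 8-byte-aligned address, so a pointer offset by an odd number of floats into a buffer would not qualify.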