Sign error: subtraction of two float2 values

Hi, I have started working on my own version of an FFT. My latest radix-4 code gives different results on NVIDIA and ATI cards.

The problem shows up on my NVIDIA 460 GT card. I tried the same code on an ATI Radeon HD 5450 and the result is OK.

__kernel void fft_radix4(__global const float2 * x,__global float2 * y,int p)
{
  int t = get_global_size(0); // number of threads
  int i = get_global_id(0); // current thread
  int k = i & (p-1); // index in input sequence, in 0..P-1

  // Input indices are I+{0,1,2,3}*T
  x += i;

  // Output indices are J+{0,1,2,3}*P, where
  // J is I with two 0 bits inserted at bit log2(P)
  y += ((i-k)<<2) + k;

  // Load and twiddle inputs
  // Twiddling factors are exp(_I*PI*{0,1,2,3}*K/2P)
  float alpha = -FFT_PI*(float)k/(float)(2*p);

  // Load and twiddle, one exp_alpha computed instead of 3
  float2 twiddle = exp_alpha_1(alpha);
  float2 u0 = x[0];
  float2 u1 = mul_1(twiddle,x[t]);
  float2 u2 = x[2*t];
  float2 u3 = mul_1(twiddle,x[3*t]);
  twiddle = sqr_1(twiddle);
  u2 = mul_1(twiddle,u2);
  u3 = mul_1(twiddle,u3);

  // 2x DFT2 and twiddle
  float2 v0 = u0 + u2;
  float2 v1 = u0 - u2;
  float2 v2 = u1 + u3;
  float2 v3 = mul_p1q2(u1 - u3); // twiddle

  // 2x DFT2 and store
  y[0] = v0 + v2;
  y[p] = v1 + v3;
  y[2*p] = v0 - v2;
  //v3.y = -v3.y; <-- doesn't change the result
  y[3*p] = v1 - v3; // <-- error in sign
}

My input is a sequence of 8 numbers: 0, 1, 2, 3, 4, 5, 6, 7

The result after running the kernel is:

0 +12.000000 +0.000000
1 -4.000000 +4.000000
2 -4.000000 +0.000000
3 -4.000000 +4.000000 << should be -4.000000 -4.000000 (ATI cards are OK, NVIDIA is not)
4 +16.000000 +0.000000
5 -4.000000 +4.000000
6 -4.000000 +0.000000
7 -4.000000 +4.000000 << should be -4.000000 -4.000000 (ATI cards are OK, NVIDIA is not)

In both threads, v1 = -4 and v3 = +4i.

Why does v1 - v3 give -4 + 4i? It should equal -4 - 4i.

Can someone tell me what's wrong?
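
For reference, the expected value of the last butterfly can be checked on the host in plain C. This is only a sketch: the `float2` struct and the `f2_add` / `f2_sub` helpers are hypothetical stand-ins for OpenCL's built-in component-wise `+` and `-` on `float2`, and the inputs are the v1 and v3 values quoted above.

```c
#include <stdio.h>

/* Hypothetical host-side stand-in for OpenCL's float2 (x = real, y = imaginary). */
typedef struct { float x, y; } float2;

/* Component-wise add and subtract, mirroring what + and - do on float2 in a kernel. */
float2 f2_add(float2 a, float2 b) { float2 r = { a.x + b.x, a.y + b.y }; return r; }
float2 f2_sub(float2 a, float2 b) { float2 r = { a.x - b.x, a.y - b.y }; return r; }

/* Print both results of the final butterfly in the same format as the post. */
void print_last_butterfly(float2 v1, float2 v3)
{
    float2 sum  = f2_add(v1, v3);   /* goes to y[p]   */
    float2 diff = f2_sub(v1, v3);   /* goes to y[3*p] */
    printf("v1 + v3 = %+f %+f\n", sum.x, sum.y);
    printf("v1 - v3 = %+f %+f\n", diff.x, diff.y);
}
```

Calling `print_last_butterfly` with v1 = (-4, 0) and v3 = (0, 4) gives a reference value to compare against what each card writes to `y[p]` and `y[3*p]`.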

Have you found out what was wrong? It seems pretty odd.

You could try declaring local copies of x and y, so that you don't modify the kernel parameters themselves (I'm not sure whether that can cause problems). It is just a dumb suggestion... but who knows :-)
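
A rough sketch of that suggestion, written as plain C so it can be run on the host and compared with the GPU output. Several things here are assumptions and not taken from the post: the `float2` struct and `f2_add` / `f2_sub` stand in for OpenCL's vector type and operators, `mul_p1q2` is assumed to multiply by -i (which matches the reported v3 = +4i), `xin` / `yout` are hypothetical names for the local pointer copies, and the global id and size are passed in as `i` and `t`. Only the first pass (p = 1) is covered, where k = 0 and every twiddle factor is 1, so the `exp_alpha_1`, `mul_1` and `sqr_1` steps drop out. Whether local copies change anything on the NVIDIA card is untested.

```c
/* Hypothetical host-side stand-in for OpenCL's float2 (x = real, y = imaginary). */
typedef struct { float x, y; } float2;

float2 f2_add(float2 a, float2 b) { float2 r = { a.x + b.x, a.y + b.y }; return r; }
float2 f2_sub(float2 a, float2 b) { float2 r = { a.x - b.x, a.y - b.y }; return r; }

/* ASSUMPTION: multiplies by -i. The post does not show the real definition. */
float2 mul_p1q2(float2 a) { float2 r = { a.y, -a.x }; return r; }

/* One work-item of the first pass (p = 1, so k = 0 and all twiddles are 1).
   i plays the role of get_global_id(0), t of get_global_size(0). */
void radix4_first_pass_item(const float2 *x, float2 *y, int i, int t)
{
    const int p = 1;
    const int k = i & (p - 1);

    /* Local copies of the pointers: the parameters x and y are never modified. */
    const float2 *xin  = x + i;
    float2       *yout = y + ((i - k) << 2) + k;

    /* Load inputs (twiddles are all 1 in this pass). */
    float2 u0 = xin[0];
    float2 u1 = xin[t];
    float2 u2 = xin[2 * t];
    float2 u3 = xin[3 * t];

    /* 2x DFT2 and twiddle */
    float2 v0 = f2_add(u0, u2);
    float2 v1 = f2_sub(u0, u2);
    float2 v2 = f2_add(u1, u3);
    float2 v3 = mul_p1q2(f2_sub(u1, u3));

    /* 2x DFT2 and store */
    yout[0]     = f2_add(v0, v2);
    yout[p]     = f2_add(v1, v3);
    yout[2 * p] = f2_sub(v0, v2);
    yout[3 * p] = f2_sub(v1, v3);
}

/* Emulate the whole launch: n/4 work-items, n must be a multiple of 4. */
void radix4_first_pass(const float2 *x, float2 *y, int n)
{
    int t = n / 4;
    for (int i = 0; i < t; ++i)
        radix4_first_pass_item(x, y, i, t);
}
```

In the actual kernel the same idea would amount to declaring `__global` pointer locals initialized from `x + i` and `y + ((i-k)<<2) + k` and indexing those, leaving `x` and `y` as passed in.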