Erroneous output data with CUFFT: is it a bug?

Hi, I am encountering difficulties when running CUFFT over various combinations of problem size. The application is the simple example provided by Nvidia. I use the in-place version of the transform.
The attached logs list every tested combination together with its diffs (no diff, i.e. 0.000000, is good). Unfortunately, the result is sometimes wrong (there are some diffs).
This test over all the combinations has been run several times. When re-running the big test (all the combinations again), the errors don't appear in the same place! :blink:
For example, the first result below is OK, the second one is not:

NX=1024 NY=2048
tmp = nan, max = 0.000000

NX=1024 NY=4096
0 0 4194304.000000 4177920.000000 16384.000000
1024 31 0.000000 -19758.222656 19758.222656
1024 55 0.000000 -22357.396484 22357.396484
tmp = nan, max = 22357.396484
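
For reference, the kind of host-side check behind these diffs can be sketched as follows. This is a minimal sketch, not the code from the attached example: it assumes the input is all 1.0, that the transform is an unnormalized forward 2D FFT (as CUFFT computes), and that the result has been copied back to the host in row-major order. `complex_f` and `check_all_ones_spectrum` are hypothetical names, and the printed columns only loosely follow the log above. Under those assumptions the expected spectrum is NX*NY in bin (0,0) and zero everywhere else.

```c
#include <math.h>    /* isnan, INFINITY */
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for cufftComplex (two interleaved floats). */
typedef struct { float x, y; } complex_f;

static float abs_f(float a) { return a < 0.0f ? -a : a; }

/*
 * Compare an nx-by-ny row-major spectrum with the one expected for an
 * all-ones input: nx*ny in bin (0,0), zero elsewhere.
 * Prints "i j expected actual_real error" for every bin whose error
 * exceeds tol, and returns the largest error found.
 */
float check_all_ones_spectrum(const complex_f *out, int nx, int ny, float tol)
{
    float max_err = 0.0f;
    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; ++j) {
            complex_f v = out[(size_t)i * (size_t)ny + (size_t)j];
            float expected = (i == 0 && j == 0) ? (float)nx * (float)ny : 0.0f;
            float re_err = abs_f(v.x - expected);
            float im_err = abs_f(v.y);
            float err = re_err > im_err ? re_err : im_err;
            /* A NaN would compare false against tol and pass silently,
               so count it as an infinite error instead. */
            if (isnan(re_err) || isnan(im_err))
                err = INFINITY;
            if (err > tol)
                printf("%d %d %f %f %f\n", i, j, expected, v.x, err);
            if (err > max_err)
                max_err = err;
        }
    }
    return max_err;
}
```

Bin (0,0) of an unnormalized transform is simply the sum of the input, so the deficit in the second log (4194304 - 4177920 = 16384) would be consistent with 16384 of the 1.0 inputs not contributing, for example 4 rows of 4096 elements.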

OK, let's check the context :ph34r:
For several sizes, the output data differs between executions. I attach 2 logs, corresponding to two runs of the entire set of problem size combinations on:

  • the same machine
  • same card (GTX295 / dual GPU)
  • same driver (185.12)
  • same SDK (cuda 2.1)
  • same application (a.out)
  • same input data (the matrix is initialized with 1.0)
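
One way to pin down such run-to-run differences without relying on the expected values is to keep the output of one run and compare it bitwise with the output of the next. A minimal sketch, assuming both outputs have been copied back to host memory as flat float arrays; `first_mismatch` is a hypothetical helper, not part of the attached files:

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first float whose bit pattern differs between
 * the outputs of two runs, or -1 if the two buffers are identical.
 * memcmp is used rather than == so that a NaN in the same position with
 * the same bits in both runs is not reported as a difference.
 */
long first_mismatch(const float *run_a, const float *run_b, size_t count)
{
    for (size_t k = 0; k < count; ++k) {
        if (memcmp(&run_a[k], &run_b[k], sizeof(float)) != 0)
            return (long)k;
    }
    return -1;
}
```

With identical hardware, driver, binary and input, a result other than -1 points at non-deterministic behaviour rather than at a systematic error in the transform.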

Please note that the problem also occurs on other systems (with the same overall non-deterministic behaviour), for example:

  • card: Tesla C1060
  • driver: 180.22
  • SDK: cuda 2.1

This reminds me that the problem may be somewhere between keyboard and chair… :whistling:

I’m running on Linux. I attach a bash script for testing.
Please rename the attached files before use (this is needed because of the “Upload failed. You are not permitted to upload this type of file” message on the forum). In the same way, rename Makefile.txt, etc.
The problem appears with large problem sizes (1024 and up).
It looks as if some blocks were not computed (4 of them, I think).
Any ideas? :rolleyes:
Makefile.txt (313 Bytes) (3.52 KB) (598 Bytes)
vigg_2.1_185.12_3.txt (12.4 KB)
vigg_2.1_185.12_2.txt (12.5 KB)

Works fine for me on C1060, 185.12 and CUDA 2.1 or 2.2 beta, Linux 64bit.

I slightly modified your code to get rid of cutil (it is evil, do not use it) and to add the right include (cuda_runtime.h), and used gcc to build. Files attached.

It also runs fine on a S1070 with 177.70 and CUDA 2.0.

Could you try my version? (359 Bytes)
fftbug.c.txt (3.48 KB)

Thanks for the files, but I get the same behaviour.

Actually, it seems to run well on certain cards but not on others: it fails on one C1060 (device 0) in one half of an S1070, while device 1 is OK and the second half is also OK.

EDIT: We then suspected a hardware problem, but a reboot of the S1070 seems to have got rid of the problem.

EDIT: Actually, all the affected cards are OK now after a reboot.

EDIT: The instability is probably caused by the previous execution of a kernel that made erroneous memory accesses, after which the cards remain in an incoherent state.

Is it common for cards to become unstable like this? I will carry on investigating.

Thank you.