Incorrect cuFFT results with CUDA 11.1 on CentOS, V100

I am not able to get a minimal cuFFT example working on my V100 running CentOS with CUDA 11.1.
It works with CUDA 10.2.
It also works with CUDA 11.0 on Ubuntu with A100s.

Please help me figure out what I missed.

+----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00 Driver Version: 455.32.00 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+

|===============================+======================+======================|
| 0 Tesla V100-SXM2… Off | 00000000:62:00.0 Off | Off |
| N/A 35C P0 55W / 300W | 3787MiB / 32510MiB | 7% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2… Off | 00000000:89:00.0 Off | Off |
| N/A 34C P0 40W / 300W | 125MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

Linux dev-4 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.8.2003 (Core)

testatomicfft.c (1.2 KB)
Rename the attachment to testatomicfft.cu (the forum wouldn't let me upload a .cu file) and compile the minimal example.
Observe the correct results:

$ nvcc testatomicfft.cu -lcufft

$ ./a.out

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
(1040384.0, 0.0) (-0.0, 0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0)

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
(1040384.0, 0.0) (-0.0, 0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0)

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
(1040384.0, 0.0) (-0.0, 0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0)

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
(1040384.0, 0.0) (-0.0, 0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0)

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
(1040384.0, 0.0) (-0.0, 0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0) ( 0.0, -0.0)

With an FFT size of 1040384 it works.

Observe the bad results:

$ nvcc -DFAIL testatomicfft.cu -lcufft

$ ./a.out

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
(1001472.0, -0.3) (-0.0, 0.0) (-0.0, 0.0) ( 0.0, -0.0) ( 0.0, 0.0) (-0.0, -0.0) (-0.0, 0.0) (-0.0, 0.0)

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0)

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0)

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0)

( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0) ( 1.0, 0.0)
( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0) ( 0.0, 0.0)

With an FFT size of 1001472 it fails.

The source code only creates an FFT plan and runs it against the same data.
Any help is appreciated.

Same issue here. Any help is greatly appreciated!

This looks like an issue with CUDA 11.1. Running your example gave me exactly the same results on a different device (a GTX 1080, to be exact):

+----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00 Driver Version: 455.32.00 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+

|===============================+======================+======================|
| 0 GeForce GTX 108… Off | 00000000:65:00.0 Off | N/A |
| 23% 28C P8 8W / 250W | 1122MiB / 11175MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

Same issue!

I’ve filed an internal NVIDIA bug for this issue (3196221). The development team has confirmed the issue. I don’t have further details and cannot immediately scope the impact. It is specific to CUFFT. If you have concerns about this CUFFT issue, my advice at the moment is to revert to CUDA 10.2 or CUDA 11.0. I’ll provide more info when I can. If I have not responded to additional requests here, it means that I don’t have the information to address those questions/requests.

All,

The problem has been determined and a fix is in the works.
I'll let everyone know when they can expect it.