Bug: Ubuntu on WSL2 - RTX4090 related cuFFT runtime error

jerkadar · October 14, 2022, 9:09pm

Host System: Windows 10 version 21H2
Nvidia Driver on Host system: 522.25 Studio Version

Videocard: Geforce RTX 4090

CUDA Toolkit in WSL2: cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
Pytorch versions tested: Latest (stable - 1.12.1) for CUDA 11.6, Nightly for CUDA 11.7
Python version: 3.8.10
WSL2 Guest: Ubuntu 20.04 LTS
WSL2 Guest Kernel Version: 5.10.102.1-microsoft-standard-WSL2

Affected CUDA component: cuFFT

I’m executing the VITS model training code of coqui-ai/TTS (GitHub: "🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production") without any code edits. Dev branch, commit dae79b0acd3cd316016078c40a1cc553ffb9405e

This worked flawlessly up until the point when I swapped my videocard from a Geforce RTX 3090 to a 4090 yesterday.

Now I am running into a bug(?) in cuFFT:

/home/localuser/coquiTTS/.VENV/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
 ! Run is removed from DATASETS/CodexNarrator/output/CodexNarrator_vits-October-14-2022_10+50PM-dae79b0a
Traceback (most recent call last):
  File "/home/localuser/coquiTTS/.VENV/lib/python3.8/site-packages/trainer/trainer.py", line 1533, in fit
    self._fit()
  File "/home/localuser/coquiTTS/.VENV/lib/python3.8/site-packages/trainer/trainer.py", line 1517, in _fit
    self.train_epoch()
  File "/home/localuser/coquiTTS/.VENV/lib/python3.8/site-packages/trainer/trainer.py", line 1282, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/home/localuser/coquiTTS/.VENV/lib/python3.8/site-packages/trainer/trainer.py", line 1099, in train_step
    batch = self.format_batch(batch)
  File "/home/localuser/coquiTTS/.VENV/lib/python3.8/site-packages/trainer/trainer.py", line 910, in format_batch
    batch = self.model.format_batch_on_device(batch)
  File "/home/localuser/coquiTTS/TTS/TTS/tts/models/vits.py", line 1505, in format_batch_on_device
    batch["spec"] = wav_to_spec(wav, ac.fft_size, ac.hop_length, ac.win_length, center=False)
  File "/home/localuser/coquiTTS/TTS/TTS/tts/models/vits.py", line 123, in wav_to_spec
    spec = torch.stft(
  File "/home/localuser/coquiTTS/.VENV/lib/python3.8/site-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

cuFFT throws this runtime error no matter what I try - I’ve tried disabling mixed precision training mode but that had no impact on it.

Crucially, this only affects the spectrogram generation step of the training module, since that is the only step where cuFFT is involved.
Inference on a VITS model executes just fine (and I’m loving the speed bump that the 4090 brings!).

I am unsure if this is strictly a problem between Pytorch and Cuda, with Pytorch needing updating, or if Cuda itself is the culprit here - but it is definitely related to the GPU upgrade as the code is identical.

Is this a (known) bug and/or is there a workaround?
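To separate the failing call from the rest of the training code, a minimal sketch along the following lines might help. It runs a single torch.stft call on random data on the GPU and reports the outcome. The function name stft_smoke_test, the signal length, and the FFT parameters are placeholders chosen for illustration; they are not the values from the VITS config.

```python
def stft_smoke_test(device="cuda", n_fft=1024, hop_length=256):
    """Run one torch.stft call and describe the outcome as a string."""
    try:
        import torch  # imported lazily so the check degrades gracefully without PyTorch
    except ImportError:
        return "torch-missing"

    if device == "cuda" and not torch.cuda.is_available():
        return "cuda-unavailable"

    # Random stand-in for one second of audio; the sizes are arbitrary placeholders.
    wav = torch.randn(1, 22050, device=device)
    window = torch.hann_window(n_fft, device=device)
    try:
        spec = torch.stft(
            wav,
            n_fft,
            hop_length=hop_length,
            win_length=n_fft,
            window=window,
            center=False,
            return_complex=True,
        )
    except RuntimeError as err:
        # A cuFFT failure like the one in the traceback would be expected to land here.
        return f"failed: {err}"
    return f"ok: {tuple(spec.shape)}"


if __name__ == "__main__":
    print(stft_smoke_test())
```

Running it with device="cpu" as well gives a quick comparison: if the CPU call succeeds and the CUDA call fails, the problem is confined to the GPU FFT path rather than the input data.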

coder42 · October 27, 2022, 8:18am

I can confirm this problem on a plain Linux using the code example supplied here:

coder42 · October 27, 2022, 8:27am

For the sake of completeness, here is the reproducer:

#include <cstdio>   // printf
#include <cstdlib>  // exit
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_runtime_api.h>
#include <cuda_device_runtime_api.h>
#include <cufft.h>

#ifdef _CUFFT_H_
    static const char *cufftGetErrorString( cufftResult cufft_error_type ) {
        switch( cufft_error_type ) {
            case CUFFT_SUCCESS:
                return "CUFFT_SUCCESS: The CUFFT operation was performed";
            case CUFFT_INVALID_PLAN:
                return "CUFFT_INVALID_PLAN: The CUFFT plan to execute is invalid";
            case CUFFT_ALLOC_FAILED:
                return "CUFFT_ALLOC_FAILED: The allocation of data for CUFFT in memory failed";
            case CUFFT_INVALID_TYPE:
                return "CUFFT_INVALID_TYPE: The data type used by CUFFT is invalid";
            case CUFFT_INVALID_VALUE:
                return "CUFFT_INVALID_VALUE: The data value used by CUFFT is invalid";
            case CUFFT_INTERNAL_ERROR:
                return "CUFFT_INTERNAL_ERROR: An internal error occurred in CUFFT";
            case CUFFT_EXEC_FAILED:
                return "CUFFT_EXEC_FAILED: The execution of a plan by CUFFT failed";
            case CUFFT_SETUP_FAILED:
                return "CUFFT_SETUP_FAILED: The setup of CUFFT failed";
            case CUFFT_INVALID_SIZE:
                return "CUFFT_INVALID_SIZE: The size of the data to be used by CUFFT is invalid";
            case CUFFT_UNALIGNED_DATA:
                return "CUFFT_UNALIGNED_DATA: The data to be used by CUFFT is unaligned in memory";
        }
        return "Unknown CUFFT Error";
    }
#endif
#define BATCH 1


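int main(int argc, char** argv) {
    // Transform length used by the reproducer; cufftPlan1d takes it as an int.
    int data_block_length = 50397139;
    cufftResult cufft_result;
    cufftHandle plan;
    // Plan creation alone is enough to trigger the error; no data is transferred.
    cufft_result = cufftPlan1d(&plan, data_block_length, CUFFT_Z2Z, BATCH );

    if( cufft_result != CUFFT_SUCCESS ) {
       printf( "CUFFT Error (%s)\n", cufftGetErrorString( cufft_result ) );
       exit(-1);
    }
    cufftDestroy(plan);  // release the plan on success
    return 0;
}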

It compiles with g++ thefile.cpp -lcufft and the result is:

./a.out
CUFFT Error (CUFFT_INTERNAL_ERROR: An internal error occurred in CUFFT)

coder42 · October 28, 2022, 6:59am

It seems like @Robert_Crovella might have an idea already, because I just saw that he replied on that very stackexchange question. Would be great to get his thoughts on this. :)

Edit: (I’m “out of replies” - apparently that’s a thing now… :D)
@Robert_Crovella Thanks a lot for your feedback! I was on CUDA 11.7 and yes, with 11.8 it does indeed work on Linux.
Therefore I tested Windows 10. A version compiled with CUDA 9.2 worked without problems (I could not yet get my hands on a version compiled with 11.7, but I had one with 11.4 and that also worked).
So I tried CUDA 11.4 on Linux and - lo and behold - it works as well. Seems to really be a bug specific to the 11.7 toolkit (at least on Linux).

Robert_Crovella · October 28, 2022, 2:11pm

The problem on the SO question was basically an out-of-memory error. For the original posting in this thread, I doubt it is an out-of-memory issue. If it were me, and I had a 4090 to test on, I definitely would not use anything other than CUDA 11.8 or newer.
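As a rough sanity check on the out-of-memory explanation, the size of the transform in the reproducer posted above can be estimated with a few lines of arithmetic. CUFFT_Z2Z operates on double-precision complex values, which take 16 bytes each. The sketch below only accounts for a single data buffer; the work area that cuFFT allocates during plan creation depends on the library's internal algorithm choice and is not modeled here. The helper name z2z_buffer_bytes is made up for this example.

```python
BYTES_PER_Z2Z_ELEMENT = 16  # cufftDoubleComplex: two 8-byte doubles


def z2z_buffer_bytes(n_elements, batch=1):
    """Size in bytes of one Z2Z data buffer (input or output), excluding any work area."""
    return n_elements * batch * BYTES_PER_Z2Z_ELEMENT


if __name__ == "__main__":
    size = z2z_buffer_bytes(50397139)
    print(f"{size} bytes, about {size / 2**20:.0f} MiB per buffer")
```

That comes to roughly 769 MiB per buffer for the length used in the reproducer, so whether plan creation fits depends on how much free memory the GPU has and how large a work area cuFFT chooses for that length.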

jerkadar · October 28, 2022, 3:12pm

What I can say now is that the same code from my original post executes on Windows 10 using CUDA 11.8 on my 4090 without errors.
So it seems to be a bug just for the WSL/Linux version of CUDA 11.8, in conjunction with the RTX40 series - that’s all I got in terms of additional insights so far. 🙂
I’m sure one of your engineers can figure it out eventually.

Robert_Crovella · October 28, 2022, 3:24pm

If the pytorch is compiled to use CUDA 11.6 or CUDA 11.7, I doubt it is using CUDA 11.8. That typically doesn’t work. The pythonic pytorch installs that I am familiar with on linux bring their own CUDA libraries for this reason. I can’t tell how it was installed here.

Those CUDA 11.6/11.7 CUFFT libraries may not work correctly with 4090. That was the reason for my comment. NVIDIA recommends CUDA 11.8 minimum for use with RTX 40 series GPUs, and it’s often the case that it takes a while for DL framework “providers” to catch up with these needs and provide a new version that is linked against CUDA 11.8 (in this case) and provides CUDA 11.8 libraries. Or you can build your own pytorch.

Using a pytorch set up in an NGC container may be another option. The WSL documentation explains how to launch NGC containers.

To test the theory of a basic CUDA 11.8 CUFFT bug in WSL, I would run a test like what was already suggested - a pure CUFFT code linked against CUDA 11.8.
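To find out which CUDA version a given pytorch install was actually built with, independent of the toolkit installed system-wide, something like the following sketch could be used. It reads torch.version.cuda and compares it against the 11.8 minimum mentioned above. The helper names are invented for this example, and the check only looks at the version string; it does not prove that the bundled cuFFT works on a particular GPU.

```python
MIN_CUDA_FOR_RTX40 = (11, 8)  # minimum recommended in this thread for RTX 40 series


def parse_cuda_version(text):
    """Turn a string such as '11.7' into a comparable tuple such as (11, 7)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def meets_rtx40_minimum(cuda_version):
    """Return True if the given CUDA version string is at least 11.8."""
    return parse_cuda_version(cuda_version) >= MIN_CUDA_FOR_RTX40


def describe_torch_build():
    """Report the CUDA version PyTorch was built with, if PyTorch is installed."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    built_with = torch.version.cuda  # None for builds without CUDA support
    if built_with is None:
        return "PyTorch was built without CUDA support"
    verdict = "meets" if meets_rtx40_minimum(built_with) else "is below"
    return f"PyTorch was built with CUDA {built_with}, which {verdict} the 11.8 minimum"


if __name__ == "__main__":
    print(describe_torch_build())
```

If this reports 11.6 or 11.7, installing the CUDA 11.8 toolkit in WSL does not change which cuFFT library pytorch loads; a pytorch build that ships CUDA 11.8 or newer libraries would be needed.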

jerkadar · October 28, 2022, 5:07pm

Aaah. A lot of great ideas, thank you! I’ll look into it!

user61941 · January 12, 2023, 9:41am

I am also experiencing the same bug. Did you solve it?

pranavbball · February 5, 2023, 1:00am

I’m still experiencing this issue. I tried CUDA 11.7 and 11.8 but neither worked. Has anyone gotten it working on an RTX 4090 on Ubuntu?

apoorvagni · February 8, 2023, 4:56pm

I am on PyTorch (v1.13.1) compiled with CUDA 11.7. Facing the exact same issue on my RTX 4090.

coder42 · February 8, 2023, 5:00pm

As far as I understood, 11.6 and 11.7 are known to be broken with RTX 4090. I can reproduce the error on 11.7.

Yet 11.4 and 11.8 both work fine for me, so I am surprised that @pranavbball has this problem with 11.8 as well.

pranavbball · February 8, 2023, 5:04pm

I solved the problem by updating my pytorch and torchaudio versions to the 2.0+ nightly.
