CuFFT not working on L4 card but working on T4?

I have an application running in a Docker container that (among other things) uses NVIDIA NeMo’s Clustering Diarizer. This has worked totally fine on a GCP-hosted T4 instance. I was curious to try out the newer L4 GPUs, so I proceeded with my usual installation. On the L4 instance I’m getting this:

 File "nemo/collections/asr/models/clustering_diarizer.py", line 447, in diarize
    self._extract_embeddings(self.subsegments_manifest_path, scale_idx, len(scales))
  File "nemo/collections/asr/models/clustering_diarizer.py", line 359, in _extract_embeddings
    _, embs = self._speaker_model.forward(input_signal=audio_signal, input_signal_length=audio_signal_len)
  File "nemo/core/classes/common.py", line 1087, in __call__
    outputs = wrapped(*args, **kwargs)
  File "nemo/collections/asr/models/label_models.py", line 327, in forward
    processed_signal, processed_signal_len = self.preprocessor(
  File "torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "nemo/core/classes/common.py", line 1087, in __call__
    outputs = wrapped(*args, **kwargs)
  File "torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "nemo/collections/asr/modules/audio_preprocessing.py", line 91, in forward
    processed_signal, processed_length = self.get_features(input_signal, length)
  File "nemo/collections/asr/modules/audio_preprocessing.py", line 292, in get_features
    return self.featurizer(input_signal, length)
  File "torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "nemo/collections/asr/parts/preprocessing/features.py", line 420, in forward
    x = self.stft(x)
  File "nemo/collections/asr/parts/preprocessing/features.py", line 310, in <lambda>
    self.stft = lambda x: torch.stft(
  File "torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

Whilst on my T4 instance everything works perfectly fine.

Info about the two VMs:

T4 Instance

  • VERSION="20.04.5 LTS (Focal Fossa)"
  • Kernel: 5.15.0-1037-gcp
  • nvcc --version = Cuda compilation tools, release 10.1, V10.1.243
  • nvidia-smi → NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0

L4 Instance

  • VERSION="20.04.6 LTS (Focal Fossa)"
  • Kernel: 5.15.0-1037-gcp
  • nvcc --version = Cuda compilation tools, release 10.1, V10.1.243
  • nvidia-smi → NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2

They seemingly have the same CUDA toolkit version (nvcc reports 10.1 on both), so I’m clueless as to why I’m getting this error. Any help would be highly appreciated.

It could be because your version of cuFFT (if it came with the CUDA Toolkit) is too old.

The L4 is an Ada Lovelace card with compute capability 8.9, which CUDA 10.1 does not support. CC 8.9 support was not added until CUDA 11.8.
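The incompatibility can be sketched as a simple lookup. This is an illustrative snippet, not an NVIDIA API: the function name `toolkit_supports` and the table are made up for this example, though the 7.5 (Turing/T4) and 8.9 (Ada Lovelace/L4) entries reflect the support boundaries described above.

```python
# Minimum CUDA toolkit version that can target a given compute capability.
# Illustrative table; only the 7.5 and 8.9 rows matter for this thread.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),    # Volta
    (7, 5): (10, 0),   # Turing, e.g. T4
    (8, 0): (11, 0),   # Ampere (A100)
    (8, 9): (11, 8),   # Ada Lovelace, e.g. L4
    (9, 0): (11, 8),   # Hopper
}

def toolkit_supports(cc, toolkit):
    """Return True if the given CUDA toolkit version supports compute capability cc."""
    required = MIN_CUDA_FOR_CC.get(cc)
    if required is None:
        return False  # unknown compute capability: assume unsupported
    return toolkit >= required

# T4 (CC 7.5) with CUDA 10.1: supported, diarization runs fine.
print(toolkit_supports((7, 5), (10, 1)))   # True
# L4 (CC 8.9) with CUDA 10.1: unsupported, cuFFT fails at runtime.
print(toolkit_supports((8, 9), (10, 1)))   # False
# L4 with CUDA 11.8 or newer: supported.
print(toolkit_supports((8, 9), (11, 8)))   # True
```

So the fix is to rebuild the container against a toolkit (and a PyTorch build) that is at least CUDA 11.8, rather than changing anything on the T4 side.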