I have an application running in a Docker container that (among other things) uses NVIDIA NeMo's Clustering Diarizer. This has worked totally fine on a GCP-hosted T4 instance. I was curious to try out the newer L4 GPUs, so I proceeded with my usual installation. On the L4 instance I'm getting this:
File "nemo/collections/asr/models/clustering_diarizer.py", line 447, in diarize
self._extract_embeddings(self.subsegments_manifest_path, scale_idx, len(scales))
File "nemo/collections/asr/models/clustering_diarizer.py", line 359, in _extract_embeddings
_, embs = self._speaker_model.forward(input_signal=audio_signal, input_signal_length=audio_signal_len)
File "nemo/core/classes/common.py", line 1087, in __call__
outputs = wrapped(*args, **kwargs)
File "nemo/collections/asr/models/label_models.py", line 327, in forward
processed_signal, processed_signal_len = self.preprocessor(
File "torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "nemo/core/classes/common.py", line 1087, in __call__
outputs = wrapped(*args, **kwargs)
File "torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "nemo/collections/asr/modules/audio_preprocessing.py", line 91, in forward
processed_signal, processed_length = self.get_features(input_signal, length)
File "nemo/collections/asr/modules/audio_preprocessing.py", line 292, in get_features
return self.featurizer(input_signal, length)
File "torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "nemo/collections/asr/parts/preprocessing/features.py", line 420, in forward
x = self.stft(x)
File "nemo/collections/asr/parts/preprocessing/features.py", line 310, in <lambda>
self.stft = lambda x: torch.stft(
File "torch/functional.py", line 632, in stft
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
Meanwhile, on my T4 instance everything works perfectly fine.
Info about the two VMs:
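The failure happens inside `torch.stft` (see the bottom of the traceback), so it can presumably be reproduced without NeMo at all. Here is a minimal sketch I would expect to trigger the same cuFFT error on the L4; the `n_fft`/`hop_length` values are illustrative placeholders, not necessarily NeMo's exact preprocessor config:

```python
import torch

# Minimal stand-in for the call path features.py -> torch.stft.
# Falls back to CPU if no GPU is present, so it runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(16000, device=device)            # 1 s of fake 16 kHz audio
window = torch.hann_window(512, device=device)   # analysis window on the same device

spec = torch.stft(
    x,
    n_fft=512,
    hop_length=160,
    win_length=512,
    window=window,
    return_complex=True,  # required by recent PyTorch versions
)
print(spec.shape)  # (n_fft // 2 + 1, num_frames) -> torch.Size([257, 101])
```

If this snippet alone raises `CUFFT_INTERNAL_ERROR` on the L4, the problem is in the PyTorch/cuFFT build rather than in NeMo.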
T4 Instance
- VERSION="20.04.5 LTS (Focal Fossa)"
- Kernel: 5.15.0-1037-gcp
- nvcc --version = Cuda compilation tools, release 10.1, V10.1.243
- nvidia-smi → NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0
L4 Instance
- VERSION="20.04.6 LTS (Focal Fossa)"
- Kernel: 5.15.0-1037-gcp
- nvcc --version = Cuda compilation tools, release 10.1, V10.1.243
- nvidia-smi → NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2
They seemingly have the same CUDA toolkit versions, so I'm at a loss as to why I'm getting this error. Any help would be highly appreciated.
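For what it's worth, the system `nvcc` (10.1) and the driver's reported CUDA version (12.x) may both be red herrings: the CUDA runtime and cuFFT that actually execute this code are the ones bundled inside the PyTorch wheel in the container. This sketch prints what that build supports; whether the L4's compute capability (sm_89) appears in the arch list is my guess at the relevant question, not a confirmed diagnosis:

```python
import torch

# What the *installed PyTorch binary* was built with -- this, not the
# system nvcc or the driver, determines which GPUs cuFFT kernels cover.
print("torch version:        ", torch.__version__)
print("built against CUDA:   ", torch.version.cuda)
print("compiled arch list:   ", torch.cuda.get_arch_list())

if torch.cuda.is_available():
    # The L4 should report compute capability (8, 9), i.e. sm_89.
    print("device capability:    ", torch.cuda.get_device_capability(0))
```

If `sm_89` (or a PTX fallback covering it) is missing from the arch list, that would explain why the same container works on a T4 (sm_75) but fails on an L4.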