CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Greetings, everyone.

  1. I installed the PyTorch wheel files from the link provided by the moderator:
    PyTorch for Jetson
    I am using a Jetson AGX Orin 64GB with JetPack 6, so I have:
    pytorch-wpe 0.0.1
    torch 2.3.0
    torch-complex 0.4.4
    torchaudio 2.3.0+952ea74
    torchvision 0.18.0a0+6043bc2

  2. Running apt-cache show nvidia-jetpack, I get:
    apt-cache show nvidia-jetpack
    Package: nvidia-jetpack
    Source: nvidia-jetpack (6.2)
    Version: 6.2+b77
    Architecture: arm64
    Maintainer: NVIDIA Corporation
    Installed-Size: 194
    Depends: nvidia-jetpack-runtime (= 6.2+b77), nvidia-jetpack-dev (= 6.2+b77)
    Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
    Priority: standard
    Section: metapackages
    Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.2+b77_arm64.deb
    Size: 29298
    SHA256: 70553d4b5a802057f9436677ef8ce255db386fd3b5d24ff2c0a8ec0e485c59cd
    SHA1: 9deab64d12eef0e788471e05856c84bf2a0cf6e6
    MD5sum: 4db65dc36434fe1f84176843384aee23
    Description: NVIDIA Jetpack Meta Package
    Description-md5: ad1462289bdbc54909ae109d1d32c0a8

Package: nvidia-jetpack
Source: nvidia-jetpack (6.1)
Version: 6.1+b123
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 6.1+b123), nvidia-jetpack-dev (= 6.1+b123)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.1+b123_arm64.deb
Size: 29312
SHA256: b6475a6108aeabc5b16af7c102162b7c46c36361239fef6293535d05ee2c2929
SHA1: f0984a6272c8f3a70ae14cb2ca6716b8c1a09543
MD5sum: a167745e1d88a8d7597454c8003fa9a4
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

  3. I am trying the FunASR project, and my code is:

from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming")

import soundfile
import os

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
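For reference, the chunk bookkeeping in the example can be sketched with plain numbers (the signal length below is hypothetical; at 16 kHz, chunk_size[1] * 960 samples is 600 ms):

```python
# Sketch of the streaming chunk arithmetic used above
# (assumes a 16 kHz mono signal; the recording length is hypothetical).
chunk_size = [0, 10, 5]              # [0, 10, 5] -> 600 ms chunks
chunk_stride = chunk_size[1] * 960   # 9600 samples = 600 ms at 16 kHz

num_samples = 55_000                 # hypothetical recording length
total_chunk_num = (num_samples - 1) // chunk_stride + 1

# (start, end) sample indices of each chunk; the last chunk may be shorter
chunks = [(i * chunk_stride, min((i + 1) * chunk_stride, num_samples))
          for i in range(total_chunk_num)]

print(total_chunk_num)   # 6 chunks for 55,000 samples
print(chunks[0])         # (0, 9600)
print(chunks[-1])        # final, shorter chunk: (48000, 55000)
```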

You can find the source code and the project on GitHub: modelscope/FunASR (A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.).

Running the code, I get:
python funasr_speech_recognition_streaming.py
funasr version: 1.2.4.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
You are using the latest version of funasr-1.2.4
Downloading Model to directory: /home/nvidia/.cache/modelscope/hub/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online
2025-02-26 23:38:36,008 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nvidia/projects/playground/funasr_speech_recognition_streaming.py", line 21, in
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 303, in generate
    return self.inference(input, input_len=input_len, **cfg)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 345, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/paraformer_streaming/model.py", line 629, in inference
    tokens_i = self.generate_chunk(
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/paraformer_streaming/model.py", line 482, in generate_chunk
    encoder_out, encoder_out_lens = self.encode_chunk(
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/paraformer_streaming/model.py", line 175, in encode_chunk
    encoder_out, encoder_out_lens, _ = self.encoder.forward_chunk(
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/scama/encoder.py", line 480, in forward_chunk
    encoder_outs = encoder_layer.forward_chunk(
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/scama/encoder.py", line 172, in forward_chunk
    x, cache = self.self_attn.forward_chunk(x, cache, chunk_size, look_back)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/sanm/attention.py", line 327, in forward_chunk
    q_h, k_h, v_h, v = self.forward_qkv(x)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/sanm/attention.py", line 240, in forward_qkv
    q_k_v = self.linear_q_k_v(x)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

After spending some time searching the internet for this issue, I found nothing useful. Could anyone please point out what I'm missing?

Thank you very much for your help and kindness.

Hi,

Which JetPack version are you using?
You have shared logs for both 6.2+b77 and 6.1+b123.

Since JetPack 6.1 and 6.2 use different CUDA versions (12.2 vs. 12.6), you will need to install the matching PyTorch build for compatibility.
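One quick way to spot such a mismatch is to compare the CUDA version the wheel was built against (`torch.version.cuda`) with the CUDA toolkit on the board. A minimal sketch of that comparison (the version strings here are hypothetical; on the device you would read them from `torch.version.cuda` and `nvcc --version`):

```python
def cuda_major_minor(version: str) -> tuple:
    """Reduce a CUDA version string like '12.6.68' to (12, 6)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def compatible(torch_cuda: str, system_cuda: str) -> bool:
    """Simplified check: a wheel built for one CUDA major.minor generally
    expects the same major.minor at runtime (minor-version compatibility
    within CUDA 12.x can be more forgiving in practice)."""
    return cuda_major_minor(torch_cuda) == cuda_major_minor(system_cuda)

# Hypothetical values: a torch wheel built for CUDA 12.2
# on a JetPack 6.2 board that ships CUDA 12.6.68
print(compatible("12.2", "12.6.68"))   # False -> mismatch, reinstall torch
print(compatible("12.6", "12.6.68"))   # True
```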

Thanks.

Thank you very much for your reply!
Here is the output of the jetson_release command:
Software part of jetson-stats 4.3.1 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson AGX Orin Developer Kit - Jetpack 6.2 [L4T 36.4.3]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:

  • P-Number: p3701-0005
  • Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
  • Distribution: Ubuntu 22.04 Jammy Jellyfish
  • Release: 5.15.148-tegra
jtop:
  • Version: 4.3.1
  • Service: Inactive
Libraries:
  • CUDA: 12.6.68
  • cuDNN: 9.3.0.75
  • TensorRT: 10.3.0.30
  • VPI: 3.2.4
  • Vulkan: 1.3.204
  • OpenCV: 4.8.0 - with CUDA: NO

After retrying several torch versions, the issue is gone.
This thread can be closed now. Thank you, everyone.

Hi,

Thanks for your feedback.

You can find some useful prebuilt packages for JetPack 6.2 in the below link:
https://pypi.jetson-ai-lab.dev/jp6/cu126

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.