CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Greetings, everyone.

  1. I installed the PyTorch wheel files from the link provided by the moderator:
    PyTorch for Jetson
    I am using a Jetson AGX Orin 64GB with JetPack 6, so I have:
    pytorch-wpe 0.0.1
    torch 2.3.0
    torch-complex 0.4.4
    torchaudio 2.3.0+952ea74
    torchvision 0.18.0a0+6043bc2

  2. Running apt-cache show nvidia-jetpack, I get:
    apt-cache show nvidia-jetpack
    Package: nvidia-jetpack
    Source: nvidia-jetpack (6.2)
    Version: 6.2+b77
    Architecture: arm64
    Maintainer: NVIDIA Corporation
    Installed-Size: 194
    Depends: nvidia-jetpack-runtime (= 6.2+b77), nvidia-jetpack-dev (= 6.2+b77)
    Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
    Priority: standard
    Section: metapackages
    Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.2+b77_arm64.deb
    Size: 29298
    SHA256: 70553d4b5a802057f9436677ef8ce255db386fd3b5d24ff2c0a8ec0e485c59cd
    SHA1: 9deab64d12eef0e788471e05856c84bf2a0cf6e6
    MD5sum: 4db65dc36434fe1f84176843384aee23
    Description: NVIDIA Jetpack Meta Package
    Description-md5: ad1462289bdbc54909ae109d1d32c0a8

Package: nvidia-jetpack
Source: nvidia-jetpack (6.1)
Version: 6.1+b123
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 6.1+b123), nvidia-jetpack-dev (= 6.1+b123)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.1+b123_arm64.deb
Size: 29312
SHA256: b6475a6108aeabc5b16af7c102162b7c46c36361239fef6293535d05ee2c2929
SHA1: f0984a6272c8f3a70ae14cb2ca6716b8c1a09543
MD5sum: a167745e1d88a8d7597454c8003fa9a4
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

  3. I am trying the FunASR project, and my code is:

from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming")

import soundfile
import os

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
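For reference, the chunk bookkeeping in the example can be sketched with plain numbers (the signal length below is hypothetical; at 16 kHz, chunk_size[1] * 960 samples is 600 ms):

```python
# Sketch of the streaming chunk arithmetic used above
# (assumes a 16 kHz mono signal; the recording length is hypothetical).
chunk_size = [0, 10, 5]              # [0, 10, 5] -> 600 ms chunks
chunk_stride = chunk_size[1] * 960   # 9600 samples = 600 ms at 16 kHz

num_samples = 55_000                 # hypothetical recording length
total_chunk_num = (num_samples - 1) // chunk_stride + 1

# (start, end) sample indices of each chunk; the last chunk may be shorter
chunks = [(i * chunk_stride, min((i + 1) * chunk_stride, num_samples))
          for i in range(total_chunk_num)]

print(total_chunk_num)   # 6 chunks for 55,000 samples
print(chunks[0])         # (0, 9600)
print(chunks[-1])        # final, shorter chunk: (48000, 55000)
```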

You can find the source code and the project on GitHub: modelscope/FunASR (A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.).

Running the code, I get:
python funasr_speech_recognition_streaming.py
funasr version: 1.2.4.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
You are using the latest version of funasr-1.2.4
Downloading Model to directory: /home/nvidia/.cache/modelscope/hub/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online
2025-02-26 23:38:36,008 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nvidia/projects/playground/funasr_speech_recognition_streaming.py", line 21, in
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 303, in generate
    return self.inference(input, input_len=input_len, **cfg)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 345, in inference
    res = model.inference(**batch, **kwargs)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/paraformer_streaming/model.py", line 629, in inference
    tokens_i = self.generate_chunk(
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/paraformer_streaming/model.py", line 482, in generate_chunk
    encoder_out, encoder_out_lens = self.encode_chunk(
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/paraformer_streaming/model.py", line 175, in encode_chunk
    encoder_out, encoder_out_lens, _ = self.encoder.forward_chunk(
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/scama/encoder.py", line 480, in forward_chunk
    encoder_outs = encoder_layer.forward_chunk(
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/scama/encoder.py", line 172, in forward_chunk
    x, cache = self.self_attn.forward_chunk(x, cache, chunk_size, look_back)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/sanm/attention.py", line 327, in forward_chunk
    q_h, k_h, v_h, v = self.forward_qkv(x)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/funasr/models/sanm/attention.py", line 240, in forward_qkv
    q_k_v = self.linear_q_k_v(x)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/funasr/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

After spending some time searching the internet for this issue, I found nothing useful. Could anyone please point out what I'm missing?

Thank you very much for your help and kindness.

Hi,

Which JetPack version are you using?
You have shared logs for both 6.2+b77 and 6.1+b123.

Since JetPack 6.1 and 6.2 use different CUDA versions (12.2 vs. 12.6), you will need to install the matching PyTorch build for compatibility.
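One quick way to spot such a mismatch is to compare the CUDA version the wheel was built against (`torch.version.cuda`) with the CUDA toolkit on the board. A minimal sketch of that comparison (the version strings here are hypothetical; on the device you would read them from `torch.version.cuda` and `nvcc --version`):

```python
def cuda_major_minor(version: str) -> tuple:
    """Reduce a CUDA version string like '12.6.68' to (12, 6)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def compatible(torch_cuda: str, system_cuda: str) -> bool:
    """Simplified check: a wheel built for one CUDA major.minor generally
    expects the same major.minor at runtime (minor-version compatibility
    within CUDA 12.x can be more forgiving in practice)."""
    return cuda_major_minor(torch_cuda) == cuda_major_minor(system_cuda)

# Hypothetical values: a torch wheel built for CUDA 12.2
# on a JetPack 6.2 board that ships CUDA 12.6.68
print(compatible("12.2", "12.6.68"))   # False -> mismatch, reinstall torch
print(compatible("12.6", "12.6.68"))   # True
```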

Thanks.

Thank you very much for your reply!
Here is the output of the jetson_release command:
Software part of jetson-stats 4.3.1 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson AGX Orin Developer Kit - Jetpack 6.2 [L4T 36.4.3]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:

  • P-Number: p3701-0005
  • Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
  • Distribution: Ubuntu 22.04 Jammy Jellyfish
  • Release: 5.15.148-tegra
jtop:
  • Version: 4.3.1
  • Service: Inactive
Libraries:
  • CUDA: 12.6.68
  • cuDNN: 9.3.0.75
  • TensorRT: 10.3.0.30
  • VPI: 3.2.4
  • Vulkan: 1.3.204
  • OpenCV: 4.8.0 - with CUDA: NO

After retrying several torch versions, the issue is gone.
This thread can be closed now. Thank you, everyone.

Hi,

Thanks for your feedback.

You can find some useful prebuilt packages for JetPack 6.2 in the below link:
https://pypi.jetson-ai-lab.dev/jp6/cu126

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.