Seeking Assistance with ONNX Runtime CUDA/cuDNN Error in CosyVoice Inference on Jetson AGX Orin
Hello NVIDIA Community,
I’m working with the FunAudioLLM/CosyVoice project (GitHub), a multilingual voice generation model, on a Jetson AGX Orin. During zero-shot inference I hit a CUDA/cuDNN execution failure inside ONNX Runtime and would appreciate guidance.
Environment Details
sudo jetson_release
Software part of jetson-stats 4.3.1 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson AGX Orin Developer Kit - Jetpack 6.2 [L4T 36.4.3]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
- P-Number: p3701-0005
- Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
 - Distribution: Ubuntu 22.04 Jammy Jellyfish
 - Release: 5.15.148-tegra
jtop:
 - Version: 4.3.1
 - Service: Active
Libraries:
 - CUDA: 12.6.68
 - cuDNN: 9.3.0.75
 - TensorRT: 10.3.0.30
 - VPI: 3.2.4
 - Vulkan: 1.3.204
 - OpenCV: 4.8.0 - with CUDA: NO
Model: CosyVoice, latest main branch (GitHub: FunAudioLLM/CosyVoice, “Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability”).
Error Description
The pipeline fails during speech_tokenizer_session.run(), which raises CUDNN_STATUS_EXECUTION_FAILED inside ONNX Runtime’s CUDA execution provider; the full error message is included at the end of this post.
Key observations:
- The CUDA/cuDNN error occurs at the Conv node /conv1/Conv during ONNX model execution.
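To narrow this down, I plan to run the speech tokenizer ONNX model on its own with random inputs, first on the CPU execution provider and then on CUDA: if the CPU run succeeds and the CUDA run fails at /conv1/Conv, that isolates the cuDNN path. This is only a sketch; the model path and the dummy length of 200 frames are assumptions for illustration:

```python
import numpy as np

def random_feed(input_metas, dyn=200):
    """Build random feeds for an ORT session; `dyn` replaces dynamic dims.

    `input_metas` is what session.get_inputs() returns: objects with
    .name, .shape (ints or symbolic strings), and .type such as
    'tensor(float)' or 'tensor(int32)'.
    """
    feed = {}
    for m in input_metas:
        shape = [d if isinstance(d, int) else dyn for d in m.shape]
        if "int64" in m.type:
            feed[m.name] = np.full(shape, dyn, dtype=np.int64)
        elif "int32" in m.type:
            feed[m.name] = np.full(shape, dyn, dtype=np.int32)
        else:
            feed[m.name] = np.random.randn(*shape).astype(np.float32)
    return feed

if __name__ == "__main__":
    import onnxruntime as ort
    # Hypothetical model path -- adjust to your checkout.
    path = "pretrained_models/CosyVoice2-0.5B/speech_tokenizer_v2.onnx"
    for providers in (["CPUExecutionProvider"],
                      ["CUDAExecutionProvider", "CPUExecutionProvider"]):
        sess = ort.InferenceSession(path, providers=providers)
        sess.run(None, random_feed(sess.get_inputs()))
        print(providers[0], "ran without error")
```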
Questions:
- Has anyone encountered CUDNN_STATUS_EXECUTION_FAILED on Jetson AGX Orin with ONNX models?
- Could this relate to kernel/workspace allocation or driver compatibility?
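On the workspace question: ONNX Runtime’s CUDA execution provider exposes cuDNN-related options that control the convolution algorithm search and workspace size. Below is a minimal sketch of how I could constrain them when creating the session; the option names come from the ONNX Runtime CUDA EP documentation, but whether they avoid this particular failure on Jetson is unverified:

```python
def cuda_providers(cap_workspace=True):
    """Provider list for onnxruntime.InferenceSession.

    'cudnn_conv_algo_search': 'HEURISTIC' skips cuDNN's exhaustive
    algorithm benchmarking; 'cudnn_conv_use_max_workspace': '0' caps
    the convolution workspace (32 MB per the ORT docs) instead of
    querying the maximum. CPUExecutionProvider is kept as a fallback.
    """
    cuda_opts = {
        "cudnn_conv_algo_search": "HEURISTIC",
        "cudnn_conv_use_max_workspace": "0" if cap_workspace else "1",
    }
    return [("CUDAExecutionProvider", cuda_opts), "CPUExecutionProvider"]

# Usage (hypothetical model path):
# sess = onnxruntime.InferenceSession("speech_tokenizer_v2.onnx",
#                                     providers=cuda_providers())
```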
Thank you for your time and expertise. I’m happy to provide additional details or logs.
Best regards,
Error message:
2025-03-01 08:44:16,913 INFO skip building fst for en_normalizer …
/home/nvidia/projects/CosyVoice/cosyvoice/cli/model.py:70: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See pytorch/SECURITY.md at main · pytorch/pytorch · GitHub for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don’t have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
self.llm.load_state_dict(torch.load(llm_model, map_location=self.device), strict=True)
/home/nvidia/projects/CosyVoice/cosyvoice/cli/model.py:72: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See pytorch/SECURITY.md at main · pytorch/pytorch · GitHub for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don’t have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
self.flow.load_state_dict(torch.load(flow_model, map_location=self.device), strict=True)
/home/nvidia/projects/CosyVoice/cosyvoice/cli/model.py:75: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See pytorch/SECURITY.md at main · pytorch/pytorch · GitHub for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don’t have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
hift_state_dict = {k.replace('generator.', ''): v for k, v in torch.load(hift_model, map_location=self.device).items()}
[trace] cosyvoice = CosyVoice2
[trace] load_wav done
[trace] after obj = cosyvoice.inference_zero_shot
0%| | 0/1 [00:00<?, ?it/s]2025-03-01 08:44:42.160472457 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=ubuntu ; file=/home/nvidia/projects/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=455 ; expr=cudnnConvolutionForward(cudnn_handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data);
2025-03-01 08:44:42.160532071 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/conv1/Conv' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=ubuntu ; file=/home/nvidia/projects/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=455 ; expr=cudnnConvolutionForward(cudnn_handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data);
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nvidia/projects/CosyVoice/test.py", line 21, in <module>
    for i, j in enumerate(obj):
  File "/home/nvidia/projects/CosyVoice/cosyvoice/cli/cosyvoice.py", line 82, in inference_zero_shot
    model_input = self.frontend.frontend_zero_shot(i, prompt_text, prompt_speech_16k, self.sample_rate)
  File "/home/nvidia/projects/CosyVoice/cosyvoice/cli/frontend.py", line 162, in frontend_zero_shot
    speech_token, speech_token_len = self.extract_speech_token(prompt_speech_16k)
  File "/home/nvidia/projects/CosyVoice/cosyvoice/cli/frontend.py", line 95, in extract_speech_token
    speech_token = self.speech_tokenizer_session.run(None,
  File "/home/nvidia/anaconda3/envs/cosyvoice/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self.sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'/conv1/Conv' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=ubuntu ; file=/home/nvidia/projects/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=455 ; expr=cudnnConvolutionForward(cudnn_handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data);