ONNX model fails to load on Jetson Orin

Description

Running the Triton quickstart:
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/getting_started/quickstart.html

on a 16GB Jetson Orin with the following docker command:
docker run --gpus all --runtime=nvidia --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/nvidia/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:23.09-py3 tritonserver --model-repository=/models

Receive:
| densenet_onnx | 1 | UNAVAILABLE: Internal: onnx runtime error 6: Exception during initialization: /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED ; GPU=0 ; hostname=a43d498cbac4 ; file=/workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=168 ; expr=cublasCreate(&cublas_handle_); |
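For context, CUBLAS failure 3 (CUBLAS_STATUS_ALLOC_FAILED) means cuBLAS could not allocate device memory while creating its handle, i.e. the model ran out of GPU memory at load time. Since the Orin's GPU shares the 16GB of system RAM, one way to check whether memory is actually exhausted is to watch tegrastats while the container starts (a minimal sketch; tegrastats ships with JetPack, and the interval is in milliseconds):

sudo tegrastats --interval 1000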

Environment

TensorRT Version: 8.6.2.3
GPU Type: Jetson Orin 16GB (integrated GPU)
Nvidia Driver Version:
CUDA Version: 12.2.140
CUDNN Version: 8.9.4.25
Operating System + Version: JetPack 6 (L4T 36.3.0)
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Container (nvcr.io/nvidia/tritonserver:23.09-py3)

Relevant Files

The failing model is densenet_onnx from the quickstart's example model repository (populated by docs/examples/fetch_models.sh in the triton-inference-server/server repo).

Steps To Reproduce

  • Create the example model repository by following the quickstart (a command sketch follows this list)
  • Start Triton on the 16GB Jetson Orin with the docker command from the description above
  • densenet_onnx fails to load with the CUBLAS_STATUS_ALLOC_FAILED error shown above
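
A minimal repro sketch, assuming the server repo is cloned to /home/nvidia/server as implied by the volume mount in the description (the r23.09 branch is an assumption to match the container tag; fetch_models.sh is the script the quickstart uses to populate the example repository):

git clone -b r23.09 https://github.com/triton-inference-server/server.git /home/nvidia/server
cd /home/nvidia/server/docs/examples
./fetch_models.sh
docker run --gpus all --runtime=nvidia --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/nvidia/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:23.09-py3 tritonserver --model-repository=/models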

Hi @user62193,
Please reach out to the Jetson or Triton forums for better assistance with this topic.

Thanks

Thanks. I actually just managed to solve this by using an iGPU image!

docker run --gpus=1 --runtime=nvidia --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/nvidia/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:24.05-py3-igpu tritonserver --model-repository=/models --backend-config=tensorrt,--version-compatible=true
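
In case it helps anyone else hitting this: once the server is up, you can confirm it is healthy and that densenet_onnx actually loaded via Triton's standard HTTP endpoints (port 8000 as mapped above):

curl -v localhost:8000/v2/health/ready
curl localhost:8000/v2/models/densenet_onnx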