Hi, I am trying to run my model with the Python backend on the Triton server image.
I am using this image:
nvcr.io/nvidia/tritonserver:24.06-py3-igpu
When I set the instance group to KIND_CPU it runs fine, although only on the CPU. When I set it to KIND_GPU, the Triton server crashes with the following error message:
I0904 21:12:33.899132 1 python_be.cc:1912] "TRITONBACKEND_ModelInstanceInitialize: my_model (GPU device 0)"
I0904 21:12:33.899318 1 python_be.cc:2050] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E0904 21:12:33.899394 1 backend_model.cc:692] "ERROR: Failed to create instance: GPU instances not supported"
I0904 21:12:33.899425 1 python_be.cc:1891] "TRITONBACKEND_ModelFinalize: delete model state"
E0904 21:12:33.899491 1 model_lifecycle.cc:641] "failed to load 'my_model' version 1: Internal: GPU instances not supported"
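For reference, this is the instance_group block I am toggling in my config.pbtxt (the count and device index are just what I happened to use):

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

Changing kind back to KIND_CPU (and dropping the gpus field) is the only difference between the working and crashing configurations.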
Looking at the support page, it seems like GPU tensors are not supported on JetPack 5.0. Does that mean I cannot use the GPU when using pb_utils.Tensor?
import triton_python_backend_utils as pb_utils
pb_utils.Tensor  # <- not supported?
I am running this on a Jetson Orin Nano 8GB with JetPack 6, so I am unsure whether the information on the support page is still valid.
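To make sure we are talking about the same thing, here is roughly how I use it, a minimal execute() in the style of the python_backend examples (the tensor names and the computation are just placeholders, not my real model):

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Fetch the input tensor by name and read it back as a numpy array.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            in_array = in_tensor.as_numpy()

            # Stand-in for the real model computation.
            out_array = (in_array * 2).astype(np.float32)

            # Wrap the result in a pb_utils.Tensor and return it in the response.
            out_tensor = pb_utils.Tensor("OUTPUT0", out_array)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses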
For more context, I also tried the AddSubNet in PyTorch example provided in the python_backend repo, and I see the same error when setting instance_group [{ kind: KIND_GPU }]. Setting it to KIND_CPU works, but without GPU acceleration.
Here is part of the error log Triton generates:
I0910 23:40:49.047496 557 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x203eae000' with size 268435456"
I0910 23:40:49.047770 557 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0910 23:40:49.053250 557 model_lifecycle.cc:472] "loading: pytorch:1"
I0910 23:40:53.845067 557 python_be.cc:1912] "TRITONBACKEND_ModelInstanceInitialize: pytorch_0_0 (GPU device 0)"
E0910 23:40:53.846082 557 backend_model.cc:692] "ERROR: Failed to create instance: GPU instances not supported"
E0910 23:40:53.846415 557 model_lifecycle.cc:641] "failed to load 'pytorch' version 1: Internal: GPU instances not supported"
I0910 23:40:53.846516 557 model_lifecycle.cc:776] "failed to load 'pytorch'"
I0910 23:40:53.847145 557 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0910 23:40:53.847323 557 server.cc:631]
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"5.300000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0910 23:40:53.847541 557 server.cc:674]
+---------+---------+----------------------------------------------------+
| Model | Version | Status |
+---------+---------+----------------------------------------------------+
| pytorch | 1 | UNAVAILABLE: Internal: GPU instances not supported |
+---------+---------+----------------------------------------------------+
I am using the same Triton image:
nvcr.io/nvidia/tritonserver:24.06-py3-igpu
Here is the output of nvidia-smi and nvcc -V:
root@4fedf3f:/app/python_backend# nvidia-smi
Tue Sep 10 23:53:40 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.3.0 Driver Version: N/A CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Orin (nvgpu) N/A | N/A N/A | N/A |
| N/A N/A N/A N/A / N/A | Not Supported | N/A N/A |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
root@4fedf3f:/app/python_backend# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:08:11_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
Triton is unable to enable GPU models for the Python backend because the Python backend communicates with the GPU through an IPC CUDA Driver API that is not supported on this platform.
We will update the documentation accordingly.
Thanks for the response, @AastaLLL! Could you clarify exactly what combination is not supported? Is it the combination of (python backend) x (GPU) that is not supported? Or only (python backend) x (GPU) x (Jetson-ARM device)? Or is it just tritonserver:24.06-py3-igpu?
Triton is unable to enable GPU models for the Python backend because the Python backend communicates with the GPU through a legacy IPC CUDA Driver API, which does not work on Jetson.
Any progress here?
I found myself with the same error and a similar setup.
My case is an ensemble model that runs on the GPU and passes tensors to a Python backend model.
So I opened an issue on the Triton server repo 2 days ago…
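In case it is useful to anyone else, the interim approach I plan to test (untested so far, and based only on my reading of the python_backend README) is to keep the Python model itself on KIND_CPU and rely on the backend copying the ensemble's GPU tensors to CPU before they reach model.py, roughly like this in the Python model's config.pbtxt:

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]

parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: { string_value: "yes" }
}

If I understand the docs correctly, "yes" is already the default for FORCE_CPU_ONLY_INPUT_TENSORS and just makes the CPU copy explicit; the other models in the ensemble can stay on the GPU.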