Hi, I am trying to run my model with the Python backend on the Triton server image.
I am using this image:
nvcr.io/nvidia/tritonserver:24.06-py3-igpu
When I set the instance group to KIND_CPU it runs fine, although only on the CPU. When I set it to KIND_GPU, the Triton server crashes with the following error message:
I0904 21:12:33.899132 1 python_be.cc:1912] "TRITONBACKEND_ModelInstanceInitialize: my_model (GPU device 0)"
I0904 21:12:33.899318 1 python_be.cc:2050] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E0904 21:12:33.899394 1 backend_model.cc:692] "ERROR: Failed to create instance: GPU instances not supported"
I0904 21:12:33.899425 1 python_be.cc:1891] "TRITONBACKEND_ModelFinalize: delete model state"
E0904 21:12:33.899491 1 model_lifecycle.cc:641] "failed to load 'my_model' version 1: Internal: GPU instances not supported"
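For reference, this is the instance_group block I am toggling in my config.pbtxt (the count and device index are just what I happened to use):

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

Changing kind back to KIND_CPU (and dropping the gpus field) is the only difference between the working and crashing configurations.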
Looking at the support page, it seems like GPU tensors are not supported on JetPack 5.0. Does that mean I cannot use the GPU when using pb_utils.Tensor?
import triton_python_backend_utils as pb_utils
pb_utils.Tensor  # <- not supported?
I am running this on a Jetson Orin Nano 8GB with JetPack 6, so I am unsure whether the information on the support page is still valid.
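To make sure we are talking about the same thing, here is roughly how I use it, a minimal execute() in the style of the python_backend examples (the tensor names and the computation are just placeholders, not my real model):

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Fetch the input tensor by name and read it back as a numpy array.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            in_array = in_tensor.as_numpy()

            # Stand-in for the real model computation.
            out_array = (in_array * 2).astype(np.float32)

            # Wrap the result in a pb_utils.Tensor and return it in the response.
            out_tensor = pb_utils.Tensor("OUTPUT0", out_array)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses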
For more context, I also tried the AddSubNet in PyTorch example provided in the python_backend repo, and I see the same error when setting instance_group [{ kind: KIND_GPU }]. Setting it to KIND_CPU works, but without GPU acceleration.
Here is part of the error log Triton generates:
I0910 23:40:49.047496 557 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x203eae000' with size 268435456"
I0910 23:40:49.047770 557 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0910 23:40:49.053250 557 model_lifecycle.cc:472] "loading: pytorch:1"
I0910 23:40:53.845067 557 python_be.cc:1912] "TRITONBACKEND_ModelInstanceInitialize: pytorch_0_0 (GPU device 0)"
E0910 23:40:53.846082 557 backend_model.cc:692] "ERROR: Failed to create instance: GPU instances not supported"
E0910 23:40:53.846415 557 model_lifecycle.cc:641] "failed to load 'pytorch' version 1: Internal: GPU instances not supported"
I0910 23:40:53.846516 557 model_lifecycle.cc:776] "failed to load 'pytorch'"
I0910 23:40:53.847145 557 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0910 23:40:53.847323 557 server.cc:631]
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"5.300000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0910 23:40:53.847541 557 server.cc:674]
+---------+---------+----------------------------------------------------+
| Model | Version | Status |
+---------+---------+----------------------------------------------------+
| pytorch | 1 | UNAVAILABLE: Internal: GPU instances not supported |
+---------+---------+----------------------------------------------------+
I am using the same Triton image:
nvcr.io/nvidia/tritonserver:24.06-py3-igpu
Here is the output of nvidia-smi and nvcc -V:
root@4fedf3f:/app/python_backend# nvidia-smi
Tue Sep 10 23:53:40 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.3.0 Driver Version: N/A CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Orin (nvgpu) N/A | N/A N/A | N/A |
| N/A N/A N/A N/A / N/A | Not Supported | N/A N/A |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
root@4fedf3f:/app/python_backend# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:08:11_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
Triton is unable to enable GPU models for the Python backend because the Python backend communicates with the GPU through an IPC CUDA Driver API that is not supported on this platform.
We will update the documentation accordingly.
Thanks for the response, @AastaLLL! Could you clarify exactly what combination is not supported? Is it the combination of (python backend) x (GPU) that is not supported? Or only (python backend) x (GPU) x (Jetson-ARM device)? Or is it just tritonserver:24.06-py3-igpu?
Triton is unable to enable GPU models for the Python backend because the Python backend communicates with the GPU through a legacy IPC CUDA Driver API, which does not work on Jetson.
Any progress here?
I found myself with the same error and a similar setup.
My case is an ensemble model that runs on the GPU and passes tensors to a Python backend model.
So I opened an issue on the Triton server repo 2 days ago…
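In case it is useful to anyone else, the interim approach I plan to test (untested so far, and based only on my reading of the python_backend README) is to keep the Python model itself on KIND_CPU and rely on the backend copying the ensemble's GPU tensors to CPU before they reach model.py, roughly like this in the Python model's config.pbtxt:

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]

parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: { string_value: "yes" }
}

If I understand the docs correctly, "yes" is already the default for FORCE_CPU_ONLY_INPUT_TENSORS and just makes the CPU copy explicit; the other models in the ensemble can stay on the GPU.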