I have a dual 3090 machine running Triton, serving a Hugging Face transformer model with the Python backend. It runs fine when my gRPC requests arrive at a slow rate, maybe one every 10 seconds. However, if I send requests at a high rate (~10 Hz), the server silently reboots, and I don’t see anything in the log files that indicates anything weird. Thinking it might be something specific to this server, I tried it on a server with dual 4090 GPUs: same behavior. I then tried it on a machine with a single A10, and it worked flawlessly; no matter how heavily I hit it with gRPC requests, it stayed up and running. This leads me to believe my issue is related to configuring Triton for multiple GPUs.
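For reference, this is roughly the client loop I'm using to generate the load. The model and tensor names below are placeholders, not the real ones from my attached config; treat it as a sketch of the test, not the exact script.

```python
# Rough sketch of my load-test client (names are placeholders).
import time

import numpy as np
import tritonclient.grpc as grpcclient

MODEL_NAME = "my_transformer"   # placeholder for my actual model name
INPUT_NAME = "INPUT_TEXT"       # placeholder input tensor (BYTES)
OUTPUT_NAME = "OUTPUT"          # placeholder output tensor

client = grpcclient.InferenceServerClient(url="localhost:8001")

# One text string per request, shape [1, 1], dtype BYTES.
text = np.array([["some test sentence".encode("utf-8")]], dtype=np.object_)

infer_input = grpcclient.InferInput(INPUT_NAME, list(text.shape), "BYTES")
infer_input.set_data_from_numpy(text)
requested_output = grpcclient.InferRequestedOutput(OUTPUT_NAME)

# At ~1 request every 10 s everything is fine; at ~10 Hz (sleep 0.1 s)
# the dual-GPU machines silently reboot partway through this loop.
for i in range(1000):
    client.infer(
        model_name=MODEL_NAME,
        inputs=[infer_input],
        outputs=[requested_output],
    )
    time.sleep(0.1)
```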
Attached are my config.pbtxt for the model and the log file that was recorded during the crash.
config_pb.txt (362 Bytes)
log.txt (7.5 MB)