I have a dual 3090 machine running Triton, serving a Hugging Face transformer model with the Python backend. It runs fine when my gRPC requests arrive at a slow rate, maybe one every 10 seconds. However, if I send requests at a high rate (~10 Hz), the server silently reboots, and I don’t see anything in the log files that indicates anything weird. Thinking it might be something specific to this server, I tried it on a server with dual 4090 GPUs: same behavior. I then tried it on a machine with a single A10, and it worked flawlessly; no matter how heavily I hit it with gRPC requests, it stayed up and running. This leads me to believe my issue is related to configuring Triton for multiple GPUs.
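For reference, this is roughly the client loop I'm using to generate the load. The model and tensor names below are placeholders, not the real ones from my attached config; treat it as a sketch of the test, not the exact script.

```python
# Rough sketch of my load-test client (names are placeholders).
import time

import numpy as np
import tritonclient.grpc as grpcclient

MODEL_NAME = "my_transformer"   # placeholder for my actual model name
INPUT_NAME = "INPUT_TEXT"       # placeholder input tensor (BYTES)
OUTPUT_NAME = "OUTPUT"          # placeholder output tensor

client = grpcclient.InferenceServerClient(url="localhost:8001")

# One text string per request, shape [1, 1], dtype BYTES.
text = np.array([["some test sentence".encode("utf-8")]], dtype=np.object_)

infer_input = grpcclient.InferInput(INPUT_NAME, list(text.shape), "BYTES")
infer_input.set_data_from_numpy(text)
requested_output = grpcclient.InferRequestedOutput(OUTPUT_NAME)

# At ~1 request every 10 s everything is fine; at ~10 Hz (sleep 0.1 s)
# the dual-GPU machines silently reboot partway through this loop.
for i in range(1000):
    client.infer(
        model_name=MODEL_NAME,
        inputs=[infer_input],
        outputs=[requested_output],
    )
    time.sleep(0.1)
```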
Attached are my config.pbtxt for the model and the log file that was recorded during the crash.
config_pb.txt (362 Bytes)
log.txt (7.5 MB)