Description
We have noticed a couple of times that Triton crashes with signal 11 (SIGSEGV). RAM, CPU, and GPU usage do not spike before the crash.
Environment
Triton version : 22.12 (nvcr.io/nvidia/tritonserver:22.12-py3)
Python Backend version : r21.08
TensorRT Version : 8.5.1
GPU Type : GeForce RTX 2080 SUPER
Nvidia Driver Version : 510.108.03
CUDA Version : 11.8
CUDNN Version : 8.7.0 GA
Operating System + Version : Ubuntu 20.04
Python Version (if applicable) : 3.7
Relevant Files
RAM usage:
CPU:
GPU:
Steps To Reproduce
We could not identify the exact cause of the crash, so we cannot provide steps to reproduce. The signal is observed mostly on long runs (8-12 hrs) of Triton Server.
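Since the crash only shows up after hours of uptime, a small watcher that timestamps the moment the server stops answering may help narrow down when it happens. The following is only a minimal sketch, not something we have run in production. It assumes Triton's HTTP endpoint is on the default port 8000 and that `nvidia-smi` is on the PATH. The function names and the log file name `triton_watch.log` are made up for illustration.

```python
import subprocess
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Assumption: Triton's HTTP endpoint listens on the default port 8000.
LIVE_URL = "http://localhost:8000/v2/health/live"


def is_live(url: str = LIVE_URL, timeout: float = 2.0) -> bool:
    """Return True if the server answers the liveness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


def gpu_memory_used() -> str:
    """Return nvidia-smi's used-memory readout, or 'n/a' if it is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=5, check=True,
        )
        return out.stdout.strip().replace("\n", "; ")
    except (OSError, subprocess.SubprocessError):
        return "n/a"


def watch(url: str = LIVE_URL, interval: float = 30.0,
          log_path: str = "triton_watch.log") -> None:
    """Append one line per poll and stop once the server no longer responds."""
    while True:
        live = is_live(url)
        stamp = datetime.now(timezone.utc).isoformat()
        with open(log_path, "a") as log:
            log.write(f"{stamp} live={live} gpu_mem_used={gpu_memory_used()}\n")
        if not live:
            break
        time.sleep(interval)
```

Calling `watch()` next to the server during a long run leaves a log whose last line marks roughly when the process went away, which can then be matched against the Triton log and the RAM/CPU/GPU graphs.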