Triton Server Crashing Running CenterNet Keypoints (hourglass_512x512_kpts) on Jetson via Dockerized Triton

The Triton server, running in a container on a Jetson TX2, crashes when serving the CenterNet object detection & keypoints model as a TF2 tensorflow_savedmodel over gRPC, and I’m looking for pointers on how to proceed.

I’m new to Triton/Jetson but after some effort have got an environment that successfully runs Triton examples, namely inception_graphdef and densenet_onnx, via the example grpc_image_client.py and image_client.py python programs.

I have deployed the TF2 hourglass_512x512_kpts model using the default auto-generated config.pbtxt file (see below), with the addition of a max_batch_size of 0. The Triton server loads the model and reports it as ready to serve.
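For reference, a sketch of the model-repository layout Triton expects for a SavedModel (the repository root here is a placeholder; Triton is pointed at it via --model-repository):

```shell
# Sketch of the Triton model-repository layout for a TF2 SavedModel.
# MODEL_REPO is a placeholder path.
MODEL_REPO=${MODEL_REPO:-./model_repo}
mkdir -p "$MODEL_REPO/hourglass_512x512_kpts/1/model.savedmodel"
# config.pbtxt sits alongside the numeric version directories:
touch "$MODEL_REPO/hourglass_512x512_kpts/config.pbtxt"
# The downloaded saved_model.pb and variables/ go inside .../1/model.savedmodel.
```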

I have made minor changes to the grpc_image_client.py program to adapt it to the different dimensionality of the hourglass_512x512_kpts model, and invoked it with verbose logging enabled. The request is received, including the instruction to return only one of the outputs (I’m trying to build up incrementally), but the inference aborts and terminates the container (see selected Triton logs below).
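For context, a hypothetical sketch of such an adapted client using the tritonclient gRPC API: it sends a UINT8 [1, H, W, 3] batch as input_tensor and requests only detection_classes. The URL and request id are placeholders matching the logs below.

```python
import numpy as np


def to_batch(image):
    """Add the leading batch dimension expected by dims [1, -1, -1, 3]."""
    arr = np.asarray(image, dtype=np.uint8)
    return np.expand_dims(arr, axis=0)


def infer(image, url="localhost:8001"):
    # tritonclient is only needed when actually talking to the server
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url=url, verbose=True)
    batch = to_batch(image)
    inp = grpcclient.InferInput("input_tensor", list(batch.shape), "UINT8")
    inp.set_data_from_numpy(batch)
    outputs = [grpcclient.InferRequestedOutput("detection_classes")]
    result = client.infer(
        "hourglass_512x512_kpts",
        inputs=[inp],
        outputs=outputs,
        request_id="my request id",
    )
    return result.as_numpy("detection_classes")
```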

Any pointers as to how to debug this further?

Environment

cat /etc/nv_tegra_release
# R32 (release), REVISION: 5.2, GCID: 27767740, BOARD: t186ref, EABI: aarch64, DATE: Fri Jul  9 16:02:11 UTC 2021

Triton Logs

...
+----------------------------------+--------------------------------------------------------------------------------------+
| Option                           | Value                                                                                |
+----------------------------------+--------------------------------------------------------------------------------------+
| server_id                        | triton                                                                               |
| server_version                   | 2.11.0                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedul |
|                                  | e_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_d |
|                                  | ata statistics                                                                       |
| model_repository_path[0]         | /models                                                                              |
| model_control_mode               | MODE_NONE                                                                            |
| strict_model_config              | 0                                                                                    |
| pinned_memory_pool_byte_size     | 268435456                                                                            |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                             |
| min_supported_compute_capability | 5.3                                                                                  |
| strict_readiness                 | 1                                                                                    |
| exit_timeout                     | 30                                                                                   |
+----------------------------------+--------------------------------------------------------------------------------------+
...
I0105 16:43:36.028928 1 grpc_server.cc:3151] Process for ModelInferHandler, rpc_ok=1, 1 step START
I0105 16:43:36.029217 1 grpc_server.cc:3144] New request handler for ModelInferHandler, 4
I0105 16:43:36.029283 1 model_repository_manager.cc:638] GetInferenceBackend() 'hourglass_512x512_kpts' version 1
I0105 16:43:36.029361 1 model_repository_manager.cc:638] GetInferenceBackend() 'hourglass_512x512_kpts' version 1
I0105 16:43:36.029530 1 infer_request.cc:524] prepared: [0x0x7e38093e80] request id: my request id, model: hourglass_512x512_kpts, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7e38094108] input: input_tensor, type: UINT8, original shape: [1,360,640,3], batch + shape: [1,360,640,3], shape: [1,360,640,3]
override inputs:
inputs:
[0x0x7e38094108] input: input_tensor, type: UINT8, original shape: [1,360,640,3], batch + shape: [1,360,640,3], shape: [1,360,640,3]
original requested outputs:
detection_classes
requested outputs:
detection_classes

I0105 16:43:36.029949 1 tensorflow.cc:2390] model hourglass_512x512_kpts, instance hourglass_512x512_kpts, executing 1 requests
I0105 16:43:36.030034 1 tensorflow.cc:1566] TRITONBACKEND_ModelExecute: Running hourglass_512x512_kpts with 1 requests
I0105 16:43:36.031564 1 tensorflow.cc:1816] TRITONBACKEND_ModelExecute: input 'input_tensor' is GPU tensor: false
2022-01-05 16:43:56.007132: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-01-05 16:44:01.398847: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10

The container terminates at this point with no further logging.

config.pbtxt

name: "hourglass_512x512_kpts"
platform: "tensorflow_savedmodel"
max_batch_size: 0
version_policy {
  latest {
    num_versions: 1
  }
}
input {
  name: "input_tensor"
  data_type: TYPE_UINT8
  dims: 1
  dims: -1
  dims: -1
  dims: 3
}
output {
  name: "detection_boxes"
  data_type: TYPE_FP32
  dims: 1
  dims: 100
  dims: 4
}
output {
  name: "num_detections"
  data_type: TYPE_FP32
  dims: 1
}
output {
  name: "detection_keypoints"
  data_type: TYPE_FP32
  dims: 1
  dims: 100
  dims: 17
  dims: 2
}
output {
  name: "detection_classes"
  data_type: TYPE_FP32
  dims: 1
  dims: 100
}
output {
  name: "detection_keypoint_scores"
  data_type: TYPE_FP32
  dims: 1
  dims: 100
  dims: 17
}
output {
  name: "detection_scores"
  data_type: TYPE_FP32
  dims: 1
  dims: 100
}
instance_group {
  name: "hourglass_512x512_kpts"
  count: 1
  gpus: 0
  kind: KIND_GPU
}
default_model_filename: "model.savedmodel"
optimization {
  input_pinned_memory {
    enable: true
  }
  output_pinned_memory {
    enable: true
  }
}
backend: "tensorflow"

Dockerfile Building Triton Server Image

FROM nvcr.io/nvidia/l4t-ml:r32.5.0-py3

ENV TRITON_SERVER_VERSION=2.11.0
ENV JETPACK_VERSION=4.5

ARG DEBIAN_FRONTEND=noninteractive

WORKDIR /tritonserver

RUN wget https://github.com/triton-inference-server/server/releases/download/v${TRITON_SERVER_VERSION}/tritonserver${TRITON_SERVER_VERSION}-jetpack${JETPACK_VERSION}.tgz && \
    tar -xzf tritonserver${TRITON_SERVER_VERSION}-jetpack${JETPACK_VERSION}.tgz && \
    rm tritonserver${TRITON_SERVER_VERSION}-jetpack${JETPACK_VERSION}.tgz

RUN apt-get update -y

RUN apt-get install -y --no-install-recommends \
        software-properties-common \
        autoconf \
        automake \
        build-essential \
        cmake \
        git \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libtool \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*

RUN ln -s /tritonserver /opt/tritonserver 

ENV LD_LIBRARY_PATH=/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH

ENTRYPOINT ["/tritonserver/bin/tritonserver"]
CMD ["--help"]

Further Observations

I have invoked the Triton model server Docker image with a modified bash entrypoint and tried to run a number of the bundled tests under ./test-util/bin. Most pass, and some fail for obvious reasons (lack of multiple GPUs), but several others fail too, e.g. [ FAILED ] AllocatedMemoryTest.AllocFallback (1 ms). Despite the working example programs, I am not clear whether these test failures point to some issue with my deployed environment or are to be expected on the Jetson.

The same issue persists even after explicitly requesting the TF2 backend with --backend-config=tensorflow,version=2, per Trtsever crashes !! · Issue #1299 · triton-inference-server/server · GitHub.

Incidentally, the SavedModel I am using, freshly downloaded from TensorFlow Hub, runs successfully on TensorFlow Model Server 2.5.1. I have also loaded and re-saved the model in a TensorFlow 2.4 container (the version bundled with the Triton release I am using), which should hopefully strip out anything TensorFlow 2.4 does not understand.

Still looking for bright ideas as to what to try next.

This seems to be a model-specific issue. I downloaded another TF2 object detection w/keypoints model with the same input/output specification from TensorFlow Hub (RetinaNet), and it runs fine with no issues. I am still eager to know how to debug the CenterNet model further, as it is our primary production model right now.

Hi,

Would you mind monitoring the device status at the same time?
TensorFlow may be failing to allocate enough memory for the model, which would lead to this error.

$ sudo tegrastats

Thanks.
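To capture the suggested device status during an inference run, the tegrastats output can be filtered down to just the RAM field. A sketch, assuming the usual `RAM <used>/<total>MB` field in each sample line (the exact line format varies between L4T releases):

```shell
# Extract the RAM usage field from each tegrastats sample line.
ram_field() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "RAM") print $(i + 1) }'
}

# Usage (run alongside the inference request):
#   sudo tegrastats --interval 1000 | ram_field | tee ram.log
```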

I can see that memory usage for these object detection models is close to the capacity of the TX2. With the working RetinaNet model I have been seeing usage of 6.5 GB / 8.0 GB, and indeed, with a second non-TF2 model loaded in Triton (pushing memory higher even when not invoked), I then saw similar inference crashes on the TF2 model, again with no error messages.

When such crashes do occur, I see the used memory being freed, with usage on the unit dropping to 1.3 GB. Even so, I have occasionally seen inference issues after a server restart, as if some resource I’m unaware of is being retained. Are there recommended actions to take after a Triton crash like this to free resources?

My understanding is that the Jetson TX2’s memory is shared by the CPU and GPU. Up until now I have been leaving the memory settings at their defaults, which means 256 MB of pinned memory and a 64 MB CUDA memory pool.
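For reference, those defaults correspond to the pool-size options visible in the server log above, and can be set explicitly on the tritonserver command line, e.g. (values shown are the defaults):

```shell
tritonserver --model-repository=/models \
    --pinned-memory-pool-byte-size=268435456 \
    --cuda-memory-pool-byte-size=0:67108864
```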

This post touches on even tighter constraints on the Jetson Nano, although it doesn’t provide any specific recommendations for Triton server settings vis-à-vis memory limits:

I have not experimented with converting the large object detection models I’m running here, which take UINT8 input tensors, to TensorRT, because (a) I’m not sure the conversion is supported for this type, and (b) one of the main gains seemed to be in going to UINT8, which we already have…

Hi,

Unfortunately, INT8 operations cannot run on the TX2 due to hardware limitations.
Ideally, Triton will use all the available memory on Jetson, but this still depends on the backend you use.

Have you checked whether the memory required by the model can be reduced?
For example, by using a lower batch size or smaller input dimensions?
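Since the config above declares the spatial dims as [1, -1, -1, 3], the input resolution is under the client’s control, so one cheap experiment along these lines is to downsample the image client-side before sending it. A minimal sketch using simple stride-based decimation (purely illustrative; proper resizing would use an image library):

```python
import numpy as np


def downscale(batch, factor=2):
    """Crudely downsample a [1, H, W, 3] UINT8 batch by striding."""
    return np.ascontiguousarray(batch[:, ::factor, ::factor, :])


batch = np.zeros((1, 360, 640, 3), dtype=np.uint8)
small = downscale(batch)
print(small.shape)  # (1, 180, 320, 3)
```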

Thanks.