The Triton server, running in a container on the Jetson TX2, crashes when serving the CenterNet Object & Keypoints model (deployed as a TF2 tensorflow_savedmodel) over gRPC, and I’m looking for pointers on how to proceed.
I’m new to Triton and Jetson, but after some effort I have an environment that successfully runs the Triton examples, namely inception_graphdef and densenet_onnx, via the example grpc_image_client.py and image_client.py Python programs.
I have deployed the TF2 hourglass_512x512_kpts model using the default generated config.pbtxt
file (see below), with the addition of max_batch_size: 0. The Triton server loads the model and reports that it is ready to serve.
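For reference, the model repository follows the standard Triton layout (version 1, with model.savedmodel as named in the config):

models/
└── hourglass_512x512_kpts/
    ├── config.pbtxt
    └── 1/
        └── model.savedmodel/
            ├── saved_model.pb
            └── variables/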
I have made minor changes to the grpc_image_client.py program to adapt it to the different dimensionality of the hourglass_512x512_kpts model, and invoked it with verbose logging enabled. The request is received, including the instruction to return only one of the outputs (I’m trying to build up incrementally), but the server then aborts, terminating the container (see the relevant Triton logs below).
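The changes boil down to the input name, shape, dtype, and the requested output; a minimal sketch of the equivalent request using the tritonclient Python package (dummy image data, and the server URL is a placeholder for my setup) is:

import numpy as np
import tritonclient.grpc as grpcclient

# Placeholder URL; adjust host/port to where the Triton container listens
client = grpcclient.InferenceServerClient(url="localhost:8001", verbose=True)

# Dummy frame standing in for a real decoded 640x360 RGB image
image = np.zeros((1, 360, 640, 3), dtype=np.uint8)

inputs = [grpcclient.InferInput("input_tensor", list(image.shape), "UINT8")]
inputs[0].set_data_from_numpy(image)

# Request only one of the six outputs while building up incrementally
outputs = [grpcclient.InferRequestedOutput("detection_classes")]

result = client.infer(model_name="hourglass_512x512_kpts",
                      inputs=inputs,
                      outputs=outputs,
                      request_id="my request id")
print(result.as_numpy("detection_classes").shape)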
Any pointers as to how to debug this further?
Environment
cat /etc/nv_tegra_release
# R32 (release), REVISION: 5.2, GCID: 27767740, BOARD: t186ref, EABI: aarch64, DATE: Fri Jul 9 16:02:11 UTC 2021
Triton Logs
...
+----------------------------------+--------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.11.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedul |
| | e_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_d |
| | ata statistics |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 5.3 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+--------------------------------------------------------------------------------------+
...
I0105 16:43:36.028928 1 grpc_server.cc:3151] Process for ModelInferHandler, rpc_ok=1, 1 step START
I0105 16:43:36.029217 1 grpc_server.cc:3144] New request handler for ModelInferHandler, 4
I0105 16:43:36.029283 1 model_repository_manager.cc:638] GetInferenceBackend() 'hourglass_512x512_kpts' version 1
I0105 16:43:36.029361 1 model_repository_manager.cc:638] GetInferenceBackend() 'hourglass_512x512_kpts' version 1
I0105 16:43:36.029530 1 infer_request.cc:524] prepared: [0x0x7e38093e80] request id: my request id, model: hourglass_512x512_kpts, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7e38094108] input: input_tensor, type: UINT8, original shape: [1,360,640,3], batch + shape: [1,360,640,3], shape: [1,360,640,3]
override inputs:
inputs:
[0x0x7e38094108] input: input_tensor, type: UINT8, original shape: [1,360,640,3], batch + shape: [1,360,640,3], shape: [1,360,640,3]
original requested outputs:
detection_classes
requested outputs:
detection_classes
I0105 16:43:36.029949 1 tensorflow.cc:2390] model hourglass_512x512_kpts, instance hourglass_512x512_kpts, executing 1 requests
I0105 16:43:36.030034 1 tensorflow.cc:1566] TRITONBACKEND_ModelExecute: Running hourglass_512x512_kpts with 1 requests
I0105 16:43:36.031564 1 tensorflow.cc:1816] TRITONBACKEND_ModelExecute: input 'input_tensor' is GPU tensor: false
2022-01-05 16:43:56.007132: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-01-05 16:44:01.398847: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
The container terminates at this point with no further logging.
config.pbtxt
name: "hourglass_512x512_kpts"
platform: "tensorflow_savedmodel"
max_batch_size: 0
version_policy {
latest {
num_versions: 1
}
}
input {
name: "input_tensor"
data_type: TYPE_UINT8
dims: 1
dims: -1
dims: -1
dims: 3
}
output {
name: "detection_boxes"
data_type: TYPE_FP32
dims: 1
dims: 100
dims: 4
}
output {
name: "num_detections"
data_type: TYPE_FP32
dims: 1
}
output {
name: "detection_keypoints"
data_type: TYPE_FP32
dims: 1
dims: 100
dims: 17
dims: 2
}
output {
name: "detection_classes"
data_type: TYPE_FP32
dims: 1
dims: 100
}
output {
name: "detection_keypoint_scores"
data_type: TYPE_FP32
dims: 1
dims: 100
dims: 17
}
output {
name: "detection_scores"
data_type: TYPE_FP32
dims: 1
dims: 100
}
instance_group {
name: "hourglass_512x512_kpts"
count: 1
gpus: 0
kind: KIND_GPU
}
default_model_filename: "model.savedmodel"
optimization {
input_pinned_memory {
enable: true
}
output_pinned_memory {
enable: true
}
}
backend: "tensorflow"
Dockerfile for Building the Triton Server Image
FROM nvcr.io/nvidia/l4t-ml:r32.5.0-py3
ENV TRITON_SERVER_VERSION=2.11.0
ENV JETPACK_VERSION=4.5
ARG DEBIAN_FRONTEND=noninteractive
WORKDIR /tritonserver
RUN wget https://github.com/triton-inference-server/server/releases/download/v${TRITON_SERVER_VERSION}/tritonserver${TRITON_SERVER_VERSION}-jetpack${JETPACK_VERSION}.tgz && \
    tar -xzf tritonserver${TRITON_SERVER_VERSION}-jetpack${JETPACK_VERSION}.tgz && \
    rm tritonserver${TRITON_SERVER_VERSION}-jetpack${JETPACK_VERSION}.tgz
RUN apt-get update -y
RUN apt-get install -y --no-install-recommends \
        software-properties-common \
        autoconf \
        automake \
        build-essential \
        cmake \
        git \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libtool \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*
RUN ln -s /tritonserver /opt/tritonserver
ENV LD_LIBRARY_PATH=/tritonserver/backends/tensorflow2:$LD_LIBRARY_PATH
ENTRYPOINT ["/tritonserver/bin/tritonserver"]
CMD ["--help"]
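For completeness, I build and launch the server image roughly as follows (the image tag and host model path are placeholders):

docker build -t triton-jetson .
docker run --rm -it --runtime nvidia --network host \
    -v /path/to/models:/models \
    triton-jetson --model-repository=/models --log-verbose=1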
Further Observations
I have invoked the Triton server Docker image with the entrypoint overridden to bash and then tried to run a number of the included tests under ./test-util/bin. Most pass, and some fail for obvious reasons (lack of multiple GPUs), but several others fail as well, e.g. [ FAILED ] AllocatedMemoryTest.AllocFallback (1 ms). Despite the working example programs, I am not clear whether these test failures point to some issue with my deployed environment or are to be expected on the Jetson.
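Concretely, the invocation looks something like this (again, the image tag is a placeholder):

docker run --rm -it --runtime nvidia --entrypoint /bin/bash triton-jetson
# then, inside the container:
cd /tritonserver/test-util/bin
ls          # list the available test binaries
./<test>    # run each binary in turn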