We have a fisheye camera on a Jetson TX2 device. On this device we run an object detection model that predicts oriented bounding boxes. Our setup is described below.
When running inference through our Triton server, we observed satisfactory performance. However, there is a noticeable difference between the results from the Triton server and those from the Jetson TX2 NX, specifically in the confidence scores. Due to these differences we sometimes also get faulty predictions in production.
We use nvarguscamerasrc in GStreamer, combined with NVIDIA DeepStream elements for inference. We capture our images in planar YUV 4:2:0 (aka I420); I will refer to these YUV images as our "raw images". They are captured after the nvmultistreamtiler element. We can also save JPEG images. In that case, the JPEG images are saved using the jpegenc element, resulting in images stored in the YCbCr colorspace.
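For context, a capture-and-save path like the one described might look roughly like the sketch below. The element order and caps here are assumptions for illustration only; the real pipeline is custom and includes the DeepStream inference and tiler elements:

```shell
# Hypothetical sketch: capture from the CSI camera, convert to I420,
# and save JPEG frames via jpegenc (caps and filenames are made up)
gst-launch-1.0 nvarguscamerasrc ! \
  'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080' ! \
  nvvidconv ! 'video/x-raw,format=I420' ! \
  jpegenc ! multifilesink location=frame_%05d.jpg
```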
We have detected a difference in prediction confidence in two scenarios:
1. Inferring a raw image (I420, planar YUV 4:2:0) directly from the camera on our Jetson vs. converting that same raw image to JPEG and then inferring it on the Jetson.
2. Some differences when inferring JPEG images on Triton vs. those same JPEG images on our Jetson.
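One plausible contributor to the raw-vs-JPEG gap is the YCbCr range convention: camera I420 video is typically BT.601 limited range (Y in 16–235), while JFIF JPEG assumes full range (Y in 0–255). If any stage interprets the stored bytes under the wrong convention, every pixel value shifts slightly. A minimal sketch of the effect (red channel only; the coefficients are the standard BT.601 and JFIF formulas, but whether this mismatch actually occurs in our pipeline is an assumption to verify):

```python
def yuv_to_r_limited(y, v):
    # BT.601 limited ("video") range, typical for camera I420
    r = 1.164 * (y - 16) + 1.596 * (v - 128)
    return max(0, min(255, round(r)))

def yuv_to_r_full(y, v):
    # JFIF full range, what JPEG's YCbCr assumes
    r = y + 1.402 * (v - 128)
    return max(0, min(255, round(r)))

# The same stored bytes decode to different pixel values
# depending on which convention the decoder applies:
print(yuv_to_r_limited(128, 128))  # 130
print(yuv_to_r_full(128, 128))     # 128
```

A systematic 1–3 level shift across the whole image is small visually but can plausibly move model confidences.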
We have executed various experiments, including:
Comparisons between Triton and raw PyTorch outputs.
Tests across all supported ONNX opsets with our stack, examining different settings for constant folding and graph simplification.
Compiling certain problematic images into a video and feeding this to DeepStream on our TX2.
It is hard to find the root cause from the description alone. Could you share more details about the two tests? Are you using the same model in both tests?
About the "Triton server" test: did you use DeepStream? Which DeepStream sample are you testing? What is the media pipeline? About the "Jetson TX2 NX" test: did you use DeepStream? What is the Triton backend?
For all the tests that we conduct, we make sure we use the exact same model. We have tested at each step/format the model can be run in: PyTorch, ONNX, and Triton. The ONNX and Triton experiments therefore use models derived from our original PyTorch model.
For the Triton server test we have tested both with and without DeepStream.
We are not using any of the reference applications; we run our own application, so the media pipeline I refer to is also a custom pipeline.
The Jetson TX2 NX does use DeepStream. We have experimented with the TensorRT backend.
My apologies about this: our Triton inference server does not have a DeepStream component in it. DeepStream with nvinfer is only used on our Jetson TX2.
The pipeline on the inference server is simply like this:
json (with byte encoded jpeg) -> Flask Rest server -> .npy array as output -> triton inference server -> .npy array as response -> json
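For reference, the transport into the Flask server can be checked in isolation: base64-in-JSON is lossless, so any discrepancy must come from JPEG decoding or preprocessing, not from the REST hop itself. A sketch of that check (the "image" key and the header bytes are made up for illustration):

```python
import base64
import json

# Hypothetical request body: JPEG bytes are base64-encoded into JSON
jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16  # JPEG SOI/APP0 header stub
payload = json.dumps({"image": base64.b64encode(jpeg_bytes).decode("ascii")})

# Server side: recover the exact bytes before decoding into a .npy array
recovered = base64.b64decode(json.loads(payload)["image"])
print(recovered == jpeg_bytes)  # True: the transport is byte-exact
```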
For the TX2 pipeline, you can find the simplified pipeline for saving the images from nvargus and jpeg below:
Thanks for sharing. Could you elaborate on this "with DeepStream" test? Are you using nvinferserver + Triton server? That is called nvinferserver gRPC mode (preprocessing is done by nvinferserver, inference is done by the remote tritonserver; please refer to /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton-grpc).
What is the inference backend of tritonserver: the TensorRT backend or onnxruntime_onnx? Here is an onnxruntime_onnx sample: /opt/nvidia/deepstream/deepstream-6.3/samples/triton_model_repo/densenet_onnx/config.pbtxt
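For reference, that sample's config.pbtxt looks roughly like the fragment below. The names and dims follow the standard densenet_onnx example and may differ in your DeepStream version; for a TensorRT engine the platform would be "tensorrt_plan" instead of "onnxruntime_onnx":

```
name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  { name: "data_0", data_type: TYPE_FP32, dims: [ 1, 3, 224, 224 ] }
]
output [
  { name: "fc6_1", data_type: TYPE_FP32, dims: [ 1, 1000, 1, 1 ] }
]
```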
About the discrepancies: which is better, the Triton server or nvinfer? As you know, if the inference results are different, we need to check both preprocessing and inference. deepstream-test1 supports both nvinferserver and nvinfer inference; we suggest using this sample to reproduce the issue.
We do not use nvinferserver. For the DeepStream test we use the native DeepStream that is installed on Jetson devices via the SDK Manager. The configuration you are referring to is slightly different, if I understand it correctly: it does inference using the Triton REST server?
The inference backend that we use on the Triton server is TensorRT.
We've discovered that wherever we run the code with the TensorRT backend (whether via the Python SDK or Triton with our custom engine) we get similar results.
We've narrowed the problem down to the following two situations:
1. A minor difference in confidences when running directly in PyTorch (or ONNX) vs. running with TensorRT.
2. A difference in confidences (biggest impact) when running a "raw" (YUV) image through our GStreamer pipeline vs. inferring from a JPEG in the pipeline.
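The minor PyTorch/ONNX-vs-TensorRT drift in the first situation is consistent with reduced-precision execution (TensorRT engines on Jetson are often built with FP16 enabled). A minimal sketch of the expected magnitude, assuming a sigmoid confidence head; the logit value is made up, and a real network accumulates such errors across many layers:

```python
import math
import struct

def fp16_roundtrip(x):
    # Round-trip a Python float through IEEE 754 half precision
    return struct.unpack('e', struct.pack('e', x))[0]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

logit = 2.3456789  # made-up raw model output
conf_fp32 = sigmoid(logit)
conf_fp16 = sigmoid(fp16_roundtrip(logit))

# A single round-trip only moves the confidence around the sixth
# decimal place; multi-layer accumulation can make it larger.
print(abs(conf_fp32 - conf_fp16))
```

Drift of this character is numerically expected and is a different phenomenon from the larger raw-vs-JPEG gap in the second situation.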
Noting that one test uses nvinfer on the Jetson TX2 and the other uses the Triton server, I am not clear about the quoted words "For the triton server test we have both tested with and without deepstream." How did you test with DeepStream for the Triton server test?
nvinfer calls the TensorRT interface directly to do inference. Triton is a separate inference framework that packages many inference backends, for example TensorRT and onnxruntime_onnx. DeepStream leverages the nvinferserver plugin to send tensors to Triton and let Triton do the inference.
Yes, Triton supports native and remote deployment. With remote deployment, tritonserver receives the tensors and does the inference.