Discrepancies in Inference Confidence between Triton Server and Jetson TX2 NX for object detection model with oriented bounding boxes

We have a fisheye camera on a Jetson TX2 device. On this device we are running an object detection model with oriented bounding boxes. Our setup includes

  • TensorRT v8.2.0.1
  • DeepStream 6.0
  • JetPack 4.6.4

When running inference using our Triton server, we observed satisfactory performance. However, there is a noticeable difference in the results when comparing the Triton server to the Jetson TX2 NX, specifically in the confidence scores. Because of these differences we sometimes also get faulty predictions in production.

We use nvarguscamerasrc in GStreamer, combined with NVIDIA DeepStream elements for inference. We capture our images in planar YUV 4:2:0 (aka I420); I will refer to these YUV images as our “raw images”. They are captured after the nvmultistreamtiler element. We can also save JPEG images; in that case they are encoded with the jpegenc element, resulting in images saved in the YCbCr colorspace.

We have detected a difference in prediction confidence in two scenarios:

  1. When inferring a raw image (I420 - Planar YUV 4:2:0) directly from the camera on our Jetson vs. converting that same raw image to JPEG and then inferring it on the Jetson.
  2. Some differences when inferring JPEG images on Triton vs. those same JPEG images on our Jetson.
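Scenario 1 is consistent with the lossy steps between a raw I420 frame and a JPEG: chroma subsampling and JPEG quantization both perturb pixel values, which can shift model confidences. A minimal pure-NumPy sketch (BT.601 full-range coefficients, synthetic data rather than actual camera frames) showing that 4:2:0 subsampling alone already changes the reconstructed RGB values, before JPEG quantization even enters the picture:

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float64)

# RGB -> YCbCr (BT.601, full range)
y  = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
cb = 128.0 - 0.168736 * rgb[..., 0] - 0.331264 * rgb[..., 1] + 0.5 * rgb[..., 2]
cr = 128.0 + 0.5 * rgb[..., 0] - 0.418688 * rgb[..., 1] - 0.081312 * rgb[..., 2]

def subsample_420(c):
    # I420-style chroma: average each 2x2 block, then nearest-neighbor upsample
    c2 = c.reshape(4, 2, 4, 2).mean(axis=(1, 3))
    return np.repeat(np.repeat(c2, 2, axis=0), 2, axis=1)

cb_s, cr_s = subsample_420(cb), subsample_420(cr)

# YCbCr -> RGB
r = y + 1.402 * (cr_s - 128.0)
g = y - 0.344136 * (cb_s - 128.0) - 0.714136 * (cr_s - 128.0)
b = y + 1.772 * (cb_s - 128.0)
recon = np.clip(np.stack([r, g, b], axis=-1), 0.0, 255.0)

max_err = float(np.abs(recon - rgb).max())
print(max_err)  # nonzero: subsampling alone changes pixel values
```

Any such per-pixel error is then amplified or not by the model, which is why confidences can differ between the raw and the JPEG path even with identical weights.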

We have executed various experiments, including:

  1. Comparisons between Triton and raw PyTorch outputs.
  2. Tests across all supported ONNX opsets with our stack, examining different settings for constant folding and graph simplification.
  3. Compiled certain problematic images into a video and fed this to DeepStream on our TX2.
  1. It is hard to get the root cause from the description alone. Could you share more details about the two tests? Are you using the same model in both tests?
  2. About the “Triton server” test: did you use DeepStream? Which DeepStream sample are you testing? What is the media pipeline? And about the “Jetson TX2 NX” test: did you use DeepStream? What is the Triton backend?

Hello Fanzh,

  1. For all the tests that we are conducting, we make sure that we use the exact same model. We have conducted our tests at each step/format the model passes through: PyTorch, ONNX, and Triton. The ONNX and Triton experiments thus use models derived from our original PyTorch model.

  • For the Triton server test we have tested both with and without DeepStream.
  • We are not using any of the reference applications; we run our own application, so the media pipeline I refer to is also a custom pipeline.
  • The Jetson TX2 NX does use DeepStream. We have experimented with the TensorRT backend.
  1. Do you mean the Triton server test uses DeepStream nvinferserver inference and the Jetson TX2 NX uses DeepStream nvinfer inference? Are the two test results different? What are the media pipelines?
  2. There are many DeepStream nvinferserver samples, such as deepstream-test1. To narrow down this issue, could you use a DeepStream sample to reproduce it?

Hello Fanzh,

  1. My apologies about this. Our Triton inference server does not have a DeepStream component in it; we only use DeepStream (with nvinfer) on our Jetson TX2.
    The pipeline on the inference server is simply like this:
json (with byte-encoded jpeg) -> Flask REST server -> .npy array -> Triton inference server -> .npy array as response -> json
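As an illustration of that front end (the field name and the stand-in bytes below are hypothetical, not the actual schema), the JPEG travels through the JSON hop losslessly when it is base64-encoded:

```python
import base64
import json

# Stand-in for real JPEG bytes (a JPEG file starts with FF D8 FF).
jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16

# Client side: wrap the encoded image in a JSON payload for the Flask server.
payload = json.dumps({"image": base64.b64encode(jpeg_bytes).decode("ascii")})

# Server side: recover the exact original bytes before decoding/preprocessing.
decoded = base64.b64decode(json.loads(payload)["image"])
print(decoded == jpeg_bytes)  # True: the JSON hop itself is lossless
```

So any discrepancy on this path has to come from the JPEG decode or the preprocessing after it, not from the transport.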

For the TX2 pipeline, you can find the simplified pipeline for saving the images from nvargus and jpeg below:

nvarguscamerasrc ! "video/x-raw(memory:NVMM),format=NV12" ! nvvideoconvert src-crop=... ! "video/x-raw(memory:NVMM),format=NV12" ! nvstreammux ! nvinfer (yolo) ! nvinfer (alphapose) ! nvvideoconvert ! nvmultistreamtiler ! nvvideoconvert ! tee name=t
t. ! jpegenc ! multifilesink
t. ! queue ! multifilesink
  1. I will get back about this.
  1. Thanks for sharing. Could you elaborate on this “with DeepStream” test? Are you using nvinferserver + Triton server? This is called nvinferserver gRPC mode (preprocessing is done by nvinferserver, inference by the remote tritonserver; please refer to /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton-grpc).
    What is the inference backend of tritonserver, TensorRT or onnxruntime_onnx? Here is an onnxruntime_onnx sample: /opt/nvidia/deepstream-6.3/samples/triton_model_repo/densenet_onnx/config.pbtxt
  2. About the “discrepancies”: which is better, the Triton server or nvinfer? As you know, if the inference results differ, we need to check preprocessing and inference. deepstream-test1 supports both nvinferserver and nvinfer inference; we suggest using this sample to reproduce the issue.
  1. We do not use nvinferserver. For the DeepStream test we use the native DeepStream that is installed on Jetson devices via the SDK Manager. The configuration you are referring to is slightly different if I understand it well, namely doing inference using the Triton REST server?
    The inference backend that we use on the Triton server is TensorRT.
  2. We have discovered that wherever we run the code with the TensorRT backend (whether via the Python SDK or Triton with our custom engine), we get similar results.

We’ve narrowed the problem down to the following 2 situations:

  • A difference in confidences (although minor) when running directly in PyTorch (or ONNX) vs. running with TensorRT
  • A difference in confidences (the biggest impact) when running a “raw” image (YUV) through our GStreamer pipeline vs. inferring from a JPEG in the pipeline
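To keep those two effects separate, it can help to compare per-detection confidences with an explicit tolerance: small, uniform drift usually indicates numeric precision (the first situation), while large, image-dependent jumps usually point at preprocessing (the second). A sketch with made-up numbers, not actual measurements:

```python
import numpy as np

# Hypothetical per-detection confidences from two runtimes (made-up values).
pytorch_conf  = np.array([0.91, 0.47, 0.12])
tensorrt_conf = np.array([0.909, 0.472, 0.118])

abs_diff = np.abs(pytorch_conf - tensorrt_conf)

# Drift around 1e-3 across all detections is plausibly FP16/FP32 precision.
# Per-image jumps of 0.1 or more warrant a look at colorspace conversion,
# resizing, and normalization instead.
print(abs_diff.max())
assert np.allclose(pytorch_conf, tensorrt_conf, atol=5e-3)
```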

Noticing that one test uses nvinfer on the Jetson TX2 and the other uses the Triton server, I am not clear about the quoted words “For the triton server test we have both tested with and without deepstream.” How did you test with DeepStream for the Triton server test?

nvinfer calls the TensorRT interface directly to do inference. Triton is another inference framework, which packages many inference backends, for example TensorRT and onnxruntime_onnx. DeepStream leverages the nvinferserver plugin to send tensors to Triton and let Triton do the inference.
Yes, Triton supports native and remote deployment. If using remote deployment, tritonserver is used to receive tensors and do inference.

could you highlight which question we need to check first? Thank you!

do we still need to check this problem? if yes, could you please provide more information.

  1. Is the original model format PyTorch?
  2. About “directly in PyTorch”: do you mean using the PyTorch framework, not Triton?
  3. About “with TensorRT”: do you mean using DeepStream nvinfer on the TX2?
  1. Where and how did you do the preprocessing and postprocessing? What language and libraries are used for this processing?
  2. Is the Triton backend TensorRT?
  3. Could you provide simplified code and the configuration file for this Triton inference?

We have checked the files you shared; here are some questions:

  1. I compared the jpeg and raw directories in inference_examples; only 1695373105811-00050.jpeg is different: the one in the jpeg directory has a bbox, the other has none. Could you highlight the “biggest impact”?
  2. in run_command_raw.sh, why is 1720 used in “width=1720,height=1720”?
  3. Please make sure the preprocessing parameters in the nvinfer configuration are consistent with the model training's preprocessing parameters; please refer to the parameter explanations in this link.
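As a concrete reference for point 3, these are the nvinfer preprocessing keys that must mirror the training-time normalization. The numeric values below are placeholders for illustration, not a recommendation:

```ini
[property]
# nvinfer normalizes each pixel as: y = net-scale-factor * (x - offsets)
# placeholder values; they must match the training pipeline exactly
net-scale-factor=0.00392156862745098
offsets=123.675;116.28;103.53
# 0=RGB, 1=BGR; must also match what the model was trained on
model-color-format=0
```

A mismatch in any of these (scale, per-channel offsets, or channel order) produces exactly the kind of image-dependent confidence shifts described earlier in the thread.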

Hello fanzh,

We had a call with Nvidia yesterday, we received some good pointers and experiments to conduct on our side. Will update when we have more information regarding this.

Thanks for the update!