behaves differently when compiled with and without '-O3'


While attempting a custom Yocto build for the AGX Devkit, we found that our trained F-RCNN model yielded fewer detections than it did on Nvidia’s standard Ubuntu distribution on the same devkit. In our Yocto build we were following the steps described in this repo:

for building a that contained the necessary cropAndResizePlugin and proposalPlugin plugins. We followed Option #2 for deepstream 4.0.2 & TRT 6.0. We found two problems with

  1. The is inaccurate regarding CMAKE_BUILD_TYPE. The top-level cmake files do not specify ‘Release’ as the default. ‘Release’ is set as the build type only when it is specified directly on the cmake command line, or when the parsers are built, either implicitly (the default) or explicitly via -DBUILD_PARSERS=ON: the onnx parser sets the build type to ‘Release’ if it has not already been defined. This matters because it is the ‘Release’ build type that sets the default CXX_FLAGS to ‘-O3 -DNDEBUG’. Otherwise, as in debug builds, these flags are omitted. We stumbled into this because the make install step fails in our Yocto build when the parsers and the samples are enabled. So we disabled both and then discovered problem #2 below.

  2. For debug builds, where ‘-O3’ is not specified, our own model yields fewer detections. We ran a sample image through a GStreamer pipeline, which yielded only 4 detections when the plugin was compiled without ‘-O3’ vs 13 detections when it was compiled with it.
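Since problem #1 comes down to which build type a given build tree actually ended up with, a quick check of the CMake cache can help before comparing detections. The following is only a minimal sketch: the function name configured_build_type is made up for illustration, and it assumes the build directory contains the CMakeCache.txt that CMake writes at configure time.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA repo): print the build type
# recorded in a CMake build tree, or "(unset)" when none was configured.
configured_build_type() {
    cache="$1/CMakeCache.txt"
    if [ ! -f "$cache" ]; then
        echo "no CMakeCache.txt in $1" >&2
        return 1
    fi
    # Cache entries look like: CMAKE_BUILD_TYPE:STRING=Release
    value=$(sed -n 's/^CMAKE_BUILD_TYPE:[A-Z]*=//p' "$cache" | head -n 1)
    echo "${value:-(unset)}"
}

# Example use, run from the build directory after configuring
# (shown as comments because it needs a real source tree):
#   cmake -DCMAKE_BUILD_TYPE=Release ..
#   configured_build_type .
```

If this prints "(unset)" for the build tree in question, the ‘-O3 -DNDEBUG’ flags tied to the ‘Release’ build type were most likely not applied, and passing -DCMAKE_BUILD_TYPE=Release explicitly avoids depending on whether the parsers are enabled.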

To reproduce, run the Faster RCNN sample from deepstream_4.x_apps with the plugin compiled both ways. Judging by visual inspection alone, it also appears to yield different numbers of detections.


JetPack 4.3 on Jetson Xavier AGX Devkit

Moving this to Jetson Xavier forum so that Jetson team can take a look.


Does this issue also occur in our standard L4T environment, or can it only be reproduced with the Yocto build?

With our model it can be reproduced in the standard L4T environment as well. And, per the original post, there also appears to be a difference in the Faster RCNN sample from deepstream-custom in deepstream_4.x_apps.


We would like to reproduce this in our environment.
Would you mind sharing a simple source and the corresponding model for this issue so we can look into it more deeply?


Follow the instructions for building the Faster RCNN custom parser and libnvinfer_plugin from deepstream_4.x_apps. Then run the script below against libnvinfer_plugin compiled with and without ‘-O3’ and capture the output so you can compare. There is clearly a difference.

gst-launch-1.0 \
uridecodebin uri=file:///opt/nvidia/deepstream/deepstream-4.0/samples/streams/sample_720p.h264 ! m.sink_0 \
nvstreammux name=m batch-size=1 width=1280 height=720 ! \
nvinfer config-file-path=pgie_frcnn_uff_config.txt ! nvvideoconvert ! \
queue ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' ! \
nvv4l2h264enc ! h264parse ! qtmux ! filesink location=output.mp4


Sorry to keep you waiting.
Since we announced a new release today, would you mind checking first whether this issue also occurs with our DeepStream 5.0 package?


Sorry for the slow response. It’s a non-issue there because we’re not required to build in DS 5.0.