Issue with TensorRT 7.1.3 on Jetson AGX

Cross-posting from “TRT 7.1.3 - invalid results, but only on Jetpack”

We’re noticing that SSD models (such as the model mentioned here: Release Notes :: NVIDIA Deep Learning TensorRT Documentation) generate incorrect output, but only when using TRT 7.1.3, and only on Jetson. The same model with the same code generates the expected output with TRT 7.1.3 on all x86_64 platforms, as well as with TRT 6.0.1.5 on the same Jetson hardware.

This only seems to apply to SSD (we’ve seen the problem with the publicly available model above, as well as with four of our proprietary models); our YOLO model seems to work fine in this scenario.

Any help? At this point it feels like a JetPack bug. With the model above, ‘fc6’ is the first layer that produces different results in TRT 7 compared to TRT 6. There’s nothing special about that layer or that model.
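
For context, one way to see where the outputs diverge is to mark an intermediate blob such as ‘fc6’ as an extra network output before building the engine and dump it from both TRT versions. The sketch below uses the legacy Caffe-parser/builder API (deprecated but still present in TRT 7); the function name and workspace size are illustrative, and it is not taken from our actual code.

// Sketch: mark 'fc6' as an additional network output so its values can be
// dumped and compared between the TRT 6 and TRT 7 builds of the same model.
#include <NvInfer.h>
#include <NvCaffeParser.h>

nvinfer1::ICudaEngine* buildWithDebugOutput(nvinfer1::ILogger& logger,
                                            const char* prototxt,
                                            const char* caffemodel)
{
    auto* builder = nvinfer1::createInferBuilder(logger);
    auto* network = builder->createNetwork();   // implicit-batch network, as used by the Caffe parser
    auto* parser  = nvcaffeparser1::createCaffeParser();

    const auto* blobs = parser->parse(prototxt, caffemodel, *network,
                                      nvinfer1::DataType::kFLOAT);

    // Regular SSD output blob (name as in the sampleSSD prototxt) plus the
    // suspect intermediate blob, so 'fc6' shows up as an extra binding.
    network->markOutput(*blobs->find("detection_out"));
    network->markOutput(*blobs->find("fc6"));

    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 28);
    return builder->buildCudaEngine(*network);
}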

Hi,

Do you run TensorRT on the GPU or on DLA?
To check this issue further, could you share the source code and model with us?

Thanks.

GPU. The model, as I’ve mentioned, is the one available from https://drive.google.com/file/d/0BzKzrI_SkD1_WVVTSmQxU0dVRzA/view (see the TRT 5.0.2 release notes link in the OP).

As for the code, I’ve tried to isolate and simplify it as much as possible (for example, by removing async CUDA operations); none of this changes the outcome. I’m attaching the latest version.
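
The simplified path is fully synchronous (no streams, no enqueue) and looks roughly like the pattern below; this is a minimal sketch assuming one input and one output binding, not the attached file itself.

// Sketch of a fully synchronous inference call, assuming an implicit-batch
// engine with the input at binding 0 and the output at binding 1.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <vector>

std::vector<float> inferSync(nvinfer1::ICudaEngine& engine,
                             const std::vector<float>& input,
                             size_t outputCount)
{
    auto* context = engine.createExecutionContext();

    void* bindings[2] = {nullptr, nullptr};
    cudaMalloc(&bindings[0], input.size() * sizeof(float));
    cudaMalloc(&bindings[1], outputCount * sizeof(float));

    cudaMemcpy(bindings[0], input.data(), input.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    // Blocking execute() instead of enqueue() rules out any async issue.
    context->execute(1 /*batchSize*/, bindings);

    std::vector<float> output(outputCount);
    cudaMemcpy(output.data(), bindings[1], outputCount * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(bindings[0]);
    cudaFree(bindings[1]);
    context->destroy();
    return output;
}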

The one simplification I wasn’t able to make is linking statically against TensorRT, as there are still unresolved symbols in the final binary. I’ve opened a separate issue for that on the TRT forum.

sampleTRTLib.cpp (7.4 KB)

I’ve tried to put all the relevant pieces together in a single repo: GitHub - w3sip/sampleTRT

I switch between TRT 6 and TRT 7 by toggling these two pairs of lines:

set (TRTVER 6)
set (CUDAVER 100)
# set (TRTVER 7)
# set (CUDAVER 102)

The output folders are then uploaded to the TX2 in their entirety and run using the following script:

export LD_DEBUG=libs

export CONFIGNAME=bin7-dynamic
pushd $CONFIGNAME
export LD_LIBRARY_PATH=`pwd`
./sampleTRT > ../$CONFIGNAME.log
popd


export CONFIGNAME=bin6-dynamic
pushd $CONFIGNAME
export LD_LIBRARY_PATH=`pwd`
./sampleTRT > ../$CONFIGNAME.log
popd

I’m attaching the resulting log from each run. Again, notice the distinct difference in outputs; the same model and input are used in both cases.

bin6-dynamic.log (158.8 KB)
bin7-dynamic.log (230.4 KB)

Thanks for your data.

We are working to reproduce this issue internally.
We will get back to you when we have any progress.

The problem seems to be around engine serialization/deserialization.
If we build the CUDA engine and use it right away, the problem does not occur.

If we serialize the engine with ICudaEngine::serialize() and then deserialize it with IRuntime::deserializeCudaEngine(ihm->data(), ihm->size(), nullptr), the incorrect output described above appears.

The stock sample_ssd reproduces the issue as well when replacing

auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());

with

// Serialize the freshly built engine to host memory, then immediately
// deserialize it; this is the same path an engine cache would take.
IHostMemory* ihm = mEngine->serialize();
IRuntime* runtime = createInferRuntime(sample::gLogger.getTRTLogger());
ICudaEngine* engine = runtime->deserializeCudaEngine(ihm->data(), ihm->size(), nullptr);

// Create the execution context from the deserialized engine instead of mEngine.
auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(engine->createExecutionContext());

Any updates? A workaround will do, though we do need to be able to cache the CUDA engine.
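
For completeness, the caching we need is just serialize-to-file plus deserialize-on-load, i.e. exactly the sequence that goes wrong. A minimal sketch (file handling and names purely illustrative):

// Sketch of the engine-caching path we rely on: write the serialized engine
// to disk once, then deserialize it on later runs.
#include <NvInfer.h>
#include <fstream>
#include <vector>

void saveEngine(nvinfer1::ICudaEngine& engine, const char* path)
{
    nvinfer1::IHostMemory* blob = engine.serialize();
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(blob->data()), blob->size());
    blob->destroy();
}

nvinfer1::ICudaEngine* loadEngine(nvinfer1::IRuntime& runtime, const char* path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    return runtime.deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}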

Could this somehow be a problem? We’re attempting to build TRT from source so we can debug this properly, and we’re seeing these warnings:

/src/.build/TensorRT/parsers/caffe/../common/parserUtils.h:77:13: warning: enumeration value 'kBOOL' not handled in switch [-Wswitch]
    switch (t)
            ^
/src/.build/TensorRT/parsers/caffe/../common/parserUtils.h:99:13: warning: enumeration value 'kBOOL' not handled in switch [-Wswitch]
    switch (dt)
            ^
2 warnings generated.
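
For reference, -Wswitch here only means that the kBOOL enumerator, which was added to nvinfer1::DataType in TRT 7, is not listed in a switch that also has no default label. The pattern is roughly the one below (a sketch, not the actual parserUtils.h code):

// Sketch of the pattern -Wswitch is flagging: a switch over nvinfer1::DataType
// with no case for kBOOL (new in TensorRT 7) and no default label.
#include <NvInfer.h>

static unsigned int elementSize(nvinfer1::DataType t)
{
    switch (t)
    {
    case nvinfer1::DataType::kINT32:
    case nvinfer1::DataType::kFLOAT: return 4;
    case nvinfer1::DataType::kHALF:  return 2;
    case nvinfer1::DataType::kINT8:  return 1;
    // No kBOOL case here, so the compiler warns; adding one (or a default)
    // would silence the warning.
    }
    return 0;
}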