RetinaNet on Jetson Nano

I trained a RetinaNet model using TLT. On my laptop I am able to generate the .engine file and run it inside the Triton container. I am now trying to move it to a Jetson Nano flashed using SDK Manager, but I get this error when running it:

[ERROR] UffParser: Validator error: FirstDimTile_4: Unsupported operation _BatchTilePlugin_TRT

On the Jetson I am running it in the Docker container deepstream-l4t:5.1-21.02-iot
On my host machine I have Ubuntu 18.04, DeepStream 5.1, TLT v3 with CUDA 11.1
On the Jetson I have Jetpack 4.5.1 with DeepStream 5.1 and CUDA 10.2

How can I convert the .etlt model into a .engine within the Jetson Nano?
Thanks in advance.

Hi,

Please check the document below:

https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/quickstart/deepstream_integration.html#convert-to-trt-engine

Thanks.

I found the solution. Here are the steps to get RetinaNet working on a Jetson Nano. Basically, you have to get TensorRT installed properly, because JetPack does not have it properly installed (or at all). To build it you need CMake v3.13, and then the CMake and TensorRT environment variables need to be configured properly for tlt-converter to work.

Instructions for Jetson

These are the instructions for running this project on a Jetson device.

System Requirements

  • Jetson Nano 4GB
  • 128 GB SD Card
  • Jetpack 4.5.1
    • DeepStream 5.1 used via Docker image (No need to install it)

Prepare the Jetson

  1. Update and upgrade all packages
apt update
apt upgrade
  2. Check cmake version
cmake --version
  3. If the version is lower than 3.13, install curl and the other build dependencies and remove the current cmake version; otherwise proceed to step 4 and skip step 5
apt install unzip curl libssl-dev libcurl4-openssl-dev
apt remove cmake
  4. Remove any unused packages
apt autoremove
  5. Install cmake 3.13

Building cmake from source can take a while (30 minutes to 1 hour).

cd /opt/nvidia
wget http://www.cmake.org/files/v3.13/cmake-3.13.0.tar.gz
tar xpvf cmake-3.13.0.tar.gz cmake-3.13.0/
rm cmake-3.13.0.tar.gz
cd cmake-3.13.0/
./bootstrap --system-curl
make -j4
echo 'export PATH=/opt/nvidia/cmake-3.13.0/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
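
The cmake version check above can also be scripted. This is only a rough sketch: the function names version_ge and installed_cmake_version are made up for illustration, and it assumes GNU sort with the -V (version sort) option is available, which should be the case on the Ubuntu-based JetPack image.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Prints the installed cmake version (third word of "cmake version X.Y.Z"),
# or nothing when cmake is not installed.
installed_cmake_version() {
    command -v cmake >/dev/null 2>&1 || return 0
    cmake --version | awk 'NR==1 {print $3}'
}

current="$(installed_cmake_version)"
if [ -n "$current" ] && version_ge "$current" "3.13"; then
    echo "cmake $current is new enough, no need to rebuild it"
else
    echo "cmake ${current:-not found}: build 3.13 from source"
fi
```

If the script reports that cmake is new enough, the source build of cmake can be skipped.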

Install TensorRT

TensorRT must be installed in order to properly convert certain networks like RetinaNet.
The build can take about 30 minutes to 1 hour.

cd /opt/nvidia/
git clone -b release/7.1 https://github.com/NVIDIA/TensorRT.git ./tensorrt
cd tensorrt
git submodule update --init --recursive
mkdir -p build && cd build
cmake .. -DGPU_ARCHS="53" -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_CUDA_COMPILER:PATH=/usr/local/cuda/bin/nvcc
make -j$(nproc)
echo 'export LD_LIBRARY_PATH=/opt/nvidia/tensorrt/build:$LD_LIBRARY_PATH' >> ~/.bashrc
echo 'export PATH=$LD_LIBRARY_PATH:$PATH' >> ~/.bashrc
source ~/.bashrc
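
Before moving on, it may be worth checking that the build actually produced the plugin library. The sketch below assumes the library is named libnvinfer_plugin.so* and lands directly in /opt/nvidia/tensorrt/build; depending on the cmake options it might end up in a subdirectory instead. The function name find_plugin_lib is made up for illustration.

```shell
# Prints every libnvinfer_plugin shared object found in the directory $1
# and succeeds only if at least one exists.
# The body runs in a subshell so the helper variables do not leak.
find_plugin_lib() (
    found=1
    for f in "$1"/libnvinfer_plugin.so*; do
        if [ -e "$f" ]; then
            echo "$f"
            found=0
        fi
    done
    [ "$found" -eq 0 ]
)

# Assumed build directory, taken from the steps above.
if find_plugin_lib /opt/nvidia/tensorrt/build; then
    echo "plugin library found"
else
    echo "no libnvinfer_plugin.so* in the build directory, check the make output"
fi
```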

Install TLT Converter

This step is optional, as you can let DeepStream run the converter for you, but if you wish to install it, follow these steps:

cd /opt/nvidia/
wget https://developer.nvidia.com/cuda102-trt71-jp45
unzip cuda102-trt71-jp45
mv cuda10.2_trt7.1_jp4.5/tlt-converter ./tlt-converter
chmod u+x tlt-converter
rm cuda102-trt71-jp45
rm -rdf cuda10.2_trt7.1_jp4.5

Convert the model

Execute the following command to convert the model from .etlt to an fp32 .engine file.
Update the parameters as required, especially if you wish to convert to fp16 or int8.

Note: this command will not work inside the container. If you wish to do the conversion inside the container, let DeepStream run it for you instead.

/opt/nvidia/tlt-converter \
               -k <Your NGC API KEY>  \
               -d 3,<Model Input Image Height>,<Model Input Image Width> \
               -o NMS \
               -e ~/models/trt.fp32.engine \
               -t fp32 \
               -i nchw \
               -m 8 \
               ~/models/retinanet_resnet18.etlt
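
If you switch between fp32 and fp16 often, a small helper that prints the command can save some typing. This is a sketch under assumptions: build_convert_cmd and the NGC_KEY environment variable are names made up here, only the flags from the command above are used, and int8 is left out on purpose because it needs calibration inputs that are not covered in this post.

```shell
# Prints (does not run) the tlt-converter command for a given precision.
# Arguments: $1 = precision (fp32 or fp16), $2 = input height, $3 = input width.
# NGC_KEY is an assumed environment variable holding the model key.
build_convert_cmd() (
    case "$1" in
        fp32|fp16) ;;
        *) echo "unsupported precision: $1" >&2; exit 1 ;;
    esac
    if [ -z "${NGC_KEY:-}" ]; then
        echo "NGC_KEY is not set" >&2
        exit 1
    fi
    printf '%s -k %s -d 3,%s,%s -o NMS -e ~/models/trt.%s.engine -t %s -i nchw -m 8 ~/models/retinanet_resnet18.etlt\n' \
        /opt/nvidia/tlt-converter "$NGC_KEY" "$2" "$3" "$1" "$1"
)
```

Call it with the precision followed by your model's input height and width, review the printed command, and then run it outside the container.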

Run DeepStream container

You can use the deepstream-l4t base or iot image. In my case I use the iot image, and it will be downloaded if you don’t have it yet.

Update the parameters as required. For example, make sure the --device /dev/video0 parameter works for your environment and that the shared directory “~/” points to the right place.

xhost +
sudo docker run -it --rm --net=host --runtime nvidia  -e DISPLAY=$DISPLAY \
    -v ~/:/opt/nvidia/deepstream/deepstream-5.1/mysharedpath \
    -v /opt/nvidia/tensorrt/build/:/opt/nvidia/deepstream/deepstream-5.1/tensorrt/ \
    --device /dev/video0:/dev/video0 \
    -w /opt/nvidia/deepstream/deepstream-5.1/mysharedpath \
    -v /tmp/.X11-unix/:/tmp/.X11-unix \
    nvcr.io/nvidia/deepstream-l4t:5.1-21.02-iot

Preparing to run DeepStream Application

Once inside the container, we need to set up the system to use our new build of TensorRT instead of the one that comes with the image. These commands have to be executed every time the container is started.

export LD_LIBRARY_PATH=/opt/nvidia/deepstream/deepstream-5.1/tensorrt/
export PATH=$LD_LIBRARY_PATH:$PATH
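
Since these exports have to be retyped in every new container, a quick sanity check on the value can catch typos. A minimal sketch follows; check_abs_paths is a made-up name, and it only checks that each colon-separated entry is an absolute path, not that the libraries inside are the right ones.

```shell
# Succeeds when every colon-separated entry of $1 starts with "/".
# Prints each offending entry to stderr. Runs in a subshell so that the
# IFS and globbing changes stay local to the function.
check_abs_paths() (
    status=0
    IFS=':'
    set -f   # disable globbing while the value is split on ':'
    for entry in $1; do
        case "$entry" in
            /*) ;;
            *) echo "not an absolute path: $entry" >&2; status=1 ;;
        esac
    done
    [ "$status" -eq 0 ]
)

check_abs_paths "${LD_LIBRARY_PATH:-}" || echo "fix LD_LIBRARY_PATH before starting DeepStream"
```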

To run DeepStream use the following command

deepstream-test5-app \
          -c /opt/nvidia/deepstream/deepstream-5.1/mysharedpath/deepstream_app_config.txt

If you let DeepStream generate the .engine file for you, make sure you are using tlt-model-key and tlt-encoded-model in your configuration. DeepStream will generate the .engine file in the same location as your tlt-encoded-model.
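
For reference, the two keys could be written out like this. This is a sketch that assumes they belong in the [property] group of the nvinfer configuration file referenced by your DeepStream app config; print_tlt_properties is a made-up helper name, and all the other keys a RetinaNet config needs are omitted here.

```shell
# Prints the two TLT-related keys as a [property] fragment.
# Arguments: $1 = path to the .etlt model, $2 = model key.
print_tlt_properties() {
    cat <<EOF
[property]
tlt-encoded-model=$1
tlt-model-key=$2
EOF
}
```

Calling it with the path to your .etlt file and your key prints a fragment that you can merge into your existing configuration by hand.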

Good to know you solved it.
Thanks a lot for sharing this with us!

Hi there!
I have trained yolo_v4 in TLT 3.0 and am now trying to deploy it in DeepStream 5.1 using a sample Python app, i.e. deepstream-test1. Currently, I am facing issues with the config files. Would you be kind enough to assist me with that?
Thank you

YOLOv3 and YOLOv4 require a lot of effort to get working, as you need to build a custom plugin (in C++) to extract the information from the network, which is very painful. For that reason I moved to something more out of the box like RetinaNet, where the configuration in DeepStream is very straightforward. Sorry, I can’t help you with this.

Thanks for the response. Please do let me know if you find a configuration for the sample yolov4 TLT notebook.
Good day