Cannot Create Mobilenet SSD TRT Engine on Jetson Nano | [ERROR] UffParser: Unsupported number of graph 0

I am trying to convert an exported .etlt file into a TensorRT engine on the Jetson Nano. Here are the steps I am using up to this point:

I am using the tlt-streamanalytics:v2.0_dp_py2 Docker image for the training.

First I convert the object detection dataset into TFrecords:

tlt-dataset-convert -d $MAIN_DIR/specs/convert.spec -o $MAIN_DIR/dataset/tfrecords/converted.tfrecord

with this convert.spec file:

kitti_config {
  root_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 10
}
image_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"

The [REPLACE_WITH_MAIN_DIR] is replaced with my actual main directory in my spec file.
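
A minimal sketch of how that substitution could be scripted, assuming the spec is kept as a template containing the literal `[REPLACE_WITH_MAIN_DIR]` placeholder. The function name `render_spec` and the template/output file arguments are hypothetical and not part of TLT:

```shell
# Hypothetical helper: fill the [REPLACE_WITH_MAIN_DIR] placeholder in a spec template.
# Usage: render_spec TEMPLATE OUTPUT MAIN_DIR
render_spec() {
    local template="$1" output="$2" main_dir="$3"
    local escaped
    # Escape characters that are special in a sed replacement (\, & and the | delimiter).
    escaped=$(printf '%s' "$main_dir" | sed -e 's/[\\&|]/\\&/g')
    # The opening bracket is escaped so sed treats the placeholder as literal text.
    sed -e "s|\[REPLACE_WITH_MAIN_DIR]|${escaped}|g" "$template" > "$output"
    # Fail loudly if any placeholder survived the substitution.
    if grep -q 'REPLACE_WITH_MAIN_DIR' "$output"; then
        echo "placeholder still present in $output" >&2
        return 1
    fi
}
```

For example, `render_spec convert.spec.template "$MAIN_DIR/specs/convert.spec" "$MAIN_DIR"` would write a filled-in spec and return non-zero if a placeholder was left behind.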

Next, I download the pretrained model:

ngc registry model download-version nvidia/tlt_pretrained_object_detection:mobilenet_v1 --dest $MAIN_DIR/model/pretrained

Now I train the model:

tlt-train ssd -e $MAIN_DIR/specs/train.spec -r $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1 --gpus 1 -k $NGC_API_KEY

with this train.spec file:

training_config {
  batch_size_per_gpu: 32
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  matching_iou_threshold: 0.5
  batch_size: 16
}
nms_config {
  confidence_threshold: 0.01
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 224
    output_image_height: 224
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {

    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "[REPLACE_WITH_MAIN_DIR]/dataset/tfrecords/*"
    image_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
  }
  image_extension: "png"
  target_class_mapping {
      key: "person"
      value: "person"
  }
  validation_fold: 0
}
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 0.33]"
  aspect_ratios: "[[1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0, 2.0, 0.5, 3.0, 0.33]]"
  two_boxes_for_ar1: true
  clip_boxes: false
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  loss_loc_weight: 1.0
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "mobilenet_v1"
  freeze_bn: false
}

The [REPLACE_WITH_MAIN_DIR] is replaced with my actual main directory in my spec file.

Next, I prune the model:

tlt-prune -m $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120.tlt -k $NGC_API_KEY -o $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120_pruned.tlt

After pruning, I copy mobilenet_v1.hdf5 to $MAIN_DIR/model/pruned/tlt_pretrained_object_detection_vmobilenet_v1.

Then, I re-train the model:

tlt-train ssd -e $MAIN_DIR/specs/retrain.spec -r $MAIN_DIR/model/pruned/tlt_pretrained_object_detection_vmobilenet_v1 --gpus 1 -k $NGC_API_KEY -m $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120_pruned.tlt

with this retrain.spec file:

training_config {
  batch_size_per_gpu: 32
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: NO_REG
    weight: 3e-9
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  matching_iou_threshold: 0.5
  batch_size: 16
}
nms_config {
  confidence_threshold: 0.01
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 224
    output_image_height: 224
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {

    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "[REPLACE_WITH_MAIN_DIR]/dataset/tfrecords/*"
    image_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
  }
  image_extension: "png"
  target_class_mapping {
      key: "person"
      value: "person"
  }
  validation_fold: 0
}
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 0.33]"
  aspect_ratios: "[[1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0, 2.0, 0.5, 3.0, 0.33]]"
  two_boxes_for_ar1: true
  clip_boxes: false
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  loss_loc_weight: 1.0
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "mobilenet_v1"
  freeze_bn: false
}

The [REPLACE_WITH_MAIN_DIR] is replaced with my actual main directory in my spec file.

After retraining, I create an INT8 calibration tensorfile:

tlt-int8-tensorfile ssd -e $MAIN_DIR/specs/retrain.spec -o $MAIN_DIR/model/export/calibration.tensor -m 20

followed by exporting the TLT model:

tlt-export ssd -m $MAIN_DIR/model/pruned/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120.tlt \
    -k $NGC_API_KEY \
    --data_type int8 \
    --cal_data_file $MAIN_DIR/model/export/calibration.tensor \
    --cal_cache_file $MAIN_DIR/model/export/calibration.bin \
    -e $MAIN_DIR/specs/retrain.spec \
    -o $MAIN_DIR/model/export/ssd_mobilenet_v1.etlt

All of the above steps seem to work perfectly.

The next step, and the one that is failing, is to build the TensorRT engine on the Jetson Nano.

To do that, I start with JetPack 4.4 and run the following commands to set up TensorRT with OSS:

# Update cmake to >=3.13
cd /tmp
apt remove --purge --auto-remove cmake
wget https://github.com/Kitware/CMake/releases/download/v3.13.5/cmake-3.13.5.tar.gz
tar xvf cmake-3.13.5.tar.gz
cd cmake-3.13.5/
./configure
make -j$(nproc)
make install
ln -s /usr/local/bin/cmake /usr/bin/cmake

# Build TensorRT with OSS
cd /tmp
git clone -b release/7.1 https://github.com/nvidia/TensorRT
cd TensorRT/
git submodule update --init --recursive
export TRT_SOURCE=`pwd`
cd $TRT_SOURCE
mkdir -p build && cd build
/usr/local/bin/cmake .. -DGPU_ARCHS=53 -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
make nvinfer_plugin -j$(nproc)

# Replace libnvinfer_plugin with the OSS build (the original is kept as a .bak in $HOME)
mv /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3 ${HOME}/libnvinfer_plugin.so.7.1.3.bak
cp `pwd`/libnvinfer_plugin.so.7.1.3  /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3
ldconfig
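
One way to confirm which plugin library the linker cache resolves after the replacement, as a minimal sketch. The helper name `resolved_paths` is hypothetical; it only filters `ldconfig -p` style text read from stdin and makes no claim about what the correct path should be on a given JetPack install:

```shell
# Hypothetical check: print the paths that the dynamic linker cache maps to
# libraries whose name starts with the given prefix.
# Reads `ldconfig -p` style text on stdin, so it also works on saved output.
resolved_paths() {
    local name="$1"
    # $1 is the library name column; $NF is the path after "=>".
    awk -v name="$name" 'index($1, name) == 1 { print $NF }'
}
```

Typical use would be `ldconfig -p | resolved_paths libnvinfer_plugin.so`, followed by `ls -l` on each printed path to see whether it points at the rebuilt 7.1.3 file.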

Then I install tlt-converter:

apt-get install -y libssl-dev
cd /tmp
wget https://developer.nvidia.com/tlt-converter-trt71 -O tlt-converter-trt.zip
unzip tlt-converter-trt.zip -d ./tlt-converter-trt
cp ./tlt-converter-trt/tlt-converter /usr/local/bin/tlt-converter
chmod +x /usr/local/bin/tlt-converter

Finally, I try to convert to a TRT engine:

export TRT_LIB_PATH="/usr/lib/aarch64-linux-gnu"
export TRT_INC_PATH="/usr/include/aarch64-linux-gnu"
tlt-converter -k $NGC_API_KEY \
    -o NMS \
    -e $MAIN_DIR/model/export/TRT_ssd_mobilenet_v1.bin \
    -d 3,224,224 \
    -t int8 \
    -c $MAIN_DIR/model/export/calibration.bin \
    -m 1 \
    $MAIN_DIR/model/export/ssd_mobilenet_v1.etlt

This segfaults with the following message:

[ERROR] UffParser: Unsupported number of graph 0
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[ERROR] Network must have at least one output
[ERROR] Network validation failed.
[ERROR] Unable to create engine

I have verified the $NGC_API_KEY variable is the same across commands, and that the files do exist and aren’t empty (size larger than 0 bytes).
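
The checks described above can be sketched as a few shell helpers. The function names are hypothetical, and the idea of comparing digests between the training server and the Nano is an assumption about a useful extra check (it would catch a key with stray whitespace or an .etlt file altered in transfer), not something TLT requires:

```shell
# Hypothetical pre-flight helpers; run on both machines and compare the output.

# Succeeds only if every listed file exists and is non-empty.
check_nonempty() {
    local f status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Prints a SHA-256 digest of a string, so a key can be compared without echoing it.
fingerprint_string() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Prints the SHA-256 digest of a file, e.g. the .etlt before and after copying.
fingerprint_file() {
    sha256sum "$1" | cut -d' ' -f1
}
```

For instance, `fingerprint_string "$NGC_API_KEY"` and `fingerprint_file "$MAIN_DIR/model/export/ssd_mobilenet_v1.etlt"` should print identical digests on the server and on the Nano.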

How can I get tlt-converter working so that it outputs a Jetson Nano TRT engine from my exported TLT model? Thank you in advance!

When you run tlt-train, the command line is missing “-m”, so the unpruned TLT model is trained without the pre-trained model. This is OK.

tlt-train ssd -e $MAIN_DIR/specs/train.spec -r $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1 --gpus 1 -k $NGC_API_KEY

But after you get the TLT model and prune it, I see the retrain command below.
What is $OUTPUT_FILE?
Also, why did you copy mobilenet_v1.hdf5 to $MAIN_DIR/model/pruned/tlt_pretrained_object_detection_vmobilenet_v1?

tlt-train ssd -e $MAIN_DIR/specs/retrain.spec -r $MAIN_DIR/model/pruned/tlt_pretrained_object_detection_vmobilenet_v1 --gpus 1 -k $NGC_API_KEY -m $OUTPUT_FILE

Just to verify that I understand your first comment: on the first tlt-train, I do not have to include the -m argument because I am training on a freshly downloaded NGC model, right?

I am copying mobilenet_v1.hdf5 to the pruned folder so that the training history for the unpruned model isn’t overwritten, so I can go back and examine it or use a different epoch to prune if needed. I train the unpruned model in the “pretrained” folder, then retrain in the “pruned” folder.

As for the $OUTPUT_FILE variable, apologies, I forgot to replace it when copying the command from my script. I have edited the post and replaced the

$OUTPUT_FILE

with

$MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120_pruned.tlt

For clarity’s sake, here is the folder structure I am using for my project. I have written my scripts around this:

├───dataset
│   ├───images
│   ├───labels
│   └───tfrecords
├───inference
│   ├───labeled
│   └───original
├───model
│   ├───export
│   ├───pretrained
│   │   └───tlt_pretrained_object_detection_vmobilenet_v1
│   │       └───weights
│   └───pruned
│       └───tlt_pretrained_object_detection_vmobilenet_v1
│           └───weights
├───scripts
└───specs
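
For anyone reproducing this layout, a minimal sketch that creates the same skeleton. The function name `make_project_tree` is hypothetical; the directory names are copied from the tree above:

```shell
# Hypothetical helper: create the project skeleton under a given root directory.
make_project_tree() {
    local root="$1"
    local model_name="tlt_pretrained_object_detection_vmobilenet_v1"
    # mkdir -p creates intermediate directories and is a no-op for existing ones.
    mkdir -p \
        "$root/dataset/images" \
        "$root/dataset/labels" \
        "$root/dataset/tfrecords" \
        "$root/inference/labeled" \
        "$root/inference/original" \
        "$root/model/export" \
        "$root/model/pretrained/$model_name/weights" \
        "$root/model/pruned/$model_name/weights" \
        "$root/scripts" \
        "$root/specs"
}
```

Calling `make_project_tree "$MAIN_DIR"` is safe to repeat, since existing directories are left untouched.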

Please let me know if you need any more information, and thank you again for the help.

EDIT:
Here is also the output from the tlt-export command:

Using TensorFlow backend.
2020-08-18 16:54:49,979 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /root/data/digital-turnstile/s3_bucket/test/specs/retrain.spec
2020-08-18 16:54:52,366 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /root/data/digital-turnstile/s3_bucket/test/specs/retrain.spec
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_5 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_4 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_3 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_2 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_1 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_0 as custom op: BatchTilePlugin_TRT
DEBUG [/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
[TensorRT] INFO: Detected 1 inputs and 2 output network tensors.
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] INFO: Starting Calibration with batch size 16.
DEPRECATED: This variant of get_batch is deprecated. Please use the single argument variant described in the documentation instead.
[TensorRT] INFO:   Calibrated batch 0 in 0.224351 seconds.
[TensorRT] INFO:   Calibrated batch 1 in 0.200988 seconds.
[TensorRT] INFO:   Calibrated batch 2 in 0.201656 seconds.
[TensorRT] INFO:   Calibrated batch 3 in 0.203554 seconds.
[TensorRT] INFO:   Calibrated batch 4 in 0.204118 seconds.
[TensorRT] INFO:   Calibrated batch 5 in 0.201365 seconds.
[TensorRT] INFO:   Calibrated batch 6 in 0.202454 seconds.
[TensorRT] INFO:   Calibrated batch 7 in 0.202228 seconds.
[TensorRT] INFO:   Calibrated batch 8 in 0.20287 seconds.
[TensorRT] INFO:   Calibrated batch 9 in 0.202521 seconds.
[TensorRT] WARNING: Tensor NMS_1 is uniformly zero; network calibration failed.
[TensorRT] INFO:   Post Processing Calibration data in 6.45679 seconds.
[TensorRT] INFO: Calibration completed in 33.0692 seconds.
[TensorRT] INFO: Writing Calibration Cache for calibrator: TRT-7000-EntropyCalibration2
2020-08-18 16:55:34,737 [INFO] modulus.export._tensorrt: Saving calibration cache (size 7455) to /root/data/digital-turnstile/s3_bucket/test/model/export/calibration.bin

[TensorRT] INFO: Detected 1 inputs and 2 output network tensors.

In SSD, if you run tlt-train without “-m”, no pre-trained model is used during training. Setting “-m” is not mandatory, but without it your training will not use the NGC pre-trained model.

The mobilenet_v1.hdf5 file is the NGC pre-trained model. Normally, users pass it via “-m”.
After training, there are .tlt format models in your results folder. Run tlt-prune against those .tlt models rather than the hdf5 file.

From your latest result, you have already generated the TRT engine successfully.

Ah, that makes sense. I was under the impression that the -m argument only accepted .tlt files. I have re-trained with

tlt-train ssd -e $MAIN_DIR/specs/train.spec \
    -r $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1 \
    --gpus 1 \
    -k $NGC_API_KEY \
    -m $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1/mobilenet_v1.hdf5

and got better results during training. This was indeed a bug in my scripts, thank you very much for pointing it out!

However, this does not fix my initial problem.

I then repeated all of the other steps from my original post (prune, retrain, export, convert), and I am still unable to convert the .etlt file into a TRT inference engine on my target platform (Nvidia Jetson Nano on JetPack 4.4):

[ERROR] UffParser: Unsupported number of graph 0
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[ERROR] Network must have at least one output
[ERROR] Network validation failed.
[ERROR] Unable to create engine
./s3_bucket/test/scripts/optimize.bash: line 31: 22600 Segmentation fault      (core dumped) tlt-converter -k $NGC_API_KEY -o NMS -e $MAIN_DIR/model/export/TRT_ssd_mobilenet_v1.bin -d 3,224,224 -t int8 -c $MAIN_DIR/model/export/calibration.bin -m 1 $MAIN_DIR/model/export/ssd_mobilenet_v1.etlt

The final TRT_ssd_mobilenet_v1.bin file is not generated.
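
A sketch of a guarded version of the conversion step, so that a crash or a missing engine file is reported explicitly and the full converter output is kept for the thread. `build_engine` and the `CONVERTER` override are hypothetical names; the tlt-converter flags are exactly the ones used in the original post, and the `.log` file next to the engine is an assumption about where to keep the output:

```shell
# Sketch of a guarded conversion step. CONVERTER is a hypothetical override
# (defaults to tlt-converter); the flags are the ones used earlier in this thread.
# Usage: build_engine ETLT_FILE CALIBRATION_CACHE ENGINE_FILE
build_engine() {
    local etlt="$1" cal_cache="$2" engine="$3"
    local converter="${CONVERTER:-tlt-converter}"
    local log="${engine}.log"

    # Remove any stale engine so a leftover file is not mistaken for success.
    rm -f "$engine"

    "$converter" -k "$NGC_API_KEY" \
        -o NMS \
        -e "$engine" \
        -d 3,224,224 \
        -t int8 \
        -c "$cal_cache" \
        -m 1 \
        "$etlt" > "$log" 2>&1
    local status=$?   # 139 would indicate a segmentation fault

    # Treat a non-zero exit, a missing/empty engine, or any [ERROR] line as failure.
    if [ "$status" -ne 0 ] || [ ! -s "$engine" ] || grep -qF '[ERROR]' "$log"; then
        echo "engine build failed (exit status $status); see $log" >&2
        return 1
    fi
    echo "engine written to $engine"
}
```

It would be called as `build_engine "$MAIN_DIR/model/export/ssd_mobilenet_v1.etlt" "$MAIN_DIR/model/export/calibration.bin" "$MAIN_DIR/model/export/TRT_ssd_mobilenet_v1.bin"`.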

Is there anything else I can do to help track down the issue? Would you like me to upload the “template” folder structure/scripts I am using as a .zip file, or something like that?

Thank you again for your help, I look forward to your reply.

According to your previous post above, you have already generated the TRT engine successfully.

Please check and try again. In particular, make sure your API key is correct for both training and exporting. You can refer to the Jupyter notebook inside the TLT docker image.

I think you misunderstand. My team has NOT gotten a usable TRT inference engine; we only have an .etlt file. We need a file which TensorRT can load without TLT being installed. These files usually have a .engine or .bin file extension.

The tlt-export command creates the .etlt file, and tlt-converter is what is supposed to take that file and create the TRT inference engine.

It is the tlt-export command that printed the message

[TensorRT] INFO: Detected 1 inputs and 2 output network tensors.

This is not the output of the tlt-converter step! This is from the tlt-export step.
The tlt-export step does not create a TRT inference engine file; it creates an .etlt file.

Running tlt-export on the training server is not the issue. The issue is that I cannot run tlt-converter on the Jetson Nano in order to optimize for that hardware and also get a generic TRT inference engine (usable without TLT installed). I don’t know why the tlt-export step has TensorRT logging, but tlt-export does not do the aarch64 TensorRT optimizations or export a .bin/.engine file.

(Also: the tlt-export command is run on the x86_64 server, so it would be impossible for TensorRT to optimize the model for the Jetson Nano even if tlt-export did make the TRT engine file. The TRT optimizations need to be done on the target platform using tlt-converter.)

Our error is with this: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#gen_eng_tlt_converter
Not with this: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#exporting_models

I have checked five times, and I am certain the $NGC_API_KEY variable is exactly the same for every single command.

As it currently stands, my team is unable to use TLT because we cannot deploy the trained model to the Jetson Nano with TensorRT, as we only have an .etlt file which will not work without having TLT installed. We will not have TLT installed on the final systems we deploy, only vanilla TensorRT, and that cannot use .etlt files.

Thank you again for your reply, I really do appreciate the help. But this issue is not solved and we are still unable to use TLT. Please let me know what else we can try, or how else we can help find a solution.

Oh sorry, I made a mistake and misunderstood. I will dig into it further.

Please run the SSD Jupyter sample inside the docker image.
Based on earlier feedback, other customers have not run into a similar issue with that sample on tlt-streamanalytics:v2.0_dp_py2.