I am trying to convert an exported .etlt
file into a TensorRT engine on the Jetson Nano. Here are the steps I am using up to this point:
I am using the tlt-streamanalytics:v2.0_dp_py2
Docker image for the training.
First, I convert the object detection dataset into TFRecords:
tlt-dataset-convert -d $MAIN_DIR/specs/convert.spec -o $MAIN_DIR/dataset/tfrecords/converted.tfrecord
with this convert.spec
file:
kitti_config {
  root_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 10
}
image_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
The [REPLACE_WITH_MAIN_DIR]
is replaced with my actual main directory in my spec file.
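For completeness, the substitution itself is just a sed replace. Here is a minimal, self-contained sketch of what I do (the scratch directory stands in for my real main directory):

```shell
# Demonstration of the placeholder substitution I run on every spec file.
# MAIN_DIR here is a scratch directory purely for this demo; on my machine
# it is the real project root.
MAIN_DIR=$(mktemp -d)
mkdir -p "$MAIN_DIR/specs"
cat > "$MAIN_DIR/specs/convert.spec" <<'EOF'
kitti_config {
  root_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
}
image_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
EOF
# Replace every occurrence of the placeholder with the real path.
sed -i "s|\[REPLACE_WITH_MAIN_DIR\]|${MAIN_DIR}|g" "$MAIN_DIR/specs/convert.spec"
grep "image_directory_path" "$MAIN_DIR/specs/convert.spec"
```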
Next, I download the pretrained model:
ngc registry model download-version nvidia/tlt_pretrained_object_detection:mobilenet_v1 --dest $MAIN_DIR/model/pretrained
Now I train the model:
tlt-train ssd -e $MAIN_DIR/specs/train.spec -r $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1 --gpus 1 -k $NGC_API_KEY
with this train.spec
file:
training_config {
  batch_size_per_gpu: 32
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  matching_iou_threshold: 0.5
  batch_size: 16
}
nms_config {
  confidence_threshold: 0.01
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 224
    output_image_height: 224
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "[REPLACE_WITH_MAIN_DIR]/dataset/tfrecords/*"
    image_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
  }
  image_extension: "png"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 0.33]"
  aspect_ratios: "[[1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0, 2.0, 0.5, 3.0, 0.33]]"
  two_boxes_for_ar1: true
  clip_boxes: false
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  loss_loc_weight: 1.0
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "mobilenet_v1"
  freeze_bn: false
}
Again, the [REPLACE_WITH_MAIN_DIR] placeholder is replaced with my actual main directory in my spec file.
Next, I prune the model:
tlt-prune -m $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120.tlt \
-k $NGC_API_KEY \
-o $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120_pruned.tlt
After pruning, I copy mobilenet_v1.hdf5 to $MAIN_DIR/model/pruned/tlt_pretrained_object_detection_vmobilenet_v1.
Then, I re-train the model:
tlt-train ssd -e $MAIN_DIR/specs/retrain.spec \
-r $MAIN_DIR/model/pruned/tlt_pretrained_object_detection_vmobilenet_v1 \
--gpus 1 \
-k $NGC_API_KEY \
-m $MAIN_DIR/model/pretrained/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120_pruned.tlt
with this retrain.spec
file:
training_config {
  batch_size_per_gpu: 32
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: NO_REG
    weight: 3e-9
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  matching_iou_threshold: 0.5
  batch_size: 16
}
nms_config {
  confidence_threshold: 0.01
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 224
    output_image_height: 224
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "[REPLACE_WITH_MAIN_DIR]/dataset/tfrecords/*"
    image_directory_path: "[REPLACE_WITH_MAIN_DIR]/dataset"
  }
  image_extension: "png"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 0.33]"
  aspect_ratios: "[[1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0, 2.0, 0.5, 3.0, 0.33]]"
  two_boxes_for_ar1: true
  clip_boxes: false
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  loss_loc_weight: 1.0
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "mobilenet_v1"
  freeze_bn: false
}
As before, the [REPLACE_WITH_MAIN_DIR] placeholder is replaced with my actual main directory in my spec file.
After retraining, I create an INT8 calibration tensorfile:
tlt-int8-tensorfile ssd -e $MAIN_DIR/specs/retrain.spec -o $MAIN_DIR/model/export/calibration.tensor -m 20
followed by exporting the TLT model:
tlt-export ssd -m $MAIN_DIR/model/pruned/tlt_pretrained_object_detection_vmobilenet_v1/weights/ssd_mobilenet_v1_epoch_120.tlt \
-k $NGC_API_KEY \
--data_type int8 \
--cal_data_file $MAIN_DIR/model/export/calibration.tensor \
--cal_cache_file $MAIN_DIR/model/export/calibration.bin \
-e $MAIN_DIR/specs/retrain.spec \
-o $MAIN_DIR/model/export/ssd_mobilenet_v1.etlt
All of the above steps seem to work perfectly.
The next step, and the one that is failing, is to build the TensorRT engine on the Jetson Nano.
To do that, I start with JetPack 4.4 and run the following commands to set up TensorRT with OSS:
# Update cmake to >=3.13
cd /tmp
apt remove --purge --auto-remove cmake
wget https://github.com/Kitware/CMake/releases/download/v3.13.5/cmake-3.13.5.tar.gz
tar xvf cmake-3.13.5.tar.gz
cd cmake-3.13.5/
./configure
make -j$(nproc)
make install
ln -s /usr/local/bin/cmake /usr/bin/cmake
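Before moving on, I double-check that the cmake now on PATH really satisfies the >= 3.13 requirement. This is a small helper of my own (not part of any NVIDIA tooling) that compares dotted versions via sort -V:

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# On the Nano I feed it the live version, e.g.:
#   version_ge "$(cmake --version | head -n1 | awk '{print $3}')" 3.13
version_ge 3.13.5 3.13 && echo "cmake version ok"
```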
# Build TensorRT with OSS
cd /tmp
git clone -b release/7.1 https://github.com/nvidia/TensorRT
cd TensorRT/
git submodule update --init --recursive
export TRT_SOURCE=`pwd`
cd $TRT_SOURCE
mkdir -p build && cd build
/usr/local/bin/cmake .. -DGPU_ARCHS=53 -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
make nvinfer_plugin -j$(nproc)
# Replace libnvinfer
mv /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3 ${HOME}/libnvinfer_plugin.so.7.1.3.bak
cp `pwd`/libnvinfer_plugin.so.7.1.3 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3
ldconfig
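To confirm the replacement actually took, I compare the installed library against the freshly built one with cmp. The sketch below uses scratch files so it is self-contained; on the Nano the two paths are the build output and /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3:

```shell
# Verify that a copied library is byte-identical to its source.
src=$(mktemp)   # stands in for build/libnvinfer_plugin.so.7.1.3
dst=$(mktemp)   # stands in for /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3
printf 'fake plugin bytes' > "$src"
cp "$src" "$dst"
cmp -s "$src" "$dst" && echo "plugin copy verified"
```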
Then I install tlt-converter:
apt-get install -y libssl-dev
cd /tmp
wget https://developer.nvidia.com/tlt-converter-trt71 -O tlt-converter-trt.zip
unzip tlt-converter-trt.zip -d ./tlt-converter-trt
cp ./tlt-converter-trt/tlt-converter /usr/local/bin/tlt-converter
chmod +x /usr/local/bin/tlt-converter
Finally, I try to convert to a TRT engine:
export TRT_LIB_PATH="/usr/lib/aarch64-linux-gnu"
export TRT_INC_PATH="/usr/include/aarch64-linux-gnu"
tlt-converter -k $NGC_API_KEY \
-o NMS \
-e $MAIN_DIR/model/export/TRT_ssd_mobilenet_v1.bin \
-d 3,224,224 \
-t int8 \
-c $MAIN_DIR/model/export/calibration.bin \
-m 1 \
$MAIN_DIR/model/export/ssd_mobilenet_v1.etlt
This segfaults after printing the following error messages:
[ERROR] UffParser: Unsupported number of graph 0
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[ERROR] Network must have at least one output
[ERROR] Network validation failed.
[ERROR] Unable to create engine
I have verified that the $NGC_API_KEY variable is the same across all commands, and that the input files exist and are not empty (size larger than 0 bytes).
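Concretely, the sanity checks I ran look like the sketch below. It is demonstrated against scratch files so it runs anywhere; the real paths live under $MAIN_DIR/model/export, and hashing the key lets me compare it across shells without printing it:

```shell
# Check that each export artifact exists and is non-empty, and fingerprint the key.
EXPORT_DIR=$(mktemp -d)                      # stands in for $MAIN_DIR/model/export
printf 'demo-etlt' > "$EXPORT_DIR/ssd_mobilenet_v1.etlt"
printf 'demo-cal'  > "$EXPORT_DIR/calibration.bin"
NGC_API_KEY=demo-key                         # stand-in for the real key

for f in "$EXPORT_DIR/ssd_mobilenet_v1.etlt" "$EXPORT_DIR/calibration.bin"; do
  if [ -s "$f" ]; then
    echo "OK: $f ($(stat -c%s "$f") bytes)"
  else
    echo "MISSING OR EMPTY: $f" >&2
    exit 1
  fi
done

# The same fingerprint in every shell means the same key was used everywhere.
printf '%s' "$NGC_API_KEY" | md5sum
```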
How can I get tlt-converter
working so that it outputs a Jetson Nano TRT engine from my exported TLT model? Thank you in advance!