## 0. Set up env variables and map drives 

When using the purpose-built pretrained models from NGC, set the `$KEY` environment variable to the key given in the model overview. Otherwise, loading them as pretrained models can fail.

This notebook requires an environment variable, `$LOCAL_PROJECT_DIR`, set to the path of the user's workspace. The dataset is expected to reside in `$LOCAL_PROJECT_DIR/data`, and the collateral generated by the TLT experiment is written to `$LOCAL_PROJECT_DIR/ssd`. The subsequent cells explain how to set up the dataset and which steps of the TLT workflow are supported.

*Note: Remove any stray artifacts or files left over from previous experiments in the `$USER_EXPERIMENT_DIR` and `$DATA_DOWNLOAD_DIR` paths defined below. Leftover checkpoint files, for example, may interfere with creating the training graph for a new experiment.*
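
Leftover files from earlier runs can be spotted with a short check before training. The sketch below is only an illustration: the helper name `find_stale_artifacts` and the default suffix list are assumptions, so adjust them to whatever your experiments actually write.

```python
import os

# Assumed suffixes of files that earlier runs may leave behind (hypothetical default).
STALE_SUFFIXES = (".tlt", ".etlt", ".engine")


def find_stale_artifacts(root, suffixes=STALE_SUFFIXES):
    """Return the sorted paths under `root` whose names end with one of `suffixes`."""
    suffixes = tuple(suffixes)  # str.endswith needs a tuple, not a list
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling `find_stale_artifacts(os.environ["LOCAL_EXPERIMENT_DIR"])` only lists candidates. Deleting them is left as a manual decision, because a listed file may be a model you still need.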


In [1]:
# Setting up env variables for cleaner command line commands.
import os

print("Please replace the variable with your key.")
%env KEY=dWh2djZnOWJ1NnBjbDE4bWJuNzJxNDJiM3I6MGIyZDljN2EtZDk3Yi00ZmIwLThjNTctYWQwZmIxNWFhNmIy
%env GPU_INDEX=0
%env USER_EXPERIMENT_DIR=/workspace/tlt-experiments/ssd
%env DATA_DOWNLOAD_DIR=/workspace/tlt-experiments/data

# Please define this local project directory that needs to be mapped to the TLT docker session.
# The dataset expected to be present in $LOCAL_PROJECT_DIR/data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/ssd
%env LOCAL_PROJECT_DIR=/home/dell/tlt-experiments
os.environ["LOCAL_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "ssd")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=/data/tlt-experiments/ssd
# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
 os.getenv("NOTEBOOK_ROOT", os.getcwd()),
 "specs"
)
%env SPECS_DIR=/workspace/tlt-experiments/ssd/specs

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

Please replace the variable with your key.
env: KEY=dWh2djZnOWJ1NnBjbDE4bWJuNzJxNDJiM3I6MGIyZDljN2EtZDk3Yi00ZmIwLThjNTctYWQwZmIxNWFhNmIy
env: GPU_INDEX=0
env: USER_EXPERIMENT_DIR=/workspace/tlt-experiments/ssd
env: DATA_DOWNLOAD_DIR=/workspace/tlt-experiments/data
env: LOCAL_PROJECT_DIR=/home/dell/tlt-experiments
env: SPECS_DIR=/workspace/tlt-experiments/ssd/specs
total 40
-rw-r--r-- 1 dell dell 309 Şub 25 21:15 ssd_tfrecords_kitti_trainval.txt
-rw-r--r-- 1 dell dell 1659 Şub 25 21:15 ssd_retrain_resnet18_kitti.txt
-rw-r--r-- 1 dell dell 513 Mar 12 00:55 augment.txt
-rw-r--r-- 1 dell dell 1351 Mar 12 01:35 ssd_retrain_mobilenet_v2.txt
-rw-r--r-- 1 dell dell 1401 Mar 12 12:13 ssd_train_mobilenet_v2.txt
-rw-r--r-- 1 dell dell 1361 Mar 12 15:45 ssd_train_kitti.txt
-rw-r--r-- 1 dell dell 1412 Mar 12 15:55 ssd_train_resnet18_head.txt
-rw-r--r-- 1 dell dell 1395 Mar 12 17:00 ssd_retrain_resnet18_head.txt
-rw-r--r-- 1 dell dell 1672 Mar 12 18:24 ssd_train_resnet18_kitti.txt
-rw-r--r

In [2]:
# Mapping up the local directories to the TLT docker.
import json
mounts_file = os.path.expanduser("~/.tlt_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
 "Mounts": [
 # Mapping the data directory
 {
 "source": os.environ["LOCAL_PROJECT_DIR"],
 "destination": "/workspace/tlt-experiments"
 },
 # Mapping the specs directory.
 {
 "source": os.environ["LOCAL_SPECS_DIR"],
 "destination": os.environ["SPECS_DIR"]
 },
 ]
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
 json.dump(drive_map, mfile, indent=4)
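
The TLT launcher runs containers as root unless told otherwise, and its own log message suggests adding `"user": "UID:GID"` to the `DockerOptions` portion of `~/.tlt_mounts.json`. A minimal sketch of building such a mapping follows. `build_drive_map` is a hypothetical helper, and the `DockerOptions` layout is inferred from that message rather than checked against the launcher.

```python
import json
import os


def build_drive_map(project_dir, specs_dir, specs_dest, run_as_host_user=True):
    """Build the mounts dictionary, optionally adding the host UID:GID."""
    drive_map = {
        "Mounts": [
            {"source": project_dir, "destination": "/workspace/tlt-experiments"},
            {"source": specs_dir, "destination": specs_dest},
        ]
    }
    if run_as_host_user:
        # os.getuid() and os.getgid() are only available on POSIX systems.
        drive_map["DockerOptions"] = {
            "user": "{}:{}".format(os.getuid(), os.getgid())
        }
    return drive_map


example = build_drive_map(
    "/home/dell/tlt-experiments",
    "/home/dell/tlt-experiments/ssd/specs",
    "/workspace/tlt-experiments/ssd/specs",
)
print(json.dumps(example, indent=4))
```

With this option set, files written by the container should be owned by the host user instead of root.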

In [3]:
!cat ~/.tlt_mounts.json

{
 "Mounts": [
 {
 "source": "/home/dell/tlt-experiments",
 "destination": "/workspace/tlt-experiments"
 },
 {
 "source": "/home/dell/tlt-experiments/ssd/specs",
 "destination": "/workspace/tlt-experiments/ssd/specs"
 }
 ]
}
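
Because the notebook mixes host paths (`$LOCAL_*`) with container paths (`$USER_EXPERIMENT_DIR`, `$SPECS_DIR`), it can help to translate one into the other using the mounts above. The following is a small sketch with a hypothetical helper, `to_container_path`. It assumes POSIX-style paths and picks the longest matching source.

```python
import posixpath


def to_container_path(host_path, mounts):
    """Translate a host path to its container path via the longest matching mount."""
    host_path = posixpath.normpath(host_path)
    best = None
    for mount in mounts:
        source = posixpath.normpath(mount["source"])
        if host_path == source or host_path.startswith(source + "/"):
            if best is None or len(source) > len(best[0]):
                best = (source, mount["destination"])
    if best is None:
        raise ValueError("{} is not under any mounted source".format(host_path))
    source, destination = best
    relative = host_path[len(source):].lstrip("/")
    return posixpath.join(destination, relative) if relative else destination


mounts = [
    {"source": "/home/dell/tlt-experiments",
     "destination": "/workspace/tlt-experiments"},
    {"source": "/home/dell/tlt-experiments/ssd/specs",
     "destination": "/workspace/tlt-experiments/ssd/specs"},
]
print(to_container_path("/home/dell/tlt-experiments/data/HEAD", mounts))
```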

In [4]:
# View the versions of the TLT launcher
!tlt info

Configuration of the TLT Instance
dockers: ['nvcr.io/nvidia/tlt-streamanalytics', 'nvcr.io/nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 02/02/2021


## 2. Prepare dataset and pre-trained model 

We will be using the KITTI detection dataset for the tutorial. For more details, visit
http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. Download the KITTI detection images (http://www.cvlibs.net/download.php?file=data_object_image_2.zip) and labels (http://www.cvlibs.net/download.php?file=data_object_label_2.zip) to `$DATA_DOWNLOAD_DIR`.

In [5]:
# verify
!ls -l $LOCAL_DATA_DIR/HEAD

total 8
drwxr-xr-x 4 dell dell 4096 Mar 12 00:31 training
drwxr-xr-x 4 dell dell 4096 Mar 12 00:29 val


In [8]:
# Generate val dataset out of training dataset
!python3.6 generate_val_dataset.py --input_image_dir=$LOCAL_DATA_DIR/HEAD/training/images_300x300 \
 --input_label_dir=$LOCAL_DATA_DIR/HEAD/training/labels_300x300\
 --output_dir=$LOCAL_DATA_DIR/HEAD/val

This script will not run as output image path already exists.
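
`generate_val_dataset.py` is not reproduced in this notebook. As a rough illustration only, a split of this kind could move a random fraction of image/label pairs into a validation folder. Everything in the sketch is an assumption: the function name, the `images` and `labels` output subfolders, the 10% default, and the rule that a label shares its image's base name with a `.txt` extension.

```python
import os
import random
import shutil


def split_validation(image_dir, label_dir, output_dir, fraction=0.1, seed=42):
    """Move a random fraction of image/label pairs to output_dir; return moved base names."""
    # Only consider images that have a matching label file.
    images = sorted(
        name for name in os.listdir(image_dir)
        if os.path.isfile(os.path.join(label_dir, os.path.splitext(name)[0] + ".txt"))
    )
    count = int(len(images) * fraction)
    chosen = random.Random(seed).sample(images, count)

    out_images = os.path.join(output_dir, "images")
    out_labels = os.path.join(output_dir, "labels")
    os.makedirs(out_images, exist_ok=True)
    os.makedirs(out_labels, exist_ok=True)

    for name in chosen:
        label = os.path.splitext(name)[0] + ".txt"
        shutil.move(os.path.join(image_dir, name), os.path.join(out_images, name))
        shutil.move(os.path.join(label_dir, label), os.path.join(out_labels, label))
    return sorted(os.path.splitext(name)[0] for name in chosen)
```

A fixed seed keeps the split reproducible between runs, which makes evaluation numbers comparable.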


### 2.1 Download pre-trained model 

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [6]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_reg_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

env: CLI=ngccli_reg_linux.zip
--2021-03-12 11:35:56-- https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 54.192.233.50, 54.192.233.91, 54.192.233.124, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|54.192.233.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21648112 (21M) [application/zip]
Saving to: ‘/home/dell/tlt-experiments/ngccli/ngccli_reg_linux.zip’


2021-03-12 11:36:05 (2,37 MB/s) - ‘/home/dell/tlt-experiments/ngccli/ngccli_reg_linux.zip’ saved [21648112/21648112]

Archive: /home/dell/tlt-experiments/ngccli/ngccli_reg_linux.zip
 inflating: /home/dell/tlt-experiments/ngccli/ngc 
 inflating: /home/dell/tlt-experiments/ngccli/ngc.md5 


In [7]:
!ngc registry model list nvidia/tlt_pretrained_object_detection:*

+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Versi | Accur | Epoch | Batch | GPU | Memor | File | Statu | Creat |
| on | acy | s | Size | Model | y Foo | Size | s | ed |
| | | | | | tprin | | | Date |
| | | | | | t | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| vgg19 | 77.56 | 80 | 1 | V100 | 153.7 | 153.7 | UPLOA | Apr |
| | | | | | | 2 MB | D_COM | 29, |
| | | | | | | | PLETE | 2020 |
| vgg16 | 77.17 | 80 | 1 | V100 | 515.1 | 515.0 | UPLOA | Apr |
| | | | | | | 9 MB | D_COM | 29, |
| | | | | | | | PLETE | 2020 |
| squee | 65.13 | 80 | 1 | V100 | 6.5 | 6.46 | UPLOA | Apr |
| zenet | | | | | | MB | D_COM | 29, |
| | | | | | | | PLETE | 2020 |
| resne | 77.91 | 80 | 1 | V100 | 294.2 | 294.2 | UPLOA | Apr |
| t50 | | | | | | MB | D_COM | 29, |
| | | | | | | | PLETE | 2020 |
| resne | 77.04 | 80 | 1 | V100 | 170.7 | 170.6 | UPLOA | Apr |
| t34 | | | | | | 5 MB | D_COM | 29, |
| | | | | | | | PLETE | 20

In [8]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_mobilenet_v2/

In [10]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tlt_pretrained_object_detection:mobilenet_v2 --dest $LOCAL_EXPERIMENT_DIR/pretrained_mobilenet_v2

Downloaded 4.3 MB in 12s, Download speed: 366.19 KB/s 
----------------------------------------------------
Transfer id: tlt_pretrained_object_detection_vmobilenet_v2 Download status: Completed.
Downloaded local path: /home/dell/tlt-experiments/ssd/pretrained_mobilenet_v2/tlt_pretrained_object_detection_vmobilenet_v2
Total files downloaded: 1 
Total downloaded size: 4.3 MB
Started at: 2021-03-12 11:37:21.732748
Completed at: 2021-03-12 11:37:33.753477
Duration taken: 12s
----------------------------------------------------


In [9]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_EXPERIMENT_DIR/pretrained_mobilenet_v2/tlt_pretrained_object_detection_vmobilenet_v2

Check that model is downloaded into dir.
total 5136
-rw------- 1 dell dell 5258048 Mar 12 11:37 mobilenet_v2.hdf5


## 3. Provide training specification 
* Dataset used for training
    * To use the newly generated dataset, update the `dataset_config` parameter in the spec file at `$LOCAL_SPECS_DIR/ssd_train_resnet18_kitti.txt`
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, and learning rate
* Whether to use quantization-aware training (QAT)
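
The QAT switch is a single `enable_qat` field in `training_config`. As an alternative to `sed`, the toggle can be sketched in Python. `set_enable_qat` is a hypothetical helper that works on the spec text and leaves writing the file back to the caller.

```python
import re


def set_enable_qat(spec_text, enabled):
    """Return spec_text with every `enable_qat: <bool>` set to the requested value."""
    value = "true" if enabled else "false"
    return re.sub(
        r"(enable_qat:\s*)(true|false)",
        lambda match: match.group(1) + value,
        spec_text,
    )


spec = "training_config {\n  batch_size_per_gpu: 8\n  enable_qat: false\n}\n"
print(set_enable_qat(spec, True))
```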

In [None]:
# To enable QAT training on sample spec file, uncomment following lines
# !sed -i "s/enable_qat: false/enable_qat: true/g" $LOCAL_SPECS_DIR/ssd_train_resnet18_kitti.txt
# !sed -i "s/enable_qat: false/enable_qat: true/g" $LOCAL_SPECS_DIR/ssd_retrain_resnet18_kitti.txt

In [None]:
# By default, the sample spec file disables QAT training. You can force non-QAT training by running lines below
# !sed -i "s/enable_qat: true/enable_qat: false/g" $LOCAL_SPECS_DIR/ssd_train_resnet18_kitti.txt
# !sed -i "s/enable_qat: true/enable_qat: false/g" $LOCAL_SPECS_DIR/ssd_retrain_resnet18_kitti.txt

In [10]:
!cat $LOCAL_SPECS_DIR/ssd_train_mobilenet_v2.txt

random_seed: 42
ssd_config {
 aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 1.0/3.0]"
 scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
 two_boxes_for_ar1: true
 clip_boxes: false
 variances: "[0.1, 0.1, 0.2, 0.2]"
 arch: "mobilenet_v2"
 freeze_bn: false
 freeze_blocks: 0
}
training_config {
 batch_size_per_gpu: 8
 num_epochs: 80
 enable_qat: false
 learning_rate {
 soft_start_annealing_schedule {
 min_learning_rate: 5e-5
 max_learning_rate: 2e-2
 soft_start: 0.15
 annealing: 0.8
 }
 }
 regularizer {
 type: L1
 weight: 3e-5
 }
}
eval_config {
 validation_period_during_training: 5
 average_precision_mode: SAMPLE
 batch_size: 8
 matching_iou_threshold: 0.5
}
nms_config {
 confidence_threshold: 0.01
 clustering_iou_threshold: 0.6
 top_k: 200
}
augmentation_config {
 output_width: 300
 output_height: 300
 output_channel: 3
}
dataset_config {
 data_sources: {
 label_directory_path: "/workspace/tlt-experiments/data/HEAD/training/labels_300x300"
 

## 4. Run TLT training 
* Provide the sample spec file and the output directory location for models
* WARNING: training will take several hours or one day to complete

In [11]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned_300x300_ssdmobilenet

In [12]:
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tlt ssd train --gpus 1 --gpu_index=$GPU_INDEX \
 -e $SPECS_DIR/ssd_train_mobilenet_v2.txt \
 -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned_300x300_ssdmobilenet \
 -k $KEY \
 -m $USER_EXPERIMENT_DIR/pretrained_mobilenet_v2/tlt_pretrained_object_detection_vmobilenet_v2/mobilenet_v2.hdf5

To run with multigpu, please change --gpus based on the number of available GPUs in your machine.
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
2021-03-12 16:53:01,191 [INFO] /usr/local/lib/python3.6/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace/tlt-experiments/ssd/specs/ssd_train_mobilenet_v2.txt
2021-03-12 16:53:01,203 [INFO] __main__: Loading pretrained weights. This may take a while...
Weights for those layers can not be loaded: ['re_lu_0']
STOP trainig now and check the pre-train model if this is not expected!
Initialize optimizer
____________________________________________________________________________________________

In [None]:
print("To resume from checkpoint, please uncomment and run this instead. Change last two arguments accordingly.")
# !tlt ssd train --gpus 1 --gpu_index=$GPU_INDEX \
# -e $SPECS_DIR/ssd_train_resnet18_kitti.txt \
# -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
# -k $KEY \
# -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_resnet18_epoch_001.tlt \
# --initial_epoch 2

In [25]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned_300x300/weights

Model for each epoch:
---------------------
total 762M
-rw-r--r-- 1 root root 9,6M Mar 12 12:15 ssd_mobilenet_v2_epoch_001.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:15 ssd_mobilenet_v2_epoch_002.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:15 ssd_mobilenet_v2_epoch_003.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:15 ssd_mobilenet_v2_epoch_004.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:16 ssd_mobilenet_v2_epoch_005.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:16 ssd_mobilenet_v2_epoch_006.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:16 ssd_mobilenet_v2_epoch_007.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:17 ssd_mobilenet_v2_epoch_008.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:17 ssd_mobilenet_v2_epoch_009.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:17 ssd_mobilenet_v2_epoch_010.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:17 ssd_mobilenet_v2_epoch_011.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:18 ssd_mobilenet_v2_epoch_012.tlt
-rw-r--r-- 1 root root 9,6M Mar 12 12:18 ssd_mobilenet_v2_epoch_013.

In [26]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned_300x300/ssd_training_log_mobilenet_v2.csv
%set_env EPOCH=080

epoch,AP_head,loss,lr,mAP,validation_loss
1,nan,71.66014382399297,8.2377446e-05,nan,nan
2,nan,37.671551102657794,0.00013572087,nan,nan
3,nan,23.70247550217057,0.00022360678,nan,nan
4,nan,16.38045983108138,0.00036840313,nan,nan
5,0.006535391776368941,11.930178915992684,0.00060696213,0.006535391776368941,212.00954945882162
6,nan,8.686380714381746,0.0009999998,nan,nan
7,nan,7.464418506839553,0.0016475488,nan,nan
8,nan,7.476165621590234,0.0027144172,nan,nan
9,nan,10.249315744108925,0.004472135,nan,nan
10,5.3225463061528635e-05,94.06979975776412,0.007368061,5.3225463061528635e-05,16427126784.0
11,nan,112.44676658612993,0.012139242,nan,nan
12,nan,109.57277521294179,0.019999994,nan,nan
13,nan,109.23674909726363,0.02,nan,nan
14,nan,108.80930218881247,0.02,nan,nan
15,7.986157327299349e-05,108.54428827029426,0.02,7.986157327299349e-05,132370.72265625
16,nan,108.35668102429506,0.02,nan,nan
17,nan,108.39642492133555,0.02,nan,nan
18,nan,111.07851222218576,0.02,nan,nan
19,nan,118.00245214649105,0.02
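
The `lr` column in this log is consistent with a log-space ramp from `min_learning_rate` to `max_learning_rate` over the first `soft_start` fraction of training. With the spec values (5e-5, 2e-2, 0.15, 80 epochs), epoch 1 gives about 8.2377e-05 and epoch 6 gives about 1e-3, as logged. The sketch below reproduces that ramp. The decay branch after `annealing` is an assumption (a mirror image of the ramp), since the truncated log does not show it. The function name and the `epoch / num_epochs` progress measure are likewise illustrative.

```python
import math


def soft_start_annealing_lr(progress, min_lr=5e-5, max_lr=2e-2,
                            soft_start=0.15, annealing=0.8):
    """Learning rate at `progress`, the fraction of training completed (0..1)."""
    log_min, log_max = math.log(min_lr), math.log(max_lr)
    if progress < soft_start:
        # Ramp-up: interpolate in log space from min_lr to max_lr.
        t = progress / soft_start
    elif progress < annealing:
        # Plateau at the maximum learning rate.
        return max_lr
    else:
        # Assumed decay: the ramp mirrored, ending at min_lr when progress is 1.
        t = (1.0 - progress) / (1.0 - annealing)
    return math.exp(log_min + t * (log_max - log_min))


num_epochs = 80
for epoch in (1, 2, 6, 13):
    print(epoch, soft_start_annealing_lr(epoch / num_epochs))
```

Seen this way, the jump in `loss` around epochs 10 to 12 in the log coincides with the learning rate reaching its maximum.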

## 5. Evaluate trained models 

In [27]:
!tlt ssd evaluate --gpu_index=$GPU_INDEX \
 -e $SPECS_DIR/ssd_train_mobilenet_v2.txt \
 -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned_300x300/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
 -k $KEY

Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
2021-03-12 09:44:06,414 [INFO] /usr/local/lib/python3.6/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace/tlt-experiments/ssd/specs/ssd_train_mobilenet_v2.txt
Using TLT model for inference, setting batch size to the one in eval_config: 8
Producing predictions: 100%|██████████████████████| 6/6 [00:03<00:00, 1.69it/s]
Start to calculate AP for each class
*******************************
head AP 0.0
 mAP 0.0
*******************************
2021-03-12 12:44:18,727 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


## 6. Prune trained models 
* Specify pre-trained model
* Equalization criterion (applies only to ResNets, which have element-wise operations, and to MobileNets)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

Usually, you only need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives a smaller model (and thus higher inference speed) but lower accuracy. The right threshold depends on the dataset and the model; the `0.1` in the block below is just a starting point. If the accuracy after retraining is good, you can increase this value to get a smaller model. Otherwise, lower it to get better accuracy.
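
To see why a higher threshold yields a smaller model, consider a simplified picture of magnitude-based pruning: each channel gets a norm, the norms are scaled by the largest one, and channels at or below the threshold are dropped. The sketch below illustrates only that idea with made-up norms. It is not TLT's implementation, and it ignores the equalization step selected by `-eq`.

```python
def channels_to_keep(channel_norms, threshold):
    """Indices of channels whose norm, relative to the largest norm, exceeds `threshold`."""
    largest = max(channel_norms)
    return [
        index for index, norm in enumerate(channel_norms)
        if norm / largest > threshold
    ]


# Made-up per-channel norms for one layer (illustrative values only).
norms = [0.9, 0.05, 0.4, 0.02, 0.6]
for pth in (0.1, 0.5):
    print(pth, channels_to_keep(norms, pth))
```

With these numbers, a threshold of 0.1 keeps three of the five channels and 0.5 keeps two, which mirrors the size versus accuracy trade-off described above.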

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned_300x300

In [None]:
!tlt ssd prune --gpu_index=$GPU_INDEX \
 -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned_300x300/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
 -o $USER_EXPERIMENT_DIR/experiment_dir_pruned_300x300/ssd_mobilenet_v2_pruned.tlt \
 -eq intersection \
 -pth 0.1 \
 -k $KEY

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned_300x300/

## 7. Retrain pruned models 
* The model needs to be retrained to recover accuracy after pruning
* Specify re-training specification
* WARNING: training will take several hours or one day to complete

In [None]:
# Printing the retrain spec file.
# Here we have updated the spec file to include the newly pruned model as pretrained weights.
!cat $LOCAL_SPECS_DIR/ssd_retrain_resnet18_kitti.txt

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights 
!tlt ssd train --gpus 1 --gpu_index=$GPU_INDEX \
 -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
 -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
 -m $USER_EXPERIMENT_DIR/experiment_dir_pruned/ssd_resnet18_pruned.tlt \
 -k $KEY

In [None]:
# Listing the newly retrained model.
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/ssd_training_log_resnet18.csv
%set_env EPOCH=080
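
Rather than reading the CSV by eye, the best epoch can be picked programmatically. The sketch below is a minimal example using the `epoch` and `mAP` columns seen in the training log earlier in this notebook. `best_epoch` is a hypothetical helper, and the three-digit zero padding follows the `..._epoch_001.tlt` file naming.

```python
import csv
import io
import math


def best_epoch(log_text, metric="mAP"):
    """Return the zero-padded epoch with the highest non-NaN `metric`, or None."""
    best = None
    for row in csv.DictReader(io.StringIO(log_text)):
        value = float(row[metric])
        if math.isnan(value):
            continue  # epochs without validation are logged as nan
        if best is None or value > best[1]:
            best = (int(row["epoch"]), value)
    return None if best is None else "{:03d}".format(best[0])


# Rows copied from the training log shown earlier in this notebook.
sample_log = """epoch,AP_head,loss,lr,mAP,validation_loss
4,nan,16.38045983108138,0.00036840313,nan,nan
5,0.006535391776368941,11.930178915992684,0.00060696213,0.006535391776368941,212.00954945882162
10,5.3225463061528635e-05,94.06979975776412,0.007368061,5.3225463061528635e-05,16427126784.0
"""
print(best_epoch(sample_log))
```

The returned string can then be assigned to `os.environ["EPOCH"]` in place of the hard-coded `080`.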

## 8. Evaluate retrained model 

In [None]:
!tlt ssd evaluate --gpu_index=$GPU_INDEX \
 -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
 -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
 -k $KEY

## 9. Visualize inferences 
In this section, we run the `tlt ssd inference` tool on the trained model and visualize the results.

In [None]:
# Copy some test images
!mkdir -p $LOCAL_DATA_DIR/test_samples
!cp $LOCAL_DATA_DIR/testing/image_2/00000* $LOCAL_DATA_DIR/test_samples

In [None]:
# Running inference for detection on n images
!tlt ssd inference --gpu_index=$GPU_INDEX -i $DATA_DOWNLOAD_DIR/test_samples \
 -o $USER_EXPERIMENT_DIR/ssd_infer_images \
 -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
 -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
 -l $USER_EXPERIMENT_DIR/ssd_infer_labels \
 -k $KEY

The `tlt ssd inference` tool produces two outputs.
1. Images with the detected bounding boxes overlaid, in `$USER_EXPERIMENT_DIR/ssd_infer_images`
2. Frame-by-frame bounding-box labels in KITTI format, in `$USER_EXPERIMENT_DIR/ssd_infer_labels`
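
The label files can be read back for further processing. A KITTI label line has 15 whitespace-separated fields (class name first, the 2D box in fields 5 to 8), optionally followed by a score. The sketch below parses one such line. The `Detection` tuple and function name are illustrative, and whether the inference tool writes the score as a 16th field is an assumption to check against your own output.

```python
from collections import namedtuple

Detection = namedtuple("Detection", "label left top right bottom score")


def parse_kitti_line(line):
    """Parse one KITTI label line into a Detection (score is None if absent)."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError("expected at least 15 fields, got {}".format(len(fields)))
    left, top, right, bottom = (float(value) for value in fields[4:8])
    score = float(fields[15]) if len(fields) > 15 else None
    return Detection(fields[0], left, top, right, bottom, score)


# Made-up example line for a single detection.
line = "head 0.00 0 0.00 12.5 30.0 80.0 110.5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87"
print(parse_kitti_line(line))
```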

In [None]:
# Simple grid visualizer
!pip3 install matplotlib==3.3.3
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    """Show up to num_images images from $LOCAL_EXPERIMENT_DIR/image_dir in a grid."""
    output_path = os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even with a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    a = [os.path.join(output_path, image) for image in os.listdir(output_path)
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the sample images.
OUTPUT_PATH = 'ssd_infer_images' # relative path from $LOCAL_EXPERIMENT_DIR.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 10. Model Export 

If you trained a non-QAT model, you may export in FP32, FP16 or INT8 mode using the code block below. For INT8, you need to provide a calibration image directory.

In [None]:
# tlt-export will fail if .etlt already exists. So we clear the export folder before tlt-export
!rm -rf $LOCAL_EXPERIMENT_DIR/export
!mkdir -p $LOCAL_EXPERIMENT_DIR/export
# Export in FP32 mode. Change --data_type to fp16 for FP16 mode
!tlt ssd export --gpu_index=$GPU_INDEX \
 -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
 -k $KEY \
 -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
 -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
 --batch_size 16 \
 --data_type fp32

# Uncomment to export in INT8 mode (generate calibration cache file).
# !tlt ssd export --gpu_index=$GPU_INDEX \
# -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
# -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
# -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
# -k $KEY \
# --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
# --data_type int8 \
# --batch_size 16 \
# --batches 10 \
# --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
# --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

`Note:` In this example, for ease of execution, we restrict the number of calibration batches to 10. TLT recommends using at least 10% of the training dataset for INT8 calibration.
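
The 10% recommendation translates into a value for `--batches` with a little arithmetic. The sketch below uses integer ceiling division. The helper name and the example dataset size of 1000 images are assumptions chosen for illustration.

```python
def ceil_div(numerator, denominator):
    """Ceiling of numerator / denominator using integer arithmetic only."""
    return -(-numerator // denominator)


def calibration_batches(num_train_images, batch_size, percent=10):
    """Smallest number of batches covering at least `percent`% of the training images."""
    images_needed = ceil_div(num_train_images * percent, 100)
    return ceil_div(images_needed, batch_size)


# Hypothetical dataset of 1000 training images with the batch size of 16 used above.
print(calibration_batches(1000, 16))
```

For that example, 100 images are needed, which rounds up to 7 batches of 16.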

If you trained a QAT model, you may only export in INT8 mode, using the following code block. This generates an `.etlt` file and the corresponding calibration cache. You can discard the calibration cache and use just the `.etlt` file in `tlt-converter` or DeepStream for FP32 or FP16 mode, but note that this gives sub-optimal results. If you want to deploy in FP32 or FP16, you should disable QAT in training.

In [None]:
# Uncomment to export QAT model in INT8 mode (generate calibration cache file).
# !rm -rf $LOCAL_EXPERIMENT_DIR/export
# !mkdir -p $LOCAL_EXPERIMENT_DIR/export
# !tlt ssd export --gpu_index=$GPU_INDEX \
# -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
# -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
# -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
# -k $KEY \
# --data_type int8 \
# --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin

In [None]:
print('Exported model:')
print('------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/export

Verify engine generation using the `tlt-converter` utility included with the docker.

The `tlt-converter` produces optimized TensorRT engines for the platform it runs on. Therefore, to get maximum performance, instantiate this docker on your target device and run the `tlt-converter` command there with the exported `.etlt` file and, for INT8 mode, the calibration cache. The converter utility included in this docker only works on x86 devices with discrete NVIDIA GPUs.

For Jetson devices, download the Jetson converter from the dev zone link [here](https://developer.nvidia.com/tlt-converter).

If you choose to integrate your model into DeepStream directly, copy the exported `.etlt` file along with the calibration cache to the target device and update the spec file that configures the `gst-nvinfer` element to point to the newly exported model. This file is usually called `config_infer_primary.txt` for detection models and `config_infer_secondary_*.txt` for classification models.

In [None]:
# Convert to TensorRT engine (FP32)
!tlt tlt-converter -k $KEY \
 -d 3,300,300 \
 -o NMS \
 -e $USER_EXPERIMENT_DIR/export/trt.engine \
 -m 16 \
 -t fp32 \
 -i nchw \
 $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt

# Convert to TensorRT engine (FP16)
# !tlt tlt-converter -k $KEY \
# -d 3,300,300 \
# -o NMS \
# -e $USER_EXPERIMENT_DIR/export/trt.engine \
# -m 16 \
# -t fp16 \
# -i nchw \
# $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt

# Convert to TensorRT engine (INT8).
# !tlt tlt-converter -k $KEY \
# -d 3,300,300 \
# -o NMS \
# -c $USER_EXPERIMENT_DIR/export/cal.bin \
# -e $USER_EXPERIMENT_DIR/export/trt.engine \
# -b 8 \
# -m 16 \
# -t int8 \
# -i nchw \
# $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt

In [None]:
print('Exported engine:')
print('------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/export/trt.engine

## 11. Verify the deployed model 
Verify the converted engine by visualizing TensorRT inferences.

In [None]:
# Infer using TensorRT engine

# The engine batch size, once created, cannot be altered. So if you wish to run with a different batch size,
# please re-run tlt-converter.

!tlt ssd inference --gpu_index=$GPU_INDEX \
 -m $USER_EXPERIMENT_DIR/export/trt.engine \
 -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
 -i $DATA_DOWNLOAD_DIR/test_samples \
 -o $USER_EXPERIMENT_DIR/ssd_infer_images \
 -t 0.4

In [None]:
# Visualizing the sample images.
OUTPUT_PATH = 'ssd_infer_images' # relative path from $LOCAL_EXPERIMENT_DIR.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)