# Object Detection using TAO DetectNet_v2

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task. 

Train Adapt Optimize (TAO) Toolkit is a simple and easy-to-use Python based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

 

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained resnet18 model and train a ResNet-18 DetectNet_v2 model on the KITTI dataset
* Prune the trained detectnet_v2 model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Quantize the pruned model using QAT
* Run Inference on the trained model
* Export the pruned, quantized and retrained model to a .etlt file for deployment to DeepStream
* Run inference on the exported. etlt model to verify deployment using TensorRT

### Table of Contents

This notebook shows an example usecase of Object Detection using DetectNet_v2 in the Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Install the TAO Launcher](#head-1)
1. [Prepare dataset and pre-trained model](#head-2)
 1. [Download the dataset](#head-2-1)
 1. [Verify downloaded dataset](#head-2-2)
 1. [Prepare tfrecords from kitti format dataset](#head-2-3)
 2. [Download pre-trained model](#head-2-4)
2. [Provide training specification](#head-3)
3. [Run TAO training](#head-4)
4. [Evaluate trained models](#head-5)
5. [Prune trained models](#head-6)
6. [Retrain pruned models](#head-7)
7. [Evaluate retrained model](#head-8)
8. [Visualize inferences](#head-9)
9. [Model Export](#head-10)
 1. [Int8 Optimization](#head-10-1)
 2. [Generate TensorRT engine](#head-10-2)
10. [Verify Deployed Model](#head-11)
 1. [Inference using TensorRT engine](#head-11-1)
11. [QAT workflow](#head-12)
 1. [Convert pruned model to QAT and retrain](#head-12-1)
 2. [Evaluate QAT converted model](#head-12-2)
 3. [Export QAT trained model to int8](#head-12-3)
 4. [Evaluate a QAT trained model using the exported TensorRT engine](#head-12-4)
 5. [Inference using QAT engine](#head-12-5)

## 0. Set up env variables and map drives 
When using the purpose-built pretrained models from NGC, please make sure to set the `$KEY` environment variable to the key as mentioned in the model overview. Failing to do so, can lead to errors when trying to load them as pretrained models.

The following notebook requires the user to set an env variable called the `$LOCAL_PROJECT_DIR` as the path to the users workspace. Please note that the dataset to run this notebook is expected to reside in the `$LOCAL_PROJECT_DIR/data`, while the TAO experiment generated collaterals will be output to `$LOCAL_PROJECT_DIR/detectnet_v2`. More information on how to set up the dataset and the supported steps in the TAO workflow are provided in the subsequent cells.

*Note: Please make sure to remove any stray artifacts/files from the `$USER_EXPERIMENT_DIR` or `$DATA_DOWNLOAD_DIR` paths as mentioned below, that may have been generated from previous experiments. Having checkpoint files etc may interfere with creating a training graph for a new experiment.*

*Note: This notebook currently is by default set up to run training using 1 GPU. To use more GPU's please update the env variable `$NUM_GPUS` accordingly*

In [1]:
pwd

'/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2_car'

In [2]:
# Setting up env variables for cleaner command line commands.
import os

# %env KEY=nvidia_tlt
%env KEY=tlt_encode
%env NUM_GPUS=1
%env USER_EXPERIMENT_DIR=/workspace/tao-experiments/experiment
%env DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/detectnet_v2

# Please define this local project directory that needs to be mapped to the TAO docker session.
# The dataset expected to be present in $LOCAL_PROJECT_DIR/data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/detectnet_v2
# !PLEASE MAKE SURE TO UPDATE THIS PATH!.

%env LOCAL_PROJECT_DIR =/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2_car
# os.environ["LOCAL_PROJECT_DIR"] = FIXME

os.environ["LOCAL_DATA_DIR"] = os.path.join(
 os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
 "data"
)
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(
 os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
 "experiment"
)

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
 os.getenv("NOTEBOOK_ROOT", os.getcwd()),
 "specs"
)
%env SPECS_DIR=/workspace/tao-experiments/specs

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

env: KEY=tlt_encode
env: NUM_GPUS=1
env: USER_EXPERIMENT_DIR=/workspace/tao-experiments/experiment
env: DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data
env: LOCAL_PROJECT_DIR=/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2_car
env: SPECS_DIR=/workspace/tao-experiments/specs
total 28
-rw-r--r-- 1 guest guest 2436 Nov 18 10:00 detectnet_v2_inference_kitti_etlt.txt
-rw-r--r-- 1 guest guest 2445 Nov 18 10:00 detectnet_v2_inference_kitti_etlt_qat.txt
-rw-rw-r-- 1 guest guest 3172 Jan 6 15:56 detectnet_v2_train_resnet18_kitti.txt
-rw-rw-r-- 1 guest guest 370 Jan 6 16:06 detectnet_v2_tfrecords_kitti_trainval.txt
-rw-r--r-- 1 guest guest 3172 Jan 6 16:17 detectnet_v2_retrain_resnet18_kitti_qat.txt
-rw-rw-r-- 1 guest guest 3172 Jan 6 16:19 detectnet_v2_retrain_resnet18_kitti.txt
-rw-r--r-- 1 guest guest 990 Jan 20 10:38 detectnet_v2_inference_kitti_tlt.txt


The cell below maps the project directory on your local host to a workspace directory in the TAO docker instance, so that the data and the results are mapped from in and out of the docker. For more information please refer to the [launcher instance](https://docs.nvidia.com/tao/tao-toolkit/tao_launcher.html) in the user guide.

When running this cell on AWS, update the drive_map entry with the dictionary defined below, so that you don't have permission issues when writing data into folders created by the TAO docker.

```json
drive_map = {
 "Mounts": [
 # Mapping the data directory
 {
 "source": os.environ["LOCAL_PROJECT_DIR"],
 "destination": "/workspace/tao-experiments"
 },
 # Mapping the specs directory.
 {
 "source": os.environ["LOCAL_SPECS_DIR"],
 "destination": os.environ["SPECS_DIR"]
 },
 ],
 "DockerOptions": {
 "user": "{}:{}".format(os.getuid(), os.getgid())
 }
}
```

In [3]:
# Mapping up the local directories to the TAO docker.
import json
mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
 "Mounts": [
 # Mapping the data directory
 {
 "source": os.environ["LOCAL_PROJECT_DIR"],
 "destination": "/workspace/tao-experiments"
 },
 # Mapping the specs directory.
 {
 "source": os.environ["LOCAL_SPECS_DIR"],
 "destination": os.environ["SPECS_DIR"]
 },
 ],
 "DockerOptions": {
 "user": "{}:{}".format(os.getuid(), os.getgid())
 }
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
 json.dump(drive_map, mfile, indent=4)

In [4]:
!cat ~/.tao_mounts.json

{
 "Mounts": [
 {
 "source": "/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2_car",
 "destination": "/workspace/tao-experiments"
 },
 {
 "source": "/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2_car/specs",
 "destination": "/workspace/tao-experiments/specs"
 }
 ],
 "DockerOptions": {
 "user": "1001:1001"
 }
}

## 1. Install the TAO launcher 
The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends users to run the TAO launcher in a virtual env with python 3.6.9. You may follow the instruction in this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have setup virtualenvwrapper, please set the version of python to be used in the virtual env by using the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 6 and <= 8

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing TAO python package, please make sure of the following software requirements:
* python >=3.6.9 < 3.8.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+

Once you have installed the pre-requisites, please log in to the docker registry nvcr.io by following the command below

```sh
docker login nvcr.io
```

You will be trigerred to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.

In [5]:
# SKIP this step IF you have already installed the TAO launcher wheel.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [5]:
# View the versions of the TAO launcher
!tao info

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021


## 2. Prepare dataset and pre-trained model 

We will be using the kitti object detection dataset for this example. To find more details, please visit http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. Please download both, the left color images of the object dataset from [here](http://www.cvlibs.net/download.php?file=data_object_image_2.zip) and, the training labels for the object dataset from [here](http://www.cvlibs.net/download.php?file=data_object_label_2.zip), and place the zip files in `$LOCAL_DATA_DIR`

The data will then be extracted to have
* training images in `$LOCAL_DATA_DIR/training/image_2`
* training labels in `$LOCAL_DATA_DIR/training/label_2`
* testing images in `$LOCAL_DATA_DIR/testing/image_2`

You may use this notebook with your own dataset as well. To use this example with your own dataset, please follow the same directory structure as mentioned below.

*Note: There are no labels for the testing images, therefore we use it just to visualize inferences for the trained model.*

### A. Download the dataset 
Once you have gotten the download links in your email, please populate them in place of the `KITTI_IMAGES_DOWNLOAD_URL` and the `KITTI_LABELS_DOWNLOAD_URL`. This next cell, will download the data and place in `$LOCAL_DATA_DIR`

In [5]:
!ls -l $LOCAL_DATA_DIR/

total 8
drwxrwxr-x 3 guest guest 4096 Jan 6 16:06 tfrecords
drwxrwxr-x 8 guest guest 4096 Dec 21 15:15 training


In [6]:
import os
!mkdir -p $LOCAL_DATA_DIR
os.environ["URL_IMAGES"]=KITTI_IMAGES_DOWNLOAD_URL
!if [ ! -f $LOCAL_DATA_DIR/data_object_image_2.zip ]; then wget $URL_IMAGES -O $LOCAL_DATA_DIR/data_object_image_2.zip; else echo "image archive already downloaded"; fi 
os.environ["URL_LABELS"]=KITTI_LABELS_DOWNLOAD_URL
!if [ ! -f $LOCAL_DATA_DIR/data_object_label_2.zip ]; then wget $URL_LABELS -O $LOCAL_DATA_DIR/data_object_label_2.zip; else \ echo "label archive already downloaded"; fi 

NameError: name 'KITTI_IMAGES_DOWNLOAD_URL' is not defined

### B. Verify downloaded dataset 

In [14]:
# Check the dataset is present
!if [ ! -f $LOCAL_DATA_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
!if [ ! -f $LOCAL_DATA_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi

Found Image zip file.
Found Labels zip file.


In [15]:
# This may take a while: verify integrity of zip files 
!sha256sum $LOCAL_DATA_DIR/data_object_image_2.zip | cut -d ' ' -f 1 | grep -xq '^351c5a2aa0cd9238b50174a3a62b846bc5855da256b82a196431d60ff8d43617$' ; \
if test $? -eq 0; then echo "images OK"; else echo "images corrupt, redownload!" && rm -f $LOCAL_DATA_DIR/data_object_image_2.zip; fi 
!sha256sum $LOCAL_DATA_DIR/data_object_label_2.zip | cut -d ' ' -f 1 | grep -xq '^4efc76220d867e1c31bb980bbf8cbc02599f02a9cb4350effa98dbb04aaed880$' ; \
if test $? -eq 0; then echo "labels OK"; else echo "labels corrupt, redownload!" && rm -f $LOCAL_DATA_DIR/data_object_label_2.zip; fi 

images corrupt, redownload!
labels corrupt, redownload!


In [8]:
DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
print(DATA_DIR)

/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/car_data


In [7]:
# verify
import os

DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
num_training_images = len(os.listdir(os.path.join(DATA_DIR, "training/image_2")))
num_training_labels = len(os.listdir(os.path.join(DATA_DIR, "training/label_2")))
num_testing_images = len(os.listdir(os.path.join(DATA_DIR, "testing/images")))
print("Number of images in the train/val set. {}".format(num_training_images))
print("Number of labels in the train/val set. {}".format(num_training_labels))
print("Number of images in the test set. {}".format(num_testing_images))

FileNotFoundError: [Errno 2] No such file or directory: '/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/car_data/testing/images'

In [9]:
# Sample kitti label.
!cat $LOCAL_DATA_DIR/training/label_2/1.txt

cat: /home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/car_data/training/label_2/1.txt: No such file or directory


### C. Prepare tf records from kitti format dataset 

* Update the tfrecords spec file to take in your kitti format dataset
* Create the tfrecords using the detectnet_v2 dataset_convert 

*Note: TfRecords only need to be generated once.*

In [7]:
print("TFrecords conversion spec file for kitti training")
!cat $LOCAL_SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt

TFrecords conversion spec file for kitti training
kitti_config {
 root_directory_path: "/workspace/tao-experiments/data/training"
 image_dir_name: "image_2"
 label_dir_name: "label_2"
 image_extension: ".png"
 partition_mode: "random"
 num_partitions: 2
 val_split: 20
 num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/car_data/training"
target_class_mapping {
 key: "car"
 value: "car"
}

In [25]:
# Creating a new directory for the output tfrecords dump.
print("Converting Tfrecords for kitti trainval dataset")
!mkdir -p $LOCAL_DATA_DIR/tfrecords && rm -rf $LOCAL_DATA_DIR/tfrecords/*
!tao detectnet_v2 dataset_convert \
 -d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
 -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

Converting Tfrecords for kitti trainval dataset
2022-01-06 16:06:11,395 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-6_mb0qvf because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.
2022-01-06 08:06:16,899 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2022-01-06 08:06:16,900 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Creating output directory /workspace/tao-experiments/data/tfrecords/kitti_trainval
2022-01-06 08:06:16,903 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in
Train: 761	Val: 190
2022-01-06 08:06:16,903 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hen

In [26]:
!ls -rlt $LOCAL_DATA_DIR/tfrecords/kitti_trainval/

total 724
-rw-r--r-- 1 guest guest 14257 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00000-of-00010
-rw-r--r-- 1 guest guest 14490 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00001-of-00010
-rw-r--r-- 1 guest guest 14491 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00002-of-00010
-rw-r--r-- 1 guest guest 14084 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00003-of-00010
-rw-r--r-- 1 guest guest 14492 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00004-of-00010
-rw-r--r-- 1 guest guest 14259 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00005-of-00010
-rw-r--r-- 1 guest guest 13449 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00006-of-00010
-rw-r--r-- 1 guest guest 13564 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00007-of-00010
-rw-r--r-- 1 guest guest 14433 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00008-of-00010
-rw-r--r-- 1 guest guest 13392 Jan 6 16:06 kitti_trainval-fold-000-of-002-shard-00009-of-00010
-rw-r--r-- 1 guest guest 5605

### D. Download pre-trained model 
Download the correct pretrained model from the NGC model registry for your experiment. Please note that for DetectNet_v2, the input is expected to be 0-1 normalized with input channels in RGB order. Therefore, for optimum results please download model templates from `nvidia/tao/pretrained_detectnet_v2`. The templates are now organized as version strings. For example, to download a resnet18 model suitable for detectnet please resolve to the ngc object shown as `nvidia/tao/pretrained_detectnet_v2:resnet18`. 

All other models are in BGR order expect input preprocessing with mean subtraction and input channels. Using them as pretrained weights may result in suboptimal performance.

You may also use this notebook with the following purpose-built pretrained models 
* [PeopleNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:peoplenet)
* [TrafficCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:trafficcamnet)
* [DashCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:dashcamnet)
* [FaceDetect-IR](https://ngc.nvidia.com/catalog/models/nvidia:tao:facedetectir) 

In [33]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

env: CLI=ngccli_cat_linux.zip
--2021-12-13 17:12:19-- https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.225.99.60, 13.225.99.28, 13.225.99.8, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.225.99.60|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25122731 (24M) [application/zip]
Saving to: ‘/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/ngccli/ngccli_cat_linux.zip’


2021-12-13 17:12:21 (10.4 MB/s) - ‘/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/ngccli/ngccli_cat_linux.zip’ saved [25122731/25122731]

Archive: /home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/ngccli/ngccli_cat_linux.zip
 inflating: /home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/ngccli/ngc 
 extracting: /home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/ngccli/ngc.md5 


In [34]:
# List models available in the model registry.
!ngc registry model list nvidia/tao/pretrained_detectnet_v2:*

+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Versi | Accur | Epoch | Batch | GPU | Memor | File | Statu | Creat |
| on | acy | s | Size | Model | y Foo | Size | s | ed |
| | | | | | tprin | | | Date |
| | | | | | t | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| vgg19 | 82.6 | 80 | 1 | V100 | 153.8 | 153.7 | UPLOA | Aug |
| | | | | | | 7 MB | D_COM | 24, |
| | | | | | | | PLETE | 2021 |
| vgg16 | 82.2 | 80 | 1 | V100 | 113.2 | 113.2 | UPLOA | Aug |
| | | | | | | MB | D_COM | 24, |
| | | | | | | | PLETE | 2021 |
| squee | 65.67 | 80 | 1 | V100 | 6.5 | 6.46 | UPLOA | Aug |
| zenet | | | | | | MB | D_COM | 24, |
| | | | | | | | PLETE | 2021 |
| resne | 82.7 | 80 | 1 | V100 | 294.5 | 294.5 | UPLOA | Aug |
| t50 | | | | | | 3 MB | D_COM | 24, |
| | | | | | | | PLETE | 2021 |
| resne | 79.0 | 80 | 1 | V100 | 89.0 | 89.02 | UPLOA | Aug |
| t18 | | | | | | MB | D_COM | 24, |
| | | | | | | | PLETE | 2021 |
|

In [35]:
# Create the target destination to download the model.
!mkdir $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/

mkdir: cannot create directory ‘/home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/detectnet_v2_car/pretrained_resnet18/’: File exists


In [36]:
# Download the pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_detectnet_v2:resnet18 \
 --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet18

Downloaded 82.28 MB in 40s, Download speed: 2.05 MB/s 
----------------------------------------------------
Transfer id: pretrained_detectnet_v2_vresnet18 Download status: Completed.
Downloaded local path: /home/guest/taotoolkit/cv_samples_v1.2.0/detectnet_v2/detectnet_v2_car/pretrained_resnet18/pretrained_detectnet_v2_vresnet18-2
Total files downloaded: 1 
Total downloaded size: 82.28 MB
Started at: 2021-12-13 17:12:47.431104
Completed at: 2021-12-13 17:13:27.485671
Duration taken: 40s
----------------------------------------------------


In [14]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/pretrained_detectnet_v2_vresnet18

total 91160
-rw------- 1 guest guest 93345248 Dec 13 17:07 resnet18.hdf5


## 3. Provide training specification 
* Tfrecords for the train datasets
 * To use the newly generated tfrecords, update the dataset_config parameter in the spec file at `$SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt` 
 * Update the fold number to use for evaluation. In case of random data split, please use fold `0` only
 * For sequence-wise split, you may use any fold generated from the dataset convert tool
* Pre-trained models
* Augmentation parameters for on the fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.

In [7]:
!cat $LOCAL_SPECS_DIR/detectnet_v2_train_resnet18_kitti_car.txt

random_seed: 42
dataset_config {
 data_sources {
 tfrecords_path: "/workspace/tao-experiments/car_data/tfrecords/kitti_trainval/*"
 image_directory_path: "/workspace/tao-experiments/car_data/training/"
 }
 image_extension: "png"
 target_class_mapping{
 key:"car"
 value:"car"
 }
 validation_fold: 0
}
augmentation_config {
 preprocessing {
 output_image_width: 960
 output_image_height: 544
 min_bbox_width: 1.0
 min_bbox_height: 1.0
 output_image_channel: 3
 enable_auto_resize: true
 }
 spatial_augmentation {
 hflip_probability: 0.5
 vflip_probability: 0.0
 zoom_min: 1.0
 zoom_max: 1.0
 translate_max_x: 8.0
 translate_max_y: 8.0
 }
 color_augmentation {
 hue_rotation_max: 25.0
 saturation_shift_max: 0.20000000298
 contrast_scale_max: 0.10000000149
 contrast_center: 0.5
 }
}

postprocessing_config {
 target_class_config {
 key: "car"
 value {
 clustering_config {
 clustering_algorithm: DBSCAN
 coverage_threshold: 0.005
 dbscan_eps: 0.15
 dbscan

## 4. Run TAO training 
* Provide the sample spec file and the output directory location for models

*Note: The training may take hours to complete. Also, the remaining notebook, assumes that the training was done in single-GPU mode. When run in multi-GPU mode, please expect to update the pruning and inference steps with new pruning thresholds and updated parameters in the clusterfile.json accordingly for optimum performance.*

*Detectnet_v2 now supports restart from checkpoint. Incase the training job is killed prematurely, you may resume training from the closest checkpoint by simply re-running the **same** command line. Please do make sure to use the **same number of GPUs** when restarting the training.*

*When running the training with NUM_GPUs>1, you may need to modify the `batc_size_per_gpu` and `learning_rate` to get similar mAP as a 1GPU training run. In most cases, scaling down the batch-size by a factor of NUM_GPU's or scaling up the learning rate by a factor of NUM_GPU's would be a good place to start.* 

In [1]:
 !$SPECS_DIR

In [None]:
!tao detectnet_v2 train -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
 -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
 -k tlt_encode \
 -n resnet18_detector \
 --gpus $NUM_GPUS

2021-12-30 17:07:03,018 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-8goo7gpw because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.








2021-12-30 09:07:09,036 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/detectnet_v2_car/specs/detectnet_v2_train_resnet18_kitti_car.txt.
2021-12-30 09:07:09,037 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/detectnet_v2_car/specs/detectnet_v2_train_resnet18_kitti_car.txt
2021-12-30 09:07:09,151 [INFO] __main__: Cannot iterate over exactly 761 samples with a batch size of 8; each epoch will therefore take one extra step.


















2021-12-30 09:

2021-12-30 09:07:16,957 [INFO] iva.detectnet_v2.model.detectnet_model: Converting the keras model to quantize keras model.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to 
input_1 (InputLayer) (None, 3, 544, 960) 0 
__________________________________________________________________________________________________
input_1_qdq (QDQ) (None, 3, 544, 960) 1 input_1[0][0] 
__________________________________________________________________________________________________
conv1 (QuantizedConv2D) (None, 64, 272, 480) 9472 input_1_qdq[0][0] 
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 272, 480) 256 conv1[0][0] 
__________________________________________________________________________________________________
activation_1 (ReLU) (None, 64, 272, 480) 0 bn_conv1[0][0] 
_______________________________________



2021-12-30 09:07:41,248 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2021-12-30 09:07:41,450 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2021-12-30 09:07:41,456 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2021-12-30 09:07:41,456 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000


2021-12-30 09:07:41,762 [INFO] __main__: Found 761 samples in training set














2021-12-30 09:07:43,967 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2021-12-30 09:07:43,967 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2021-12-30 09:07:43,967 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2021-12-30 09:07:43,967 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 16, io threads: 32, compute threads: 16, buffered batches: 4
2021-12-30 09:07:43,968 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 190, number of sources: 1, batch size per gpu: 8, steps: 24
2021-12-30 09:07:43,992 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2021-12-30 09:07:44,187 [INFO] modulus.blocks.data_loaders.multi

INFO:tensorflow:Graph was finalized.
2021-12-30 09:07:47,404 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2021-12-30 09:07:49,205 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2021-12-30 09:07:49,736 [INFO] tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2021-12-30 09:07:58,014 [INFO] tensorflow: Saving checkpoints for step-0.
INFO:tensorflow:epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.058969934, step = 0
2021-12-30 09:09:54,118 [INFO] tensorflow: epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.058969934, step = 0
2021-12-30 09:09:54,120 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 0/120: loss: 0.05897 learning rate: 0.00000 Time taken: 0:00:00 ETA: 0:00:00
2021-12-30 09:09:54,120 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 0.091
INFO:tensorflow:global_step/sec: 2.07329
2021-12-30 09:09:58,460 [INFO] tensorflow: global_st

INFO:tensorflow:epoch = 1.375, learning_rate = 9.175495e-06, loss = 0.0029616202, step = 132 (5.456 sec)
2021-12-30 09:10:37,781 [INFO] tensorflow: epoch = 1.375, learning_rate = 9.175495e-06, loss = 0.0029616202, step = 132 (5.456 sec)
INFO:tensorflow:global_step/sec: 3.12555
2021-12-30 09:10:38,728 [INFO] tensorflow: global_step/sec: 3.12555
INFO:tensorflow:global_step/sec: 3.13134
2021-12-30 09:10:41,603 [INFO] tensorflow: global_step/sec: 3.13134
INFO:tensorflow:epoch = 1.5520833333333333, learning_rate = 9.92169e-06, loss = 0.0019230827, step = 149 (5.429 sec)
2021-12-30 09:10:43,210 [INFO] tensorflow: epoch = 1.5520833333333333, learning_rate = 9.92169e-06, loss = 0.0019230827, step = 149 (5.429 sec)
2021-12-30 09:10:43,210 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.003
INFO:tensorflow:global_step/sec: 3.11009
2021-12-30 09:10:44,496 [INFO] tensorflow: global_step/sec: 3.11009
INFO:tensorflow:global_step/sec: 3.09484
2021-12-30 09:10:47,404 [INFO] tensorflo

INFO:tensorflow:epoch = 4.03125, learning_rate = 2.9646415e-05, loss = 0.0008774643, step = 387 (5.533 sec)
2021-12-30 09:12:00,118 [INFO] tensorflow: epoch = 4.03125, learning_rate = 2.9646415e-05, loss = 0.0008774643, step = 387 (5.533 sec)
INFO:tensorflow:global_step/sec: 3.04534
2021-12-30 09:12:00,118 [INFO] tensorflow: global_step/sec: 3.04534
INFO:tensorflow:global_step/sec: 3.12508
2021-12-30 09:12:02,998 [INFO] tensorflow: global_step/sec: 3.12508
2021-12-30 09:12:03,971 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.732
INFO:tensorflow:epoch = 4.208333333333333, learning_rate = 3.2057404e-05, loss = 0.00095430797, step = 404 (5.478 sec)
2021-12-30 09:12:05,596 [INFO] tensorflow: epoch = 4.208333333333333, learning_rate = 3.2057404e-05, loss = 0.00095430797, step = 404 (5.478 sec)
INFO:tensorflow:global_step/sec: 3.06631
2021-12-30 09:12:05,933 [INFO] tensorflow: global_step/sec: 3.06631
INFO:tensorflow:global_step/sec: 3.09654
2021-12-30 09:12:08,840 [INFO]

INFO:tensorflow:epoch = 6.6875, learning_rate = 9.578882e-05, loss = 0.0009072368, step = 642 (5.445 sec)
2021-12-30 09:13:22,586 [INFO] tensorflow: epoch = 6.6875, learning_rate = 9.578882e-05, loss = 0.0009072368, step = 642 (5.445 sec)
INFO:tensorflow:global_step/sec: 3.08532
2021-12-30 09:13:24,543 [INFO] tensorflow: global_step/sec: 3.08532
2021-12-30 09:13:24,881 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.805
INFO:tensorflow:global_step/sec: 3.10785
2021-12-30 09:13:27,439 [INFO] tensorflow: global_step/sec: 3.10785
INFO:tensorflow:epoch = 6.864583333333333, learning_rate = 0.00010357884, loss = 0.0005257707, step = 659 (5.514 sec)
2021-12-30 09:13:28,100 [INFO] tensorflow: epoch = 6.864583333333333, learning_rate = 0.00010357884, loss = 0.0005257707, step = 659 (5.514 sec)
INFO:tensorflow:global_step/sec: 3.09983
2021-12-30 09:13:30,343 [INFO] tensorflow: global_step/sec: 3.09983
2021-12-30 09:13:32,300 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor

INFO:tensorflow:global_step/sec: 3.10518
2021-12-30 09:14:43,164 [INFO] tensorflow: global_step/sec: 3.10518
INFO:tensorflow:epoch = 9.34375, learning_rate = 0.00030949776, loss = 0.00034103278, step = 897 (5.463 sec)
2021-12-30 09:14:45,078 [INFO] tensorflow: epoch = 9.34375, learning_rate = 0.00030949776, loss = 0.00034103278, step = 897 (5.463 sec)
2021-12-30 09:14:45,713 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.955
INFO:tensorflow:global_step/sec: 3.13285
2021-12-30 09:14:46,037 [INFO] tensorflow: global_step/sec: 3.13285
INFO:tensorflow:global_step/sec: 3.07157
2021-12-30 09:14:48,967 [INFO] tensorflow: global_step/sec: 3.07157
INFO:tensorflow:epoch = 9.520833333333332, learning_rate = 0.0003346676, loss = 0.0004047746, step = 914 (5.546 sec)
2021-12-30 09:14:50,624 [INFO] tensorflow: epoch = 9.520833333333332, learning_rate = 0.0003346676, loss = 0.0004047746, step = 914 (5.546 sec)
INFO:tensorflow:global_step/sec: 3.04281
2021-12-30 09:14:51,925 [INFO] t

INFO:tensorflow:global_step/sec: 3.03403
2021-12-30 09:16:05,446 [INFO] tensorflow: global_step/sec: 3.03403
INFO:tensorflow:global_step/sec: 3.03002
2021-12-30 09:16:08,417 [INFO] tensorflow: global_step/sec: 3.03002
INFO:tensorflow:epoch = 11.947916666666666, learning_rate = 0.0009772663, loss = 0.0003278718, step = 1147 (5.580 sec)
2021-12-30 09:16:09,705 [INFO] tensorflow: epoch = 11.947916666666666, learning_rate = 0.0009772663, loss = 0.0003278718, step = 1147 (5.580 sec)
2021-12-30 09:16:10,339 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.416
INFO:tensorflow:global_step/sec: 3.1438
2021-12-30 09:16:11,280 [INFO] tensorflow: global_step/sec: 3.1438
2021-12-30 09:16:11,280 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 12/120: loss: 0.00029 learning rate: 0.00100 Time taken: 0:00:31.128277 ETA: 0:56:01.853871
INFO:tensorflow:global_step/sec: 3.07406
2021-12-30 09:16:14,207 [INFO] tensorflow: global_step/sec: 3.07406
INFO:tensorflow:epoch = 1

INFO:tensorflow:global_step/sec: 3.16184
2021-12-30 09:17:26,901 [INFO] tensorflow: global_step/sec: 3.16184
INFO:tensorflow:global_step/sec: 3.10493
2021-12-30 09:17:29,800 [INFO] tensorflow: global_step/sec: 3.10493
2021-12-30 09:17:31,053 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.125
INFO:tensorflow:epoch = 14.604166666666666, learning_rate = 0.0009999999, loss = 0.00024648634, step = 1402 (5.454 sec)
2021-12-30 09:17:32,046 [INFO] tensorflow: epoch = 14.604166666666666, learning_rate = 0.0009999999, loss = 0.00024648634, step = 1402 (5.454 sec)
INFO:tensorflow:global_step/sec: 3.08606
2021-12-30 09:17:32,716 [INFO] tensorflow: global_step/sec: 3.08606
INFO:tensorflow:global_step/sec: 3.11472
2021-12-30 09:17:35,606 [INFO] tensorflow: global_step/sec: 3.11472
INFO:tensorflow:epoch = 14.78125, learning_rate = 0.0009999999, loss = 0.00022511775, step = 1419 (5.496 sec)
2021-12-30 09:17:37,543 [INFO] tensorflow: epoch = 14.78125, learning_rate = 0.0009999999, lo

INFO:tensorflow:epoch = 17.083333333333332, learning_rate = 0.0009999999, loss = 0.00025058712, step = 1640 (5.510 sec)
2021-12-30 09:18:49,100 [INFO] tensorflow: epoch = 17.083333333333332, learning_rate = 0.0009999999, loss = 0.00025058712, step = 1640 (5.510 sec)
INFO:tensorflow:global_step/sec: 3.14287
2021-12-30 09:18:51,317 [INFO] tensorflow: global_step/sec: 3.14287
2021-12-30 09:18:51,961 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.858
INFO:tensorflow:global_step/sec: 3.06136
2021-12-30 09:18:54,257 [INFO] tensorflow: global_step/sec: 3.06136
INFO:tensorflow:epoch = 17.260416666666664, learning_rate = 0.0009999999, loss = 0.0002757479, step = 1657 (5.465 sec)
2021-12-30 09:18:54,564 [INFO] tensorflow: epoch = 17.260416666666664, learning_rate = 0.0009999999, loss = 0.0002757479, step = 1657 (5.465 sec)
INFO:tensorflow:global_step/sec: 3.1104
2021-12-30 09:18:57,151 [INFO] tensorflow: global_step/sec: 3.1104
INFO:tensorflow:epoch = 17.4375, learning_rate = 

INFO:tensorflow:epoch = 19.739583333333332, learning_rate = 0.0009999999, loss = 0.0002879659, step = 1895 (5.474 sec)
2021-12-30 09:20:11,550 [INFO] tensorflow: epoch = 19.739583333333332, learning_rate = 0.0009999999, loss = 0.0002879659, step = 1895 (5.474 sec)
INFO:tensorflow:global_step/sec: 3.04048
2021-12-30 09:20:12,892 [INFO] tensorflow: global_step/sec: 3.04048
2021-12-30 09:20:12,892 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.663
INFO:tensorflow:global_step/sec: 3.17794
2021-12-30 09:20:15,724 [INFO] tensorflow: global_step/sec: 3.17794
INFO:tensorflow:epoch = 19.916666666666664, learning_rate = 0.0009999999, loss = 0.00027038585, step = 1912 (5.450 sec)
2021-12-30 09:20:17,000 [INFO] tensorflow: epoch = 19.916666666666664, learning_rate = 0.0009999999, loss = 0.00027038585, step = 1912 (5.450 sec)
INFO:tensorflow:global_step/sec: 3.10279
2021-12-30 09:20:18,624 [INFO] tensorflow: global_step/sec: 3.10279
INFO:tensorflow:Saving checkpoints for step-192

2021-12-30 09:22:20,564 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 22/120: loss: 0.00027 learning rate: 0.00100 Time taken: 0:00:30.983797 ETA: 0:50:36.412137
INFO:tensorflow:global_step/sec: 3.06592
2021-12-30 09:22:21,535 [INFO] tensorflow: global_step/sec: 3.06592
INFO:tensorflow:epoch = 22.125, learning_rate = 0.0009999999, loss = 0.0002627346, step = 2124 (5.517 sec)
2021-12-30 09:22:24,436 [INFO] tensorflow: epoch = 22.125, learning_rate = 0.0009999999, loss = 0.0002627346, step = 2124 (5.517 sec)
INFO:tensorflow:global_step/sec: 3.10224
2021-12-30 09:22:24,437 [INFO] tensorflow: global_step/sec: 3.10224
2021-12-30 09:22:24,437 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.757
INFO:tensorflow:global_step/sec: 3.01358
2021-12-30 09:22:27,423 [INFO] tensorflow: global_step/sec: 3.01358
INFO:tensorflow:epoch = 22.302083333333332, learning_rate = 0.0009999999, loss = 0.0002619341, step = 2141 (5.552 sec)
2021-12-30 09:22:29,988 [INFO] tensor

INFO:tensorflow:global_step/sec: 3.14969
2021-12-30 09:23:43,004 [INFO] tensorflow: global_step/sec: 3.14969
2021-12-30 09:23:45,272 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.826
INFO:tensorflow:global_step/sec: 3.10588
2021-12-30 09:23:45,902 [INFO] tensorflow: global_step/sec: 3.10588
INFO:tensorflow:epoch = 24.78125, learning_rate = 0.0009999999, loss = 0.00018793397, step = 2379 (5.437 sec)
2021-12-30 09:23:46,867 [INFO] tensorflow: epoch = 24.78125, learning_rate = 0.0009999999, loss = 0.00018793397, step = 2379 (5.437 sec)
INFO:tensorflow:global_step/sec: 3.09285
2021-12-30 09:23:48,812 [INFO] tensorflow: global_step/sec: 3.09285
INFO:tensorflow:global_step/sec: 3.06163
2021-12-30 09:23:51,752 [INFO] tensorflow: global_step/sec: 3.06163
INFO:tensorflow:epoch = 24.958333333333332, learning_rate = 0.0009999999, loss = 0.00019745229, step = 2396 (5.533 sec)
2021-12-30 09:23:52,399 [INFO] tensorflow: epoch = 24.958333333333332, learning_rate = 0.0009999999, lo

INFO:tensorflow:global_step/sec: 3.0688
2021-12-30 09:25:04,561 [INFO] tensorflow: global_step/sec: 3.0688
2021-12-30 09:25:06,153 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.801
INFO:tensorflow:global_step/sec: 3.11328
2021-12-30 09:25:07,452 [INFO] tensorflow: global_step/sec: 3.11328
INFO:tensorflow:epoch = 27.4375, learning_rate = 0.0009999999, loss = 0.00020225496, step = 2634 (5.469 sec)
2021-12-30 09:25:09,390 [INFO] tensorflow: epoch = 27.4375, learning_rate = 0.0009999999, loss = 0.00020225496, step = 2634 (5.469 sec)
INFO:tensorflow:global_step/sec: 3.08255
2021-12-30 09:25:10,371 [INFO] tensorflow: global_step/sec: 3.08255
INFO:tensorflow:global_step/sec: 3.12082
2021-12-30 09:25:13,255 [INFO] tensorflow: global_step/sec: 3.12082
2021-12-30 09:25:14,261 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.668
INFO:tensorflow:epoch = 27.614583333333332, learning_rate = 0.0009999999, loss = 0.00019464636, step = 2651 (5.512 sec)
2021-12-30 09

2021-12-30 09:26:27,351 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.512
INFO:tensorflow:Saving checkpoints for step-2880.
2021-12-30 09:26:28,954 [INFO] tensorflow: Saving checkpoints for step-2880.
2021-12-30 09:26:32,639 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 23, 0.00s/step
2021-12-30 09:26:44,331 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 23, 1.17s/step
2021-12-30 09:26:56,370 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 23, 1.20s/step
Matching predictions to ground truth, class 1/1.: 100%|█| 115313/115313 [00:07<00:00, 15500.79it/s]
Epoch 30/120

Validation cost: 0.000706
Mean average_precision (in %): 24.3751

class name average precision (in %)
------------ --------------------------
car 24.3751

Median Inference Time: 0.017064
INFO:tensorflow:epoch = 30.0, learning_rate = 0.0009999999, loss = 0.00014131374, step = 2880 (42.947 sec)
2021-12-30 09:27:09,643 [INFO] tensorflow: epoch = 30.0, learning_rate = 0.000999999

2021-12-30 09:28:20,273 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.720
INFO:tensorflow:epoch = 32.30208333333333, learning_rate = 0.0009999999, loss = 0.00028350917, step = 3101 (5.490 sec)
2021-12-30 09:28:20,931 [INFO] tensorflow: epoch = 32.30208333333333, learning_rate = 0.0009999999, loss = 0.00028350917, step = 3101 (5.490 sec)
INFO:tensorflow:global_step/sec: 3.09948
2021-12-30 09:28:22,206 [INFO] tensorflow: global_step/sec: 3.09948
INFO:tensorflow:global_step/sec: 3.02581
2021-12-30 09:28:25,180 [INFO] tensorflow: global_step/sec: 3.02581
INFO:tensorflow:epoch = 32.479166666666664, learning_rate = 0.0009999999, loss = 0.00022941935, step = 3118 (5.557 sec)
2021-12-30 09:28:26,488 [INFO] tensorflow: epoch = 32.479166666666664, learning_rate = 0.0009999999, loss = 0.00022941935, step = 3118 (5.557 sec)
INFO:tensorflow:global_step/sec: 3.06921
2021-12-30 09:28:28,113 [INFO] tensorflow: global_step/sec: 3.06921
2021-12-30 09:28:28,442 [INFO] modulus.hooks.sa

2021-12-30 09:29:41,144 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.828
INFO:tensorflow:epoch = 34.95833333333333, learning_rate = 0.0009999999, loss = 0.0002521411, step = 3356 (5.474 sec)
2021-12-30 09:29:43,392 [INFO] tensorflow: epoch = 34.95833333333333, learning_rate = 0.0009999999, loss = 0.0002521411, step = 3356 (5.474 sec)
INFO:tensorflow:global_step/sec: 3.0981
2021-12-30 09:29:43,733 [INFO] tensorflow: global_step/sec: 3.0981
2021-12-30 09:29:44,751 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 35/120: loss: 0.00023 learning rate: 0.00100 Time taken: 0:00:31.013570 ETA: 0:43:56.153436
INFO:tensorflow:global_step/sec: 3.03334
2021-12-30 09:29:46,700 [INFO] tensorflow: global_step/sec: 3.03334
INFO:tensorflow:epoch = 35.135416666666664, learning_rate = 0.0009999999, loss = 0.00026018888, step = 3373 (5.542 sec)
2021-12-30 09:29:48,934 [INFO] tensorflow: epoch = 35.135416666666664, learning_rate = 0.0009999999, loss = 0.00026018888, st

2021-12-30 09:31:02,048 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.924
INFO:tensorflow:global_step/sec: 3.12133
2021-12-30 09:31:02,357 [INFO] tensorflow: global_step/sec: 3.12133
INFO:tensorflow:global_step/sec: 3.1237
2021-12-30 09:31:05,238 [INFO] tensorflow: global_step/sec: 3.1237
INFO:tensorflow:epoch = 37.61458333333333, learning_rate = 0.0009999999, loss = 0.0002096818, step = 3611 (5.468 sec)
2021-12-30 09:31:05,890 [INFO] tensorflow: epoch = 37.61458333333333, learning_rate = 0.0009999999, loss = 0.0002096818, step = 3611 (5.468 sec)
INFO:tensorflow:global_step/sec: 3.07914
2021-12-30 09:31:08,161 [INFO] tensorflow: global_step/sec: 3.07914
2021-12-30 09:31:10,077 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.911
INFO:tensorflow:global_step/sec: 3.13718
2021-12-30 09:31:11,030 [INFO] tensorflow: global_step/sec: 3.13718
INFO:tensorflow:epoch = 37.791666666666664, learning_rate = 0.0009999999, loss = 0.00017775313, step = 3628 (5.454 

Matching predictions to ground truth, class 1/1.: 100%|█| 10649/10649 [00:00<00:00, 15841.72it/s]
Epoch 40/120

Validation cost: 0.000258
Mean average_precision (in %): 50.1098

class name average precision (in %)
------------ --------------------------
car 50.1098

Median Inference Time: 0.017892
INFO:tensorflow:epoch = 40.0, learning_rate = 0.0009999999, loss = 0.00022762202, step = 3840 (25.653 sec)
2021-12-30 09:32:43,053 [INFO] tensorflow: epoch = 40.0, learning_rate = 0.0009999999, loss = 0.00022762202, step = 3840 (25.653 sec)
2021-12-30 09:32:43,053 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 40/120: loss: 0.00023 learning rate: 0.00100 Time taken: 0:00:54.146785 ETA: 1:12:11.742764
INFO:tensorflow:global_step/sec: 0.346186
2021-12-30 09:32:44,005 [INFO] tensorflow: global_step/sec: 0.346186
2021-12-30 09:32:45,915 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 6.433
INFO:tensorflow:global_step/sec: 3.12245
2021-12-30 09:32:46,888 [INFO] te

INFO:tensorflow:global_step/sec: 3.0846
2021-12-30 09:33:59,765 [INFO] tensorflow: global_step/sec: 3.0846
INFO:tensorflow:epoch = 42.479166666666664, learning_rate = 0.0009999999, loss = 0.00016405058, step = 4078 (5.512 sec)
2021-12-30 09:34:00,089 [INFO] tensorflow: epoch = 42.479166666666664, learning_rate = 0.0009999999, loss = 0.00016405058, step = 4078 (5.512 sec)
INFO:tensorflow:global_step/sec: 3.0847
2021-12-30 09:34:02,683 [INFO] tensorflow: global_step/sec: 3.0847
INFO:tensorflow:epoch = 42.65625, learning_rate = 0.0009999999, loss = 0.00025825217, step = 4095 (5.505 sec)
2021-12-30 09:34:05,594 [INFO] tensorflow: epoch = 42.65625, learning_rate = 0.0009999999, loss = 0.00025825217, step = 4095 (5.505 sec)
INFO:tensorflow:global_step/sec: 3.09061
2021-12-30 09:34:05,595 [INFO] tensorflow: global_step/sec: 3.09061
2021-12-30 09:34:06,899 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.676
INFO:tensorflow:global_step/sec: 3.05925
2021-12-30 09:34:08,537 [INF

2021-12-30 09:35:19,729 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.773
INFO:tensorflow:global_step/sec: 3.06986
2021-12-30 09:35:21,358 [INFO] tensorflow: global_step/sec: 3.06986
INFO:tensorflow:epoch = 45.135416666666664, learning_rate = 0.0009999999, loss = 0.00019266657, step = 4333 (5.557 sec)
2021-12-30 09:35:22,710 [INFO] tensorflow: epoch = 45.135416666666664, learning_rate = 0.0009999999, loss = 0.00019266657, step = 4333 (5.557 sec)
INFO:tensorflow:global_step/sec: 3.07105
2021-12-30 09:35:24,288 [INFO] tensorflow: global_step/sec: 3.07105
INFO:tensorflow:global_step/sec: 3.09493
2021-12-30 09:35:27,196 [INFO] tensorflow: global_step/sec: 3.09493
2021-12-30 09:35:27,826 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.700
INFO:tensorflow:epoch = 45.3125, learning_rate = 0.0009999999, loss = 0.00021266742, step = 4350 (5.449 sec)
2021-12-30 09:35:28,159 [INFO] tensorflow: epoch = 45.3125, learning_rate = 0.0009999999, loss = 0.0002126674

2021-12-30 09:36:40,849 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.604
INFO:tensorflow:global_step/sec: 3.03792
2021-12-30 09:36:43,165 [INFO] tensorflow: global_step/sec: 3.03792
INFO:tensorflow:epoch = 47.791666666666664, learning_rate = 0.0009999999, loss = 0.00025669375, step = 4588 (5.536 sec)
2021-12-30 09:36:45,415 [INFO] tensorflow: epoch = 47.791666666666664, learning_rate = 0.0009999999, loss = 0.00025669375, step = 4588 (5.536 sec)
INFO:tensorflow:global_step/sec: 3.08203
2021-12-30 09:36:46,085 [INFO] tensorflow: global_step/sec: 3.08203
INFO:tensorflow:global_step/sec: 3.11085
2021-12-30 09:36:48,978 [INFO] tensorflow: global_step/sec: 3.11085
2021-12-30 09:36:48,979 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.601
INFO:tensorflow:epoch = 47.96875, learning_rate = 0.0009999999, loss = 0.00027893117, step = 4605 (5.507 sec)
2021-12-30 09:36:50,922 [INFO] tensorflow: epoch = 47.96875, learning_rate = 0.0009999999, loss = 0.00027893

INFO:tensorflow:epoch = 50.0, learning_rate = 0.0009999999, loss = 0.00016254658, step = 4800 (14.833 sec)
2021-12-30 09:38:06,198 [INFO] tensorflow: epoch = 50.0, learning_rate = 0.0009999999, loss = 0.00016254658, step = 4800 (14.833 sec)
2021-12-30 09:38:06,199 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 50/120: loss: 0.00016 learning rate: 0.00100 Time taken: 0:00:43.293750 ETA: 0:50:30.562470
INFO:tensorflow:global_step/sec: 0.594587
2021-12-30 09:38:08,130 [INFO] tensorflow: global_step/sec: 0.594587
INFO:tensorflow:global_step/sec: 3.13967
2021-12-30 09:38:10,996 [INFO] tensorflow: global_step/sec: 3.13967
INFO:tensorflow:epoch = 50.17708333333333, learning_rate = 0.0009999999, loss = 0.00022868972, step = 4817 (5.439 sec)
2021-12-30 09:38:11,637 [INFO] tensorflow: epoch = 50.17708333333333, learning_rate = 0.0009999999, loss = 0.00022868972, step = 4817 (5.439 sec)
INFO:tensorflow:global_step/sec: 3.05274
2021-12-30 09:38:13,945 [INFO] tensorflow: global_s

INFO:tensorflow:global_step/sec: 3.0492
2021-12-30 09:39:26,910 [INFO] tensorflow: global_step/sec: 3.0492
2021-12-30 09:39:26,911 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.436
INFO:tensorflow:epoch = 52.65625, learning_rate = 0.0009999999, loss = 0.00024269258, step = 5055 (5.584 sec)
2021-12-30 09:39:28,903 [INFO] tensorflow: epoch = 52.65625, learning_rate = 0.0009999999, loss = 0.00024269258, step = 5055 (5.584 sec)
INFO:tensorflow:global_step/sec: 3.03063
2021-12-30 09:39:29,880 [INFO] tensorflow: global_step/sec: 3.03063
INFO:tensorflow:global_step/sec: 3.14398
2021-12-30 09:39:32,742 [INFO] tensorflow: global_step/sec: 3.14398
INFO:tensorflow:epoch = 52.83333333333333, learning_rate = 0.0009999999, loss = 0.00019571744, step = 5072 (5.479 sec)
2021-12-30 09:39:34,382 [INFO] tensorflow: epoch = 52.83333333333333, learning_rate = 0.0009999999, loss = 0.00019571744, step = 5072 (5.479 sec)
2021-12-30 09:39:35,028 [INFO] modulus.hooks.sample_counter_hook: Tra

2021-12-30 09:40:47,999 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.767
INFO:tensorflow:global_step/sec: 3.11852
2021-12-30 09:40:48,651 [INFO] tensorflow: global_step/sec: 3.11852
INFO:tensorflow:epoch = 55.3125, learning_rate = 0.0009999999, loss = 0.00021304096, step = 5310 (5.491 sec)
2021-12-30 09:40:51,581 [INFO] tensorflow: epoch = 55.3125, learning_rate = 0.0009999999, loss = 0.00021304096, step = 5310 (5.491 sec)
INFO:tensorflow:global_step/sec: 3.07037
2021-12-30 09:40:51,582 [INFO] tensorflow: global_step/sec: 3.07037
INFO:tensorflow:global_step/sec: 3.09238
2021-12-30 09:40:54,492 [INFO] tensorflow: global_step/sec: 3.09238
2021-12-30 09:40:56,120 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.628
INFO:tensorflow:epoch = 55.48958333333333, learning_rate = 0.0009999999, loss = 0.00033611344, step = 5327 (5.504 sec)
2021-12-30 09:40:57,085 [INFO] tensorflow: epoch = 55.48958333333333, learning_rate = 0.0009999999, loss = 0.00033611344,

2021-12-30 09:42:09,211 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.324
INFO:tensorflow:global_step/sec: 3.04716
2021-12-30 09:42:10,521 [INFO] tensorflow: global_step/sec: 3.04716
INFO:tensorflow:global_step/sec: 3.09935
2021-12-30 09:42:13,425 [INFO] tensorflow: global_step/sec: 3.09935
INFO:tensorflow:epoch = 57.96875, learning_rate = 0.0009999999, loss = 0.00015433287, step = 5565 (5.501 sec)
2021-12-30 09:42:14,397 [INFO] tensorflow: epoch = 57.96875, learning_rate = 0.0009999999, loss = 0.00015433287, step = 5565 (5.501 sec)
2021-12-30 09:42:15,373 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 58/120: loss: 0.00025 learning rate: 0.00100 Time taken: 0:00:31.219094 ETA: 0:32:15.583801
INFO:tensorflow:global_step/sec: 3.08656
2021-12-30 09:42:16,341 [INFO] tensorflow: global_step/sec: 3.08656
2021-12-30 09:42:17,313 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.687
INFO:tensorflow:global_step/sec: 3.0774
2021-12-30 09:42

INFO:tensorflow:global_step/sec: 3.0484
2021-12-30 09:43:28,582 [INFO] tensorflow: global_step/sec: 3.0484
2021-12-30 09:43:30,209 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 12.432
INFO:tensorflow:epoch = 60.17708333333333, learning_rate = 0.0009999999, loss = 0.0002450766, step = 5777 (5.549 sec)
2021-12-30 09:43:31,178 [INFO] tensorflow: epoch = 60.17708333333333, learning_rate = 0.0009999999, loss = 0.0002450766, step = 5777 (5.549 sec)
INFO:tensorflow:global_step/sec: 3.08029
2021-12-30 09:43:31,504 [INFO] tensorflow: global_step/sec: 3.08029
INFO:tensorflow:global_step/sec: 3.05788
2021-12-30 09:43:34,447 [INFO] tensorflow: global_step/sec: 3.05788
INFO:tensorflow:epoch = 60.354166666666664, learning_rate = 0.0009999999, loss = 0.00025938704, step = 5794 (5.524 sec)
2021-12-30 09:43:36,702 [INFO] tensorflow: epoch = 60.354166666666664, learning_rate = 0.0009999999, loss = 0.00025938704, step = 5794 (5.524 sec)
INFO:tensorflow:global_step/sec: 3.10798
2021-12-30

INFO:tensorflow:global_step/sec: 3.07032
2021-12-30 09:44:50,248 [INFO] tensorflow: global_step/sec: 3.07032
2021-12-30 09:44:51,222 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.562
INFO:tensorflow:global_step/sec: 3.08049
2021-12-30 09:44:53,169 [INFO] tensorflow: global_step/sec: 3.08049
INFO:tensorflow:epoch = 62.83333333333333, learning_rate = 0.0009999999, loss = 0.00018319156, step = 6032 (5.513 sec)
2021-12-30 09:44:53,814 [INFO] tensorflow: epoch = 62.83333333333333, learning_rate = 0.0009999999, loss = 0.00018319156, step = 6032 (5.513 sec)
INFO:tensorflow:global_step/sec: 3.08536
2021-12-30 09:44:56,086 [INFO] tensorflow: global_step/sec: 3.08536
INFO:tensorflow:global_step/sec: 3.05624
2021-12-30 09:44:59,031 [INFO] tensorflow: global_step/sec: 3.05624
2021-12-30 09:44:59,032 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 63/120: loss: 0.00019 learning rate: 0.00100 Time taken: 0:00:31.151218 ETA: 0:29:35.619422
INFO:tensorflow:epoch =

INFO:tensorflow:global_step/sec: 3.10108
2021-12-30 09:46:11,943 [INFO] tensorflow: global_step/sec: 3.10108
2021-12-30 09:46:12,260 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.857
INFO:tensorflow:global_step/sec: 3.06072
2021-12-30 09:46:14,884 [INFO] tensorflow: global_step/sec: 3.06072
INFO:tensorflow:epoch = 65.48958333333333, learning_rate = 0.0009999999, loss = 0.00023309077, step = 6287 (5.433 sec)
2021-12-30 09:46:16,447 [INFO] tensorflow: epoch = 65.48958333333333, learning_rate = 0.0009999999, loss = 0.00023309077, step = 6287 (5.433 sec)
INFO:tensorflow:global_step/sec: 3.22219
2021-12-30 09:46:17,677 [INFO] tensorflow: global_step/sec: 3.22219
2021-12-30 09:46:20,266 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.982
INFO:tensorflow:global_step/sec: 3.08317
2021-12-30 09:46:20,596 [INFO] tensorflow: global_step/sec: 3.08317
INFO:tensorflow:epoch = 65.66666666666666, learning_rate = 0.0009999999, loss = 0.00021256598, step = 6304 (5.4

2021-12-30 09:47:34,412 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 68/120: loss: 0.00024 learning rate: 0.00100 Time taken: 0:00:31.017985 ETA: 0:26:52.935201
INFO:tensorflow:global_step/sec: 3.15991
2021-12-30 09:47:36,324 [INFO] tensorflow: global_step/sec: 3.15991
INFO:tensorflow:epoch = 68.14583333333333, learning_rate = 0.0009999999, loss = 0.0002937363, step = 6542 (5.427 sec)
2021-12-30 09:47:38,902 [INFO] tensorflow: epoch = 68.14583333333333, learning_rate = 0.0009999999, loss = 0.0002937363, step = 6542 (5.427 sec)
INFO:tensorflow:global_step/sec: 3.10262
2021-12-30 09:47:39,225 [INFO] tensorflow: global_step/sec: 3.10262
2021-12-30 09:47:41,155 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.999
INFO:tensorflow:global_step/sec: 3.11416
2021-12-30 09:47:42,115 [INFO] tensorflow: global_step/sec: 3.11416
INFO:tensorflow:epoch = 68.32291666666666, learning_rate = 0.0009999999, loss = 0.00029822148, step = 6559 (5.468 sec)
2021-12-30 09:4

2021-12-30 09:48:53,964 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.724
INFO:tensorflow:global_step/sec: 3.07393
2021-12-30 09:48:54,304 [INFO] tensorflow: global_step/sec: 3.07393
INFO:tensorflow:epoch = 70.35416666666666, learning_rate = 0.0009999999, loss = 0.00020334452, step = 6754 (5.508 sec)
2021-12-30 09:48:55,613 [INFO] tensorflow: epoch = 70.35416666666666, learning_rate = 0.0009999999, loss = 0.00020334452, step = 6754 (5.508 sec)
INFO:tensorflow:global_step/sec: 3.07583
2021-12-30 09:48:57,230 [INFO] tensorflow: global_step/sec: 3.07583
INFO:tensorflow:global_step/sec: 3.0666
2021-12-30 09:49:00,165 [INFO] tensorflow: global_step/sec: 3.0666
INFO:tensorflow:epoch = 70.53125, learning_rate = 0.0009999999, loss = 0.00025820165, step = 6771 (5.521 sec)
2021-12-30 09:49:01,133 [INFO] tensorflow: epoch = 70.53125, learning_rate = 0.0009999999, loss = 0.00025820165, step = 6771 (5.521 sec)
2021-12-30 09:49:02,086 [INFO] modulus.hooks.sample_counter_hook: Tra

2021-12-30 09:50:15,043 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.473
INFO:tensorflow:global_step/sec: 3.07268
2021-12-30 09:50:16,019 [INFO] tensorflow: global_step/sec: 3.07268
2021-12-30 09:50:17,952 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 73/120: loss: 0.00018 learning rate: 0.00100 Time taken: 0:00:31.116398 ETA: 0:24:22.470711
INFO:tensorflow:epoch = 73.01041666666666, learning_rate = 0.0009999999, loss = 0.00023321885, step = 7009 (5.506 sec)
2021-12-30 09:50:18,277 [INFO] tensorflow: epoch = 73.01041666666666, learning_rate = 0.0009999999, loss = 0.00023321885, step = 7009 (5.506 sec)
INFO:tensorflow:global_step/sec: 3.1127
2021-12-30 09:50:18,910 [INFO] tensorflow: global_step/sec: 3.1127
INFO:tensorflow:global_step/sec: 3.03598
2021-12-30 09:50:21,875 [INFO] tensorflow: global_step/sec: 3.03598
2021-12-30 09:50:23,149 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.675
INFO:tensorflow:epoch = 73.1875, learnin

2021-12-30 09:51:36,349 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.400
INFO:tensorflow:global_step/sec: 3.02033
2021-12-30 09:51:37,985 [INFO] tensorflow: global_step/sec: 3.02033
INFO:tensorflow:global_step/sec: 3.1175
2021-12-30 09:51:40,872 [INFO] tensorflow: global_step/sec: 3.1175
INFO:tensorflow:epoch = 75.66666666666666, learning_rate = 0.0009999999, loss = 0.00023226612, step = 7264 (5.548 sec)
2021-12-30 09:51:41,206 [INFO] tensorflow: epoch = 75.66666666666666, learning_rate = 0.0009999999, loss = 0.00023226612, step = 7264 (5.548 sec)
INFO:tensorflow:global_step/sec: 3.07759
2021-12-30 09:51:43,797 [INFO] tensorflow: global_step/sec: 3.07759
2021-12-30 09:51:44,441 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.716
INFO:tensorflow:epoch = 75.84375, learning_rate = 0.0009999999, loss = 0.00020319423, step = 7281 (5.482 sec)
2021-12-30 09:51:46,688 [INFO] tensorflow: epoch = 75.84375, learning_rate = 0.0009999999, loss = 0.00020319423,

INFO:tensorflow:epoch = 78.14583333333333, learning_rate = 0.0009999999, loss = 0.00023187764, step = 7502 (5.534 sec)
2021-12-30 09:52:58,664 [INFO] tensorflow: epoch = 78.14583333333333, learning_rate = 0.0009999999, loss = 0.00023187764, step = 7502 (5.534 sec)
INFO:tensorflow:global_step/sec: 2.98027
2021-12-30 09:52:59,978 [INFO] tensorflow: global_step/sec: 2.98027
INFO:tensorflow:global_step/sec: 3.06928
2021-12-30 09:53:02,910 [INFO] tensorflow: global_step/sec: 3.06928
INFO:tensorflow:epoch = 78.32291666666666, learning_rate = 0.0009999999, loss = 0.00016897186, step = 7519 (5.515 sec)
2021-12-30 09:53:04,179 [INFO] tensorflow: epoch = 78.32291666666666, learning_rate = 0.0009999999, loss = 0.00016897186, step = 7519 (5.515 sec)
INFO:tensorflow:global_step/sec: 3.14681
2021-12-30 09:53:05,770 [INFO] tensorflow: global_step/sec: 3.14681
2021-12-30 09:53:05,771 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.476
INFO:tensorflow:global_step/sec: 3.09638
2021-12-

INFO:tensorflow:global_step/sec: 3.06789
2021-12-30 09:54:17,485 [INFO] tensorflow: global_step/sec: 3.06789
2021-12-30 09:54:18,117 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.602
INFO:tensorflow:epoch = 80.53125, learning_rate = 0.0009999999, loss = 0.00026451604, step = 7731 (5.505 sec)
2021-12-30 09:54:20,394 [INFO] tensorflow: epoch = 80.53125, learning_rate = 0.0009999999, loss = 0.00026451604, step = 7731 (5.505 sec)
INFO:tensorflow:global_step/sec: 3.09269
2021-12-30 09:54:20,395 [INFO] tensorflow: global_step/sec: 3.09269
INFO:tensorflow:global_step/sec: 3.0809
2021-12-30 09:54:23,316 [INFO] tensorflow: global_step/sec: 3.0809
INFO:tensorflow:epoch = 80.70833333333333, learning_rate = 0.0009999999, loss = 0.00019114255, step = 7748 (5.551 sec)
2021-12-30 09:54:25,945 [INFO] tensorflow: epoch = 80.70833333333333, learning_rate = 0.0009999999, loss = 0.00019114255, step = 7748 (5.551 sec)
INFO:tensorflow:global_step/sec: 3.03202
2021-12-30 09:54:26,284 [INF

INFO:tensorflow:global_step/sec: 3.08252
2021-12-30 09:55:39,312 [INFO] tensorflow: global_step/sec: 3.08252
2021-12-30 09:55:39,313 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.878
INFO:tensorflow:global_step/sec: 3.09959
2021-12-30 09:55:42,216 [INFO] tensorflow: global_step/sec: 3.09959
INFO:tensorflow:epoch = 83.1875, learning_rate = 0.0009999999, loss = 0.0002347774, step = 7986 (5.461 sec)
2021-12-30 09:55:43,174 [INFO] tensorflow: epoch = 83.1875, learning_rate = 0.0009999999, loss = 0.0002347774, step = 7986 (5.461 sec)
INFO:tensorflow:global_step/sec: 3.08418
2021-12-30 09:55:45,134 [INFO] tensorflow: global_step/sec: 3.08418
2021-12-30 09:55:47,359 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.857
INFO:tensorflow:global_step/sec: 3.13967
2021-12-30 09:55:48,001 [INFO] tensorflow: global_step/sec: 3.13967
INFO:tensorflow:epoch = 83.36458333333333, learning_rate = 0.0009999999, loss = 0.00018955772, step = 8003 (5.522 sec)
2021-12-30 09:

INFO:tensorflow:global_step/sec: 3.0412
2021-12-30 09:57:01,014 [INFO] tensorflow: global_step/sec: 3.0412
INFO:tensorflow:global_step/sec: 3.02663
2021-12-30 09:57:03,988 [INFO] tensorflow: global_step/sec: 3.02663
INFO:tensorflow:epoch = 85.84375, learning_rate = 0.000762346, loss = 0.00021473697, step = 8241 (5.564 sec)
2021-12-30 09:57:05,931 [INFO] tensorflow: epoch = 85.84375, learning_rate = 0.000762346, loss = 0.00021473697, step = 8241 (5.564 sec)
INFO:tensorflow:global_step/sec: 3.10655
2021-12-30 09:57:06,885 [INFO] tensorflow: global_step/sec: 3.10655
2021-12-30 09:57:08,526 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.512
INFO:tensorflow:global_step/sec: 3.03784
2021-12-30 09:57:09,848 [INFO] tensorflow: global_step/sec: 3.03784
2021-12-30 09:57:10,849 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 86/120: loss: 0.00021 learning rate: 0.00075 Time taken: 0:00:31.251153 ETA: 0:17:42.539194
INFO:tensorflow:epoch = 86.02083333333333, le

INFO:tensorflow:epoch = 88.32291666666666, learning_rate = 0.0005292853, loss = 0.00020249978, step = 8479 (5.511 sec)
2021-12-30 09:58:23,068 [INFO] tensorflow: epoch = 88.32291666666666, learning_rate = 0.0005292853, loss = 0.00020249978, step = 8479 (5.511 sec)
INFO:tensorflow:global_step/sec: 3.10127
2021-12-30 09:58:25,648 [INFO] tensorflow: global_step/sec: 3.10127
INFO:tensorflow:epoch = 88.5, learning_rate = 0.000515669, loss = 0.00019063393, step = 8496 (5.498 sec)
2021-12-30 09:58:28,566 [INFO] tensorflow: epoch = 88.5, learning_rate = 0.000515669, loss = 0.00019063393, step = 8496 (5.498 sec)
INFO:tensorflow:global_step/sec: 3.08334
2021-12-30 09:58:28,567 [INFO] tensorflow: global_step/sec: 3.08334
2021-12-30 09:58:29,550 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.701
INFO:tensorflow:global_step/sec: 3.06651
2021-12-30 09:58:31,502 [INFO] tensorflow: global_step/sec: 3.06651
INFO:tensorflow:epoch = 88.67708333333333, learning_rate = 0.000502403, loss 

2021-12-30 09:59:41,798 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.700
INFO:tensorflow:global_step/sec: 3.11008
2021-12-30 09:59:43,076 [INFO] tensorflow: global_step/sec: 3.11008
INFO:tensorflow:epoch = 90.70833333333333, learning_rate = 0.00037258043, loss = 0.00018843211, step = 8708 (5.503 sec)
2021-12-30 09:59:44,679 [INFO] tensorflow: epoch = 90.70833333333333, learning_rate = 0.00037258043, loss = 0.00018843211, step = 8708 (5.503 sec)
INFO:tensorflow:global_step/sec: 3.16497
2021-12-30 09:59:45,919 [INFO] tensorflow: global_step/sec: 3.16497
INFO:tensorflow:global_step/sec: 3.06177
2021-12-30 09:59:48,859 [INFO] tensorflow: global_step/sec: 3.06177
2021-12-30 09:59:49,825 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.917
INFO:tensorflow:epoch = 90.88541666666666, learning_rate = 0.0003629951, loss = 0.00016345167, step = 8725 (5.473 sec)
2021-12-30 09:59:50,152 [INFO] tensorflow: epoch = 90.88541666666666, learning_rate = 0.0003629951,

2021-12-30 10:01:02,830 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.500
INFO:tensorflow:global_step/sec: 3.12081
2021-12-30 10:01:04,720 [INFO] tensorflow: global_step/sec: 3.12081
INFO:tensorflow:epoch = 93.36458333333333, learning_rate = 0.00025202238, loss = 0.00012836553, step = 8963 (5.496 sec)
2021-12-30 10:01:07,331 [INFO] tensorflow: epoch = 93.36458333333333, learning_rate = 0.00025202238, loss = 0.00012836553, step = 8963 (5.496 sec)
INFO:tensorflow:global_step/sec: 3.0597
2021-12-30 10:01:07,662 [INFO] tensorflow: global_step/sec: 3.0597
INFO:tensorflow:global_step/sec: 3.04417
2021-12-30 10:01:10,618 [INFO] tensorflow: global_step/sec: 3.04417
2021-12-30 10:01:10,940 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.662
INFO:tensorflow:epoch = 93.54166666666666, learning_rate = 0.0002455388, loss = 0.00014901487, step = 8980 (5.526 sec)
2021-12-30 10:01:12,857 [INFO] tensorflow: epoch = 93.54166666666666, learning_rate = 0.0002455388, l

INFO:tensorflow:global_step/sec: 3.05883
2021-12-30 10:02:26,373 [INFO] tensorflow: global_step/sec: 3.05883
INFO:tensorflow:global_step/sec: 3.0781
2021-12-30 10:02:29,296 [INFO] tensorflow: global_step/sec: 3.0781
2021-12-30 10:02:29,297 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 96/120: loss: 0.00017 learning rate: 0.00017 Time taken: 0:00:31.151004 ETA: 0:12:27.624092
INFO:tensorflow:epoch = 96.02083333333333, learning_rate = 0.00017047403, loss = 0.00014952442, step = 9218 (5.539 sec)
2021-12-30 10:02:29,939 [INFO] tensorflow: epoch = 96.02083333333333, learning_rate = 0.00017047403, loss = 0.00014952442, step = 9218 (5.539 sec)
2021-12-30 10:02:31,855 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.694
INFO:tensorflow:global_step/sec: 3.14689
2021-12-30 10:02:32,156 [INFO] tensorflow: global_step/sec: 3.14689


In [23]:
local = $LOCAL_EXPERIMENT_DIR
local

SyntaxError: invalid syntax (1907354398.py, line 1)

In [12]:
print('Model for each epoch:')
print('---------------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights

Model for each epoch:
---------------------
total 45M
-rw-r--r-- 1 guest guest 45M Dec 30 18:15 resnet18_detector.tlt


## 5. Evaluate the trained model 

In [30]:
!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt\
 -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
 -k tlt_encode

2022-01-06 16:11:01,505 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-lv0msbns because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.

2022-01-06 08:11:07,243 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/specs/detectnet_v2_train_resnet18_kitti.txt




















2022-01-06 08:11:09,388 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
2022-01-06 08:11:09,491 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-01-06 08:11:09,491 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding ena

2022-01-06 08:11:09,779 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2022-01-06 08:11:09,784 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2022-01-06 08:11:09,784 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000


2022-01-06 08:11:10,006 [INFO] iva.detectnet_v2.evaluation.build_evaluator: Found 190 samples in validation set












__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to 
input_1 (InputLayer) (None, 3, 544, 960) 0 
__________________________________________________________________________________________________
input_1_qdq (QDQ) (None, 3, 544, 960) 1 input_1[0][0] 
__________________________________________________________________________________________________
conv1 (QuantizedConv2D) (None, 64, 272, 480) 9472 input_1_qdq[0

INFO:tensorflow:Graph was finalized.
2022-01-06 08:11:10,959 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2022-01-06 08:11:11,583 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2022-01-06 08:11:11,821 [INFO] tensorflow: Done running local_init_op.
2022-01-06 08:11:12,410 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 24, 0.00s/step
2022-01-06 08:11:18,287 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 24, 0.59s/step
2022-01-06 08:11:19,846 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 24, 0.16s/step
Matching predictions to ground truth, class 1/1.: 100%|█| 990/990 [00:00<00:00, 15370.58it/s]





Validation cost: 0.001124
Mean average_precision (in %): 92.5777

class name average precision (in %)
------------ --------------------------
car 92.5777

Median Inference Time: 0.015083
2022-01-06 08:11:20,568 [INFO] __main__: Evaluation complete.
Time taken to run __main__:main: 0:00:13.326

## 6. Prune the trained model 
* Specify pre-trained model
* Equalization criterion (`Applicable for resnets and mobilenets`)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

*Usually, you just need to adjust `-pth` (threshold) for accuracy and model size trade off. Higher `pth` gives you smaller model (and thus higher inference speed) but worse accuracy. The threshold to use is dependent on the dataset. A pth value `5.2e-6` is just a start point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.*

*For some internal studies, we have noticed that a pth value of 0.01 is a good starting point for detectnet_v2 models.*

In [12]:
# Create an output directory if it doesn't exist.
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

In [31]:
!tao detectnet_v2 prune \
 -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
 -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.tlt \
 -eq union \
 -pth 0.05 \
 -k $KEY

2022-01-06 16:11:38,259 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-gs4_jgeb because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.
2022-01-06 08:11:45,993 [INFO] modulus.pruning.pruning: Exploring graph for retainable indices
2022-01-06 08:11:46,570 [INFO] modulus.pruning.pruning: Pruning model and appending pruned nodes to new graph


2022-01-06 08:12:03,667 [INFO] iva.common.magnet_prune: Pruning ratio (pruned model / original model): 0.11881252491690038
2022-01-06 16:12:06,552 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


In [32]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned/

total 11560
-rw-r--r-- 1 guest guest 5987320 Dec 31 09:27 resnet18_nopool_bn_detectnet_v2_pruned_qat.tlt
-rw-r--r-- 1 guest guest 5847776 Jan 6 16:12 resnet18_nopool_bn_detectnet_v2_pruned.tlt


## 7. Retrain the pruned model 
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification with pretrained weights as pruned model.

*Note: For retraining, please set the `load_graph` option to `true` in the model_config to load the pruned model graph. Also, if after retraining, the model shows some decrease in mAP, it could be that the originally trained model was pruned a little too much. Please try reducing the pruning threshold (thereby reducing the pruning ratio) and use the new model to retrain.*

*Note: DetectNet_v2 now supports Quantization Aware Training, to help with optmizing the model. By default, the training in the cell below doesn't run the model with QAT enabled. For information on training a model with QAT, please refer to the cells under [section 11](#head-11)*

In [37]:
# Printing the retrain experiment file. 
# Note: We have updated the experiment file to include the 
# newly pruned model as a pretrained weights and, the
# load_graph option is set to true 
!cat $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt

random_seed: 42
dataset_config {
 data_sources {
 tfrecords_path: "/workspace/tao-experiments/car_data/tfrecords/kitti_trainval/*"
 image_directory_path: "/workspace/tao-experiments/car_data/training/"
 }
 image_extension: "png"
 target_class_mapping{
 key:"car"
 value:"car"
 }
 validation_fold: 0
}
augmentation_config {
 preprocessing {
 output_image_width: 960
 output_image_height: 544
 min_bbox_width: 1.0
 min_bbox_height: 1.0
 output_image_channel: 3
 enable_auto_resize: true
 }
 spatial_augmentation {
 hflip_probability: 0.5
 vflip_probability: 0.0
 zoom_min: 1.0
 zoom_max: 1.0
 translate_max_x: 8.0
 translate_max_y: 8.0
 }
 color_augmentation {
 hue_rotation_max: 25.0
 saturation_shift_max: 0.20000000298
 contrast_scale_max: 0.10000000149
 contrast_center: 0.5
 }
}

postprocessing_config {
 target_class_config {
 key: "car"
 value {
 clustering_config {
 clustering_algorithm: DBSCAN
 coverage_threshold: 0.005
 dbscan_eps: 0.15
 dbscan

In [19]:
# Retraining using the pruned model as pretrained weights 
!tao detectnet_v2 train -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
 -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
 -k $KEY \
 -n resnet18_detector_pruned \
 --gpus $NUM_GPUS

2021-12-31 09:41:01,171 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-4fv0ff_k because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.








2021-12-31 01:41:07,169 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/detectnet_v2_car/specs/detectnet_v2_retrain_resnet18_kitti_car.txt.
2021-12-31 01:41:07,171 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/detectnet_v2_car/specs/detectnet_v2_retrain_resnet18_kitti_car.txt
2021-12-31 01:41:07,285 [INFO] __main__: Cannot iterate over exactly 761 samples with a batch size of 8; each epoch will therefore take one extra step.


















2021-12-31

__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to 
input_1 (InputLayer) (None, 3, 544, 960) 0 
__________________________________________________________________________________________________
input_1_qdq (QDQ) (None, 3, 544, 960) 1 input_1[0][0] 
__________________________________________________________________________________________________
conv1 (QuantizedConv2D) (None, 64, 272, 480) 9472 input_1_qdq[0][0] 
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 272, 480) 256 conv1[0][0] 
__________________________________________________________________________________________________
activation_1 (ReLU) (None, 64, 272, 480) 0 bn_conv1[0][0] 
__________________________________________________________________________________________________
activation_1_qdq (QDQ) (None, 64, 272, 480) 1 activ



2021-12-31 01:41:38,728 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2021-12-31 01:41:38,938 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2021-12-31 01:41:38,943 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2021-12-31 01:41:38,943 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000


2021-12-31 01:41:39,262 [INFO] __main__: Found 761 samples in training set














2021-12-31 01:41:41,539 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2021-12-31 01:41:41,539 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2021-12-31 01:41:41,539 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2021-12-31 01:41:41,540 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 16, io threads: 32, compute threads: 16, buffered batches: 4
2021-12-31 01:41:41,540 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 190, number of sources: 1, batch size per gpu: 8, steps: 24
2021-12-31 01:41:41,565 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2021-12-31 01:41:41,769 [INFO] modulus.blocks.data_loaders.multi

INFO:tensorflow:Graph was finalized.
2021-12-31 01:41:45,092 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2021-12-31 01:41:46,900 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2021-12-31 01:41:47,428 [INFO] tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2021-12-31 01:41:55,602 [INFO] tensorflow: Saving checkpoints for step-0.
INFO:tensorflow:epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.058969934, step = 0
2021-12-31 01:42:30,675 [INFO] tensorflow: epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.058969934, step = 0
2021-12-31 01:42:30,677 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 0/120: loss: 0.05897 learning rate: 0.00000 Time taken: 0:00:00 ETA: 0:00:00
2021-12-31 01:42:30,678 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 1.083
INFO:tensorflow:global_step/sec: 2.04701
2021-12-31 01:42:35,073 [INFO] tensorflow: global_st

INFO:tensorflow:epoch = 1.3541666666666665, learning_rate = 9.091485e-06, loss = 0.0023931498, step = 130 (5.468 sec)
2021-12-31 01:43:14,311 [INFO] tensorflow: epoch = 1.3541666666666665, learning_rate = 9.091485e-06, loss = 0.0023931498, step = 130 (5.468 sec)
INFO:tensorflow:global_step/sec: 3.06561
2021-12-31 01:43:15,959 [INFO] tensorflow: global_step/sec: 3.06561
INFO:tensorflow:global_step/sec: 3.13365
2021-12-31 01:43:18,831 [INFO] tensorflow: global_step/sec: 3.13365
INFO:tensorflow:epoch = 1.53125, learning_rate = 9.830848e-06, loss = 0.0024762782, step = 147 (5.477 sec)
2021-12-31 01:43:19,788 [INFO] tensorflow: epoch = 1.53125, learning_rate = 9.830848e-06, loss = 0.0024762782, step = 147 (5.477 sec)
2021-12-31 01:43:20,429 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.968
INFO:tensorflow:global_step/sec: 3.09446
2021-12-31 01:43:21,739 [INFO] tensorflow: global_step/sec: 3.09446
INFO:tensorflow:global_step/sec: 3.20456
2021-12-31 01:43:24,548 [INFO] ten

INFO:tensorflow:epoch = 4.010416666666666, learning_rate = 2.937497e-05, loss = 0.0016864456, step = 385 (5.407 sec)
2021-12-31 01:44:36,019 [INFO] tensorflow: epoch = 4.010416666666666, learning_rate = 2.937497e-05, loss = 0.0016864456, step = 385 (5.407 sec)
INFO:tensorflow:global_step/sec: 3.08037
2021-12-31 01:44:36,689 [INFO] tensorflow: global_step/sec: 3.08037
INFO:tensorflow:global_step/sec: 3.13591
2021-12-31 01:44:39,559 [INFO] tensorflow: global_step/sec: 3.13591
2021-12-31 01:44:40,522 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.886
INFO:tensorflow:epoch = 4.1875, learning_rate = 3.1763888e-05, loss = 0.000793205, step = 402 (5.454 sec)
2021-12-31 01:44:41,473 [INFO] tensorflow: epoch = 4.1875, learning_rate = 3.1763888e-05, loss = 0.000793205, step = 402 (5.454 sec)
INFO:tensorflow:global_step/sec: 3.10295
2021-12-31 01:44:42,459 [INFO] tensorflow: global_step/sec: 3.10295
INFO:tensorflow:global_step/sec: 3.14244
2021-12-31 01:44:45,323 [INFO] tensorf

INFO:tensorflow:epoch = 6.666666666666666, learning_rate = 9.491178e-05, loss = 0.00081462215, step = 640 (5.447 sec)
2021-12-31 01:45:58,198 [INFO] tensorflow: epoch = 6.666666666666666, learning_rate = 9.491178e-05, loss = 0.00081462215, step = 640 (5.447 sec)
INFO:tensorflow:global_step/sec: 3.12179
2021-12-31 01:46:00,769 [INFO] tensorflow: global_step/sec: 3.12179
2021-12-31 01:46:01,081 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.980
INFO:tensorflow:epoch = 6.84375, learning_rate = 0.00010263046, loss = 0.0003845542, step = 657 (5.450 sec)
2021-12-31 01:46:03,649 [INFO] tensorflow: epoch = 6.84375, learning_rate = 0.00010263046, loss = 0.0003845542, step = 657 (5.450 sec)
INFO:tensorflow:global_step/sec: 3.12453
2021-12-31 01:46:03,649 [INFO] tensorflow: global_step/sec: 3.12453
INFO:tensorflow:global_step/sec: 3.12105
2021-12-31 01:46:06,533 [INFO] tensorflow: global_step/sec: 3.12105
2021-12-31 01:46:08,511 [INFO] iva.detectnet_v2.tfhooks.task_progress_mon

INFO:tensorflow:global_step/sec: 3.02985
2021-12-31 01:47:18,951 [INFO] tensorflow: global_step/sec: 3.02985
INFO:tensorflow:epoch = 9.322916666666666, learning_rate = 0.000306664, loss = 0.00035389428, step = 895 (5.524 sec)
2021-12-31 01:47:20,231 [INFO] tensorflow: epoch = 9.322916666666666, learning_rate = 0.000306664, loss = 0.00035389428, step = 895 (5.524 sec)
2021-12-31 01:47:21,532 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.698
INFO:tensorflow:global_step/sec: 3.09731
2021-12-31 01:47:21,857 [INFO] tensorflow: global_step/sec: 3.09731
INFO:tensorflow:global_step/sec: 3.13015
2021-12-31 01:47:24,732 [INFO] tensorflow: global_step/sec: 3.13015
INFO:tensorflow:epoch = 9.5, learning_rate = 0.00033160305, loss = 0.0005043094, step = 912 (5.456 sec)
2021-12-31 01:47:25,687 [INFO] tensorflow: epoch = 9.5, learning_rate = 0.00033160305, loss = 0.0005043094, step = 912 (5.456 sec)
INFO:tensorflow:global_step/sec: 3.07979
2021-12-31 01:47:27,654 [INFO] tensorflow:

INFO:tensorflow:global_step/sec: 3.13869
2021-12-31 01:48:40,744 [INFO] tensorflow: global_step/sec: 3.13869
INFO:tensorflow:global_step/sec: 3.03852
2021-12-31 01:48:43,706 [INFO] tensorflow: global_step/sec: 3.03852
INFO:tensorflow:epoch = 11.947916666666666, learning_rate = 0.0009772663, loss = 0.00030768485, step = 1147 (5.487 sec)
2021-12-31 01:48:44,959 [INFO] tensorflow: epoch = 11.947916666666666, learning_rate = 0.0009772663, loss = 0.00030768485, step = 1147 (5.487 sec)
2021-12-31 01:48:45,612 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.859
INFO:tensorflow:global_step/sec: 3.158
2021-12-31 01:48:46,556 [INFO] tensorflow: global_step/sec: 3.158
2021-12-31 01:48:46,557 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 12/120: loss: 0.00033 learning rate: 0.00100 Time taken: 0:00:30.774258 ETA: 0:55:23.619853
INFO:tensorflow:global_step/sec: 3.07127
2021-12-31 01:48:49,486 [INFO] tensorflow: global_step/sec: 3.07127
INFO:tensorflow:epoch = 1

INFO:tensorflow:global_step/sec: 3.1256
2021-12-31 01:50:02,110 [INFO] tensorflow: global_step/sec: 3.1256
INFO:tensorflow:global_step/sec: 3.21525
2021-12-31 01:50:04,909 [INFO] tensorflow: global_step/sec: 3.21525
2021-12-31 01:50:06,227 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.084
INFO:tensorflow:epoch = 14.604166666666666, learning_rate = 0.0009999999, loss = 0.00023231363, step = 1402 (5.408 sec)
2021-12-31 01:50:07,187 [INFO] tensorflow: epoch = 14.604166666666666, learning_rate = 0.0009999999, loss = 0.00023231363, step = 1402 (5.408 sec)
INFO:tensorflow:global_step/sec: 3.09202
2021-12-31 01:50:07,820 [INFO] tensorflow: global_step/sec: 3.09202
INFO:tensorflow:global_step/sec: 3.12481
2021-12-31 01:50:10,700 [INFO] tensorflow: global_step/sec: 3.12481
INFO:tensorflow:epoch = 14.78125, learning_rate = 0.0009999999, loss = 0.000219284, step = 1419 (5.426 sec)
2021-12-31 01:50:12,613 [INFO] tensorflow: epoch = 14.78125, learning_rate = 0.0009999999, loss =

INFO:tensorflow:epoch = 17.083333333333332, learning_rate = 0.0009999999, loss = 0.00022645936, step = 1640 (5.473 sec)
2021-12-31 01:51:23,666 [INFO] tensorflow: epoch = 17.083333333333332, learning_rate = 0.0009999999, loss = 0.00022645936, step = 1640 (5.473 sec)
INFO:tensorflow:global_step/sec: 3.18882
2021-12-31 01:51:25,867 [INFO] tensorflow: global_step/sec: 3.18882
2021-12-31 01:51:26,536 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.947
INFO:tensorflow:global_step/sec: 3.09031
2021-12-31 01:51:28,780 [INFO] tensorflow: global_step/sec: 3.09031
INFO:tensorflow:epoch = 17.260416666666664, learning_rate = 0.0009999999, loss = 0.00026644807, step = 1657 (5.438 sec)
2021-12-31 01:51:29,105 [INFO] tensorflow: epoch = 17.260416666666664, learning_rate = 0.0009999999, loss = 0.00026644807, step = 1657 (5.438 sec)
INFO:tensorflow:global_step/sec: 3.07043
2021-12-31 01:51:31,711 [INFO] tensorflow: global_step/sec: 3.07043
INFO:tensorflow:epoch = 17.4375, learning_rat

INFO:tensorflow:epoch = 19.739583333333332, learning_rate = 0.0009999999, loss = 0.0002804278, step = 1895 (5.488 sec)
2021-12-31 01:52:45,769 [INFO] tensorflow: epoch = 19.739583333333332, learning_rate = 0.0009999999, loss = 0.0002804278, step = 1895 (5.488 sec)
INFO:tensorflow:global_step/sec: 3.11145
2021-12-31 01:52:47,075 [INFO] tensorflow: global_step/sec: 3.11145
2021-12-31 01:52:47,076 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.764
INFO:tensorflow:global_step/sec: 3.16168
2021-12-31 01:52:49,922 [INFO] tensorflow: global_step/sec: 3.16168
INFO:tensorflow:epoch = 19.916666666666664, learning_rate = 0.0009999999, loss = 0.00027419688, step = 1912 (5.406 sec)
2021-12-31 01:52:51,175 [INFO] tensorflow: epoch = 19.916666666666664, learning_rate = 0.0009999999, loss = 0.00027419688, step = 1912 (5.406 sec)
INFO:tensorflow:global_step/sec: 3.09613
2021-12-31 01:52:52,829 [INFO] tensorflow: global_step/sec: 3.09613
INFO:tensorflow:Saving checkpoints for step-192

2021-12-31 01:55:01,924 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 22/120: loss: 0.00028 learning rate: 0.00100 Time taken: 0:00:30.849705 ETA: 0:50:23.271111
INFO:tensorflow:global_step/sec: 3.12439
2021-12-31 01:55:02,883 [INFO] tensorflow: global_step/sec: 3.12439
INFO:tensorflow:epoch = 22.125, learning_rate = 0.0009999999, loss = 0.0002457933, step = 2124 (5.397 sec)
2021-12-31 01:55:05,710 [INFO] tensorflow: epoch = 22.125, learning_rate = 0.0009999999, loss = 0.0002457933, step = 2124 (5.397 sec)
INFO:tensorflow:global_step/sec: 3.18275
2021-12-31 01:55:05,711 [INFO] tensorflow: global_step/sec: 3.18275
2021-12-31 01:55:05,711 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.229
INFO:tensorflow:global_step/sec: 3.10927
2021-12-31 01:55:08,605 [INFO] tensorflow: global_step/sec: 3.10927
INFO:tensorflow:epoch = 22.302083333333332, learning_rate = 0.0009999999, loss = 0.00023270797, step = 2141 (5.448 sec)
2021-12-31 01:55:11,157 [INFO] tenso

INFO:tensorflow:global_step/sec: 3.10089
2021-12-31 01:56:23,874 [INFO] tensorflow: global_step/sec: 3.10089
2021-12-31 01:56:26,101 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.029
INFO:tensorflow:global_step/sec: 3.14177
2021-12-31 01:56:26,739 [INFO] tensorflow: global_step/sec: 3.14177
INFO:tensorflow:epoch = 24.78125, learning_rate = 0.0009999999, loss = 0.0001761469, step = 2379 (5.448 sec)
2021-12-31 01:56:27,698 [INFO] tensorflow: epoch = 24.78125, learning_rate = 0.0009999999, loss = 0.0001761469, step = 2379 (5.448 sec)
INFO:tensorflow:global_step/sec: 3.13058
2021-12-31 01:56:29,614 [INFO] tensorflow: global_step/sec: 3.13058
INFO:tensorflow:global_step/sec: 3.15482
2021-12-31 01:56:32,467 [INFO] tensorflow: global_step/sec: 3.15482
INFO:tensorflow:epoch = 24.958333333333332, learning_rate = 0.0009999999, loss = 0.00020437592, step = 2396 (5.394 sec)
2021-12-31 01:56:33,092 [INFO] tensorflow: epoch = 24.958333333333332, learning_rate = 0.0009999999, loss

INFO:tensorflow:global_step/sec: 3.10632
2021-12-31 01:57:44,658 [INFO] tensorflow: global_step/sec: 3.10632
2021-12-31 01:57:46,214 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.067
INFO:tensorflow:global_step/sec: 3.14501
2021-12-31 01:57:47,520 [INFO] tensorflow: global_step/sec: 3.14501
INFO:tensorflow:epoch = 27.4375, learning_rate = 0.0009999999, loss = 0.00019256785, step = 2634 (5.427 sec)
2021-12-31 01:57:49,441 [INFO] tensorflow: epoch = 27.4375, learning_rate = 0.0009999999, loss = 0.00019256785, step = 2634 (5.427 sec)
INFO:tensorflow:global_step/sec: 3.12231
2021-12-31 01:57:50,402 [INFO] tensorflow: global_step/sec: 3.12231
INFO:tensorflow:global_step/sec: 3.07739
2021-12-31 01:57:53,327 [INFO] tensorflow: global_step/sec: 3.07739
2021-12-31 01:57:54,289 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.768
INFO:tensorflow:epoch = 27.614583333333332, learning_rate = 0.0009999999, loss = 0.00020603626, step = 2651 (5.490 sec)
2021-12-31 

2021-12-31 01:59:06,301 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.242
INFO:tensorflow:Saving checkpoints for step-2880.
2021-12-31 01:59:07,896 [INFO] tensorflow: Saving checkpoints for step-2880.
2021-12-31 01:59:11,613 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 23, 0.00s/step
2021-12-31 01:59:29,611 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 23, 1.80s/step
2021-12-31 01:59:47,705 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 23, 1.81s/step
Matching predictions to ground truth, class 1/1.: 100%|█| 227520/227520 [00:15<00:00, 15150.37it/s]
Epoch 30/120

Validation cost: 0.000272
Mean average_precision (in %): 36.6376

class name average precision (in %)
------------ --------------------------
car 36.6376

Median Inference Time: 0.018155
INFO:tensorflow:epoch = 30.0, learning_rate = 0.0009999999, loss = 0.00014572589, step = 2880 (67.029 sec)
2021-12-31 02:00:12,680 [INFO] tensorflow: epoch = 30.0, learning_rate = 0.000999999

2021-12-31 02:01:22,954 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.969
INFO:tensorflow:epoch = 32.30208333333333, learning_rate = 0.0009999999, loss = 0.00028179493, step = 3101 (5.441 sec)
2021-12-31 02:01:23,608 [INFO] tensorflow: epoch = 32.30208333333333, learning_rate = 0.0009999999, loss = 0.00028179493, step = 3101 (5.441 sec)
INFO:tensorflow:global_step/sec: 3.07548
2021-12-31 02:01:24,911 [INFO] tensorflow: global_step/sec: 3.07548
INFO:tensorflow:global_step/sec: 3.13044
2021-12-31 02:01:27,786 [INFO] tensorflow: global_step/sec: 3.13044
INFO:tensorflow:epoch = 32.479166666666664, learning_rate = 0.0009999999, loss = 0.0002054078, step = 3118 (5.415 sec)
2021-12-31 02:01:29,023 [INFO] tensorflow: epoch = 32.479166666666664, learning_rate = 0.0009999999, loss = 0.0002054078, step = 3118 (5.415 sec)
INFO:tensorflow:global_step/sec: 3.18964
2021-12-31 02:01:30,608 [INFO] tensorflow: global_step/sec: 3.18964
2021-12-31 02:01:30,933 [INFO] modulus.hooks.samp

2021-12-31 02:02:43,149 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.983
INFO:tensorflow:epoch = 34.95833333333333, learning_rate = 0.0009999999, loss = 0.00028036637, step = 3356 (5.430 sec)
2021-12-31 02:02:45,376 [INFO] tensorflow: epoch = 34.95833333333333, learning_rate = 0.0009999999, loss = 0.00028036637, step = 3356 (5.430 sec)
INFO:tensorflow:global_step/sec: 3.13987
2021-12-31 02:02:45,694 [INFO] tensorflow: global_step/sec: 3.13987
2021-12-31 02:02:46,671 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 35/120: loss: 0.00023 learning rate: 0.00100 Time taken: 0:00:30.711346 ETA: 0:43:30.464382
INFO:tensorflow:global_step/sec: 3.10028
2021-12-31 02:02:48,597 [INFO] tensorflow: global_step/sec: 3.10028
INFO:tensorflow:epoch = 35.135416666666664, learning_rate = 0.0009999999, loss = 0.00024327867, step = 3373 (5.474 sec)
2021-12-31 02:02:50,851 [INFO] tensorflow: epoch = 35.135416666666664, learning_rate = 0.0009999999, loss = 0.00024327867

2021-12-31 02:04:03,448 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.813
INFO:tensorflow:global_step/sec: 3.09458
2021-12-31 02:04:03,782 [INFO] tensorflow: global_step/sec: 3.09458
INFO:tensorflow:global_step/sec: 3.13484
2021-12-31 02:04:06,653 [INFO] tensorflow: global_step/sec: 3.13484
INFO:tensorflow:epoch = 37.61458333333333, learning_rate = 0.0009999999, loss = 0.0002177629, step = 3611 (5.471 sec)
2021-12-31 02:04:07,313 [INFO] tensorflow: epoch = 37.61458333333333, learning_rate = 0.0009999999, loss = 0.0002177629, step = 3611 (5.471 sec)
INFO:tensorflow:global_step/sec: 3.07231
2021-12-31 02:04:09,582 [INFO] tensorflow: global_step/sec: 3.07231
2021-12-31 02:04:11,552 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.679
INFO:tensorflow:global_step/sec: 3.06304
2021-12-31 02:04:12,521 [INFO] tensorflow: global_step/sec: 3.06304
INFO:tensorflow:epoch = 37.791666666666664, learning_rate = 0.0009999999, loss = 0.00018512359, step = 3628 (5.52

2021-12-31 02:05:46,437 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 23, 1.11s/step
Matching predictions to ground truth, class 1/1.: 100%|█| 24981/24981 [00:01<00:00, 15220.36it/s]
Epoch 40/120

Validation cost: 0.000263
Mean average_precision (in %): 46.6108

class name average precision (in %)
------------ --------------------------
car 46.6108

Median Inference Time: 0.016943
INFO:tensorflow:epoch = 40.0, learning_rate = 0.0009999999, loss = 0.00024275055, step = 3840 (34.050 sec)
2021-12-31 02:05:52,164 [INFO] tensorflow: epoch = 40.0, learning_rate = 0.0009999999, loss = 0.00024275055, step = 3840 (34.050 sec)
2021-12-31 02:05:52,165 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 40/120: loss: 0.00024 learning rate: 0.00100 Time taken: 0:01:02.300122 ETA: 1:23:04.009781
INFO:tensorflow:global_step/sec: 0.261839
2021-12-31 02:05:53,126 [INFO] tensorflow: global_step/sec: 0.261839
2021-12-31 02:05:55,045 [INFO] modulus.hooks.sample_counter_hook: Train

2021-12-31 02:07:07,091 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.740
INFO:tensorflow:global_step/sec: 3.1694
2021-12-31 02:07:08,017 [INFO] tensorflow: global_step/sec: 3.1694
INFO:tensorflow:epoch = 42.479166666666664, learning_rate = 0.0009999999, loss = 0.00018202995, step = 4078 (5.425 sec)
2021-12-31 02:07:08,332 [INFO] tensorflow: epoch = 42.479166666666664, learning_rate = 0.0009999999, loss = 0.00018202995, step = 4078 (5.425 sec)
INFO:tensorflow:global_step/sec: 3.10991
2021-12-31 02:07:10,911 [INFO] tensorflow: global_step/sec: 3.10991
INFO:tensorflow:epoch = 42.65625, learning_rate = 0.0009999999, loss = 0.00026308582, step = 4095 (5.526 sec)
2021-12-31 02:07:13,857 [INFO] tensorflow: epoch = 42.65625, learning_rate = 0.0009999999, loss = 0.00026308582, step = 4095 (5.526 sec)
INFO:tensorflow:global_step/sec: 3.05421
2021-12-31 02:07:13,858 [INFO] tensorflow: global_step/sec: 3.05421
2021-12-31 02:07:15,144 [INFO] modulus.hooks.sample_counter_hook: T

2021-12-31 02:08:27,080 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.885
INFO:tensorflow:global_step/sec: 3.0988
2021-12-31 02:08:28,701 [INFO] tensorflow: global_step/sec: 3.0988
INFO:tensorflow:epoch = 45.135416666666664, learning_rate = 0.0009999999, loss = 0.00021529435, step = 4333 (5.434 sec)
2021-12-31 02:08:29,941 [INFO] tensorflow: epoch = 45.135416666666664, learning_rate = 0.0009999999, loss = 0.00021529435, step = 4333 (5.434 sec)
INFO:tensorflow:global_step/sec: 3.17841
2021-12-31 02:08:31,533 [INFO] tensorflow: global_step/sec: 3.17841
INFO:tensorflow:global_step/sec: 3.14056
2021-12-31 02:08:34,398 [INFO] tensorflow: global_step/sec: 3.14056
2021-12-31 02:08:35,040 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.128
INFO:tensorflow:epoch = 45.3125, learning_rate = 0.0009999999, loss = 0.00022164083, step = 4350 (5.429 sec)
2021-12-31 02:08:35,370 [INFO] tensorflow: epoch = 45.3125, learning_rate = 0.0009999999, loss = 0.00022164083,

2021-12-31 02:09:47,121 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.742
INFO:tensorflow:global_step/sec: 3.0619
2021-12-31 02:09:49,398 [INFO] tensorflow: global_step/sec: 3.0619
INFO:tensorflow:epoch = 47.791666666666664, learning_rate = 0.0009999999, loss = 0.00028851727, step = 4588 (5.546 sec)
2021-12-31 02:09:51,674 [INFO] tensorflow: epoch = 47.791666666666664, learning_rate = 0.0009999999, loss = 0.00028851727, step = 4588 (5.546 sec)
INFO:tensorflow:global_step/sec: 3.0477
2021-12-31 02:09:52,352 [INFO] tensorflow: global_step/sec: 3.0477
INFO:tensorflow:global_step/sec: 3.11529
2021-12-31 02:09:55,241 [INFO] tensorflow: global_step/sec: 3.11529
2021-12-31 02:09:55,241 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.630
INFO:tensorflow:epoch = 47.96875, learning_rate = 0.0009999999, loss = 0.00027920722, step = 4605 (5.463 sec)
2021-12-31 02:09:57,137 [INFO] tensorflow: epoch = 47.96875, learning_rate = 0.0009999999, loss = 0.00027920722,

INFO:tensorflow:epoch = 50.0, learning_rate = 0.0009999999, loss = 0.0001822824, step = 4800 (13.840 sec)
2021-12-31 02:11:11,213 [INFO] tensorflow: epoch = 50.0, learning_rate = 0.0009999999, loss = 0.0001822824, step = 4800 (13.840 sec)
2021-12-31 02:11:11,213 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 50/120: loss: 0.00018 learning rate: 0.00100 Time taken: 0:00:42.084785 ETA: 0:49:05.934932
INFO:tensorflow:global_step/sec: 0.632538
2021-12-31 02:11:13,162 [INFO] tensorflow: global_step/sec: 0.632538
INFO:tensorflow:global_step/sec: 3.12271
2021-12-31 02:11:16,044 [INFO] tensorflow: global_step/sec: 3.12271
INFO:tensorflow:epoch = 50.17708333333333, learning_rate = 0.0009999999, loss = 0.00025421774, step = 4817 (5.495 sec)
2021-12-31 02:11:16,708 [INFO] tensorflow: epoch = 50.17708333333333, learning_rate = 0.0009999999, loss = 0.00025421774, step = 4817 (5.495 sec)
INFO:tensorflow:global_step/sec: 3.09806
2021-12-31 02:11:18,949 [INFO] tensorflow: global_ste

INFO:tensorflow:global_step/sec: 3.09067
2021-12-31 02:12:30,870 [INFO] tensorflow: global_step/sec: 3.09067
2021-12-31 02:12:30,871 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.813
INFO:tensorflow:epoch = 52.65625, learning_rate = 0.0009999999, loss = 0.0002503669, step = 5055 (5.505 sec)
2021-12-31 02:12:32,835 [INFO] tensorflow: epoch = 52.65625, learning_rate = 0.0009999999, loss = 0.0002503669, step = 5055 (5.505 sec)
INFO:tensorflow:global_step/sec: 3.07037
2021-12-31 02:12:33,802 [INFO] tensorflow: global_step/sec: 3.07037
INFO:tensorflow:global_step/sec: 3.09705
2021-12-31 02:12:36,708 [INFO] tensorflow: global_step/sec: 3.09705
INFO:tensorflow:epoch = 52.83333333333333, learning_rate = 0.0009999999, loss = 0.00019338886, step = 5072 (5.497 sec)
2021-12-31 02:12:38,332 [INFO] tensorflow: epoch = 52.83333333333333, learning_rate = 0.0009999999, loss = 0.00019338886, step = 5072 (5.497 sec)
2021-12-31 02:12:38,932 [INFO] modulus.hooks.sample_counter_hook: Tra

2021-12-31 02:13:51,003 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.880
INFO:tensorflow:global_step/sec: 3.13382
2021-12-31 02:13:51,646 [INFO] tensorflow: global_step/sec: 3.13382
INFO:tensorflow:epoch = 55.3125, learning_rate = 0.0009999999, loss = 0.00022903821, step = 5310 (5.405 sec)
2021-12-31 02:13:54,501 [INFO] tensorflow: epoch = 55.3125, learning_rate = 0.0009999999, loss = 0.00022903821, step = 5310 (5.405 sec)
INFO:tensorflow:global_step/sec: 3.15213
2021-12-31 02:13:54,502 [INFO] tensorflow: global_step/sec: 3.15213
INFO:tensorflow:global_step/sec: 3.12664
2021-12-31 02:13:57,380 [INFO] tensorflow: global_step/sec: 3.12664
2021-12-31 02:13:58,915 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.279
INFO:tensorflow:epoch = 55.48958333333333, learning_rate = 0.0009999999, loss = 0.0003117303, step = 5327 (5.370 sec)
2021-12-31 02:13:59,871 [INFO] tensorflow: epoch = 55.48958333333333, learning_rate = 0.0009999999, loss = 0.0003117303, s

2021-12-31 02:15:11,431 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.902
INFO:tensorflow:global_step/sec: 3.10838
2021-12-31 02:15:12,727 [INFO] tensorflow: global_step/sec: 3.10838
INFO:tensorflow:global_step/sec: 3.09767
2021-12-31 02:15:15,633 [INFO] tensorflow: global_step/sec: 3.09767
INFO:tensorflow:epoch = 57.96875, learning_rate = 0.0009999999, loss = 0.00016327405, step = 5565 (5.467 sec)
2021-12-31 02:15:16,576 [INFO] tensorflow: epoch = 57.96875, learning_rate = 0.0009999999, loss = 0.00016327405, step = 5565 (5.467 sec)
2021-12-31 02:15:17,548 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 58/120: loss: 0.00028 learning rate: 0.00100 Time taken: 0:00:30.917719 ETA: 0:31:56.898571
INFO:tensorflow:global_step/sec: 3.15148
2021-12-31 02:15:18,488 [INFO] tensorflow: global_step/sec: 3.15148
2021-12-31 02:15:19,449 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.944
INFO:tensorflow:global_step/sec: 3.10904
2021-12-31 02:1

INFO:tensorflow:global_step/sec: 3.08222
2021-12-31 02:16:29,459 [INFO] tensorflow: global_step/sec: 3.08222
2021-12-31 02:16:31,053 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.026
INFO:tensorflow:epoch = 60.17708333333333, learning_rate = 0.0009999999, loss = 0.0002908432, step = 5777 (5.486 sec)
2021-12-31 02:16:32,025 [INFO] tensorflow: epoch = 60.17708333333333, learning_rate = 0.0009999999, loss = 0.0002908432, step = 5777 (5.486 sec)
INFO:tensorflow:global_step/sec: 3.12999
2021-12-31 02:16:32,334 [INFO] tensorflow: global_step/sec: 3.12999
INFO:tensorflow:global_step/sec: 3.11828
2021-12-31 02:16:35,221 [INFO] tensorflow: global_step/sec: 3.11828
INFO:tensorflow:epoch = 60.354166666666664, learning_rate = 0.0009999999, loss = 0.0003057906, step = 5794 (5.361 sec)
2021-12-31 02:16:37,385 [INFO] tensorflow: epoch = 60.354166666666664, learning_rate = 0.0009999999, loss = 0.0003057906, step = 5794 (5.361 sec)
INFO:tensorflow:global_step/sec: 3.19024
2021-12-31

INFO:tensorflow:global_step/sec: 3.08351
2021-12-31 02:17:50,049 [INFO] tensorflow: global_step/sec: 3.08351
2021-12-31 02:17:51,016 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.995
INFO:tensorflow:global_step/sec: 3.12408
2021-12-31 02:17:52,930 [INFO] tensorflow: global_step/sec: 3.12408
INFO:tensorflow:epoch = 62.83333333333333, learning_rate = 0.0009999999, loss = 0.0001879006, step = 6032 (5.466 sec)
2021-12-31 02:17:53,575 [INFO] tensorflow: epoch = 62.83333333333333, learning_rate = 0.0009999999, loss = 0.0001879006, step = 6032 (5.466 sec)
INFO:tensorflow:global_step/sec: 3.16032
2021-12-31 02:17:55,778 [INFO] tensorflow: global_step/sec: 3.16032
INFO:tensorflow:global_step/sec: 3.12203
2021-12-31 02:17:58,661 [INFO] tensorflow: global_step/sec: 3.12203
2021-12-31 02:17:58,662 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 63/120: loss: 0.00019 learning rate: 0.00100 Time taken: 0:00:30.617789 ETA: 0:29:05.213948
INFO:tensorflow:epoch = 6

INFO:tensorflow:global_step/sec: 3.10414
2021-12-31 02:19:10,709 [INFO] tensorflow: global_step/sec: 3.10414
2021-12-31 02:19:11,031 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.865
INFO:tensorflow:global_step/sec: 3.10837
2021-12-31 02:19:13,605 [INFO] tensorflow: global_step/sec: 3.10837
INFO:tensorflow:epoch = 65.48958333333333, learning_rate = 0.0009999999, loss = 0.00018664336, step = 6287 (5.440 sec)
2021-12-31 02:19:15,188 [INFO] tensorflow: epoch = 65.48958333333333, learning_rate = 0.0009999999, loss = 0.00018664336, step = 6287 (5.440 sec)
INFO:tensorflow:global_step/sec: 3.15104
2021-12-31 02:19:16,461 [INFO] tensorflow: global_step/sec: 3.15104
2021-12-31 02:19:19,047 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.950
INFO:tensorflow:global_step/sec: 3.09198
2021-12-31 02:19:19,372 [INFO] tensorflow: global_step/sec: 3.09198
INFO:tensorflow:epoch = 65.66666666666666, learning_rate = 0.0009999999, loss = 0.00019480637, step = 6304 (5.4

2021-12-31 02:20:32,572 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 68/120: loss: 0.00024 learning rate: 0.00100 Time taken: 0:00:30.833629 ETA: 0:26:43.348702
INFO:tensorflow:global_step/sec: 3.07581
2021-12-31 02:20:34,511 [INFO] tensorflow: global_step/sec: 3.07581
INFO:tensorflow:epoch = 68.14583333333333, learning_rate = 0.0009999999, loss = 0.00027407007, step = 6542 (5.450 sec)
2021-12-31 02:20:37,034 [INFO] tensorflow: epoch = 68.14583333333333, learning_rate = 0.0009999999, loss = 0.00027407007, step = 6542 (5.450 sec)
INFO:tensorflow:global_step/sec: 3.15916
2021-12-31 02:20:37,360 [INFO] tensorflow: global_step/sec: 3.15916
2021-12-31 02:20:39,304 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.837
INFO:tensorflow:global_step/sec: 3.10113
2021-12-31 02:20:40,262 [INFO] tensorflow: global_step/sec: 3.10113
INFO:tensorflow:epoch = 68.32291666666666, learning_rate = 0.0009999999, loss = 0.00020419603, step = 6559 (5.468 sec)
2021-12-31 02

2021-12-31 02:21:51,473 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.859
INFO:tensorflow:global_step/sec: 3.11863
2021-12-31 02:21:51,806 [INFO] tensorflow: global_step/sec: 3.11863
INFO:tensorflow:epoch = 70.35416666666666, learning_rate = 0.0009999999, loss = 0.0001972542, step = 6754 (5.458 sec)
2021-12-31 02:21:53,088 [INFO] tensorflow: epoch = 70.35416666666666, learning_rate = 0.0009999999, loss = 0.0001972542, step = 6754 (5.458 sec)
INFO:tensorflow:global_step/sec: 3.11907
2021-12-31 02:21:54,691 [INFO] tensorflow: global_step/sec: 3.11907
INFO:tensorflow:global_step/sec: 3.09339
2021-12-31 02:21:57,600 [INFO] tensorflow: global_step/sec: 3.09339
INFO:tensorflow:epoch = 70.53125, learning_rate = 0.0009999999, loss = 0.00021731402, step = 6771 (5.464 sec)
2021-12-31 02:21:58,551 [INFO] tensorflow: epoch = 70.53125, learning_rate = 0.0009999999, loss = 0.00021731402, step = 6771 (5.464 sec)
2021-12-31 02:21:59,525 [INFO] modulus.hooks.sample_counter_hook: Tra

2021-12-31 02:23:11,967 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.840
INFO:tensorflow:global_step/sec: 3.10821
2021-12-31 02:23:12,945 [INFO] tensorflow: global_step/sec: 3.10821
2021-12-31 02:23:14,888 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 73/120: loss: 0.00019 learning rate: 0.00100 Time taken: 0:00:31.003034 ETA: 0:24:17.142592
INFO:tensorflow:epoch = 73.01041666666666, learning_rate = 0.0009999999, loss = 0.00026072646, step = 7009 (5.510 sec)
2021-12-31 02:23:15,227 [INFO] tensorflow: epoch = 73.01041666666666, learning_rate = 0.0009999999, loss = 0.00026072646, step = 7009 (5.510 sec)
INFO:tensorflow:global_step/sec: 3.11509
2021-12-31 02:23:15,834 [INFO] tensorflow: global_step/sec: 3.11509
INFO:tensorflow:global_step/sec: 3.11245
2021-12-31 02:23:18,726 [INFO] tensorflow: global_step/sec: 3.11245
2021-12-31 02:23:20,023 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.828
INFO:tensorflow:epoch = 73.1875, learn

2021-12-31 02:24:32,300 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.917
INFO:tensorflow:global_step/sec: 3.1235
2021-12-31 02:24:33,897 [INFO] tensorflow: global_step/sec: 3.1235
INFO:tensorflow:global_step/sec: 3.09443
2021-12-31 02:24:36,806 [INFO] tensorflow: global_step/sec: 3.09443
INFO:tensorflow:epoch = 75.66666666666666, learning_rate = 0.0009999999, loss = 0.00021688137, step = 7264 (5.465 sec)
2021-12-31 02:24:37,125 [INFO] tensorflow: epoch = 75.66666666666666, learning_rate = 0.0009999999, loss = 0.00021688137, step = 7264 (5.465 sec)
INFO:tensorflow:global_step/sec: 3.1361
2021-12-31 02:24:39,676 [INFO] tensorflow: global_step/sec: 3.1361
2021-12-31 02:24:40,322 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.931
INFO:tensorflow:epoch = 75.84375, learning_rate = 0.0009999999, loss = 0.000215515, step = 7281 (5.408 sec)
2021-12-31 02:24:42,533 [INFO] tensorflow: epoch = 75.84375, learning_rate = 0.0009999999, loss = 0.000215515, step 

INFO:tensorflow:epoch = 78.14583333333333, learning_rate = 0.0009999999, loss = 0.00024218127, step = 7502 (5.435 sec)
2021-12-31 02:25:53,850 [INFO] tensorflow: epoch = 78.14583333333333, learning_rate = 0.0009999999, loss = 0.00024218127, step = 7502 (5.435 sec)
INFO:tensorflow:global_step/sec: 3.09891
2021-12-31 02:25:55,122 [INFO] tensorflow: global_step/sec: 3.09891
INFO:tensorflow:global_step/sec: 3.09088
2021-12-31 02:25:58,034 [INFO] tensorflow: global_step/sec: 3.09088
INFO:tensorflow:epoch = 78.32291666666666, learning_rate = 0.0009999999, loss = 0.00016379161, step = 7519 (5.444 sec)
2021-12-31 02:25:59,295 [INFO] tensorflow: epoch = 78.32291666666666, learning_rate = 0.0009999999, loss = 0.00016379161, step = 7519 (5.444 sec)
INFO:tensorflow:global_step/sec: 3.15911
2021-12-31 02:26:00,883 [INFO] tensorflow: global_step/sec: 3.15911
2021-12-31 02:26:00,883 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.904
INFO:tensorflow:global_step/sec: 3.1568
2021-12-3

INFO:tensorflow:global_step/sec: 3.17417
2021-12-31 02:27:11,952 [INFO] tensorflow: global_step/sec: 3.17417
2021-12-31 02:27:12,572 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.066
INFO:tensorflow:epoch = 80.53125, learning_rate = 0.0009999999, loss = 0.00025123367, step = 7731 (5.353 sec)
2021-12-31 02:27:14,790 [INFO] tensorflow: epoch = 80.53125, learning_rate = 0.0009999999, loss = 0.00025123367, step = 7731 (5.353 sec)
INFO:tensorflow:global_step/sec: 3.16977
2021-12-31 02:27:14,791 [INFO] tensorflow: global_step/sec: 3.16977
INFO:tensorflow:global_step/sec: 3.15422
2021-12-31 02:27:17,644 [INFO] tensorflow: global_step/sec: 3.15422
INFO:tensorflow:epoch = 80.70833333333333, learning_rate = 0.0009999999, loss = 0.00019312736, step = 7748 (5.394 sec)
2021-12-31 02:27:20,184 [INFO] tensorflow: epoch = 80.70833333333333, learning_rate = 0.0009999999, loss = 0.00019312736, step = 7748 (5.394 sec)
INFO:tensorflow:global_step/sec: 3.14689
2021-12-31 02:27:20,504 [I

INFO:tensorflow:global_step/sec: 3.045
2021-12-31 02:28:32,875 [INFO] tensorflow: global_step/sec: 3.045
2021-12-31 02:28:32,876 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.756
INFO:tensorflow:global_step/sec: 3.10973
2021-12-31 02:28:35,769 [INFO] tensorflow: global_step/sec: 3.10973
INFO:tensorflow:epoch = 83.1875, learning_rate = 0.0009999999, loss = 0.00021577808, step = 7986 (5.518 sec)
2021-12-31 02:28:36,754 [INFO] tensorflow: epoch = 83.1875, learning_rate = 0.0009999999, loss = 0.00021577808, step = 7986 (5.518 sec)
INFO:tensorflow:global_step/sec: 3.08976
2021-12-31 02:28:38,682 [INFO] tensorflow: global_step/sec: 3.08976
2021-12-31 02:28:40,872 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.012
INFO:tensorflow:global_step/sec: 3.16801
2021-12-31 02:28:41,523 [INFO] tensorflow: global_step/sec: 3.16801
INFO:tensorflow:epoch = 83.36458333333333, learning_rate = 0.0009999999, loss = 0.00020284054, step = 8003 (5.398 sec)
2021-12-31 02:28

INFO:tensorflow:global_step/sec: 3.12469
2021-12-31 02:29:53,564 [INFO] tensorflow: global_step/sec: 3.12469
INFO:tensorflow:global_step/sec: 3.1005
2021-12-31 02:29:56,467 [INFO] tensorflow: global_step/sec: 3.1005
INFO:tensorflow:epoch = 85.84375, learning_rate = 0.000762346, loss = 0.0002100628, step = 8241 (5.461 sec)
2021-12-31 02:29:58,372 [INFO] tensorflow: epoch = 85.84375, learning_rate = 0.000762346, loss = 0.0002100628, step = 8241 (5.461 sec)
INFO:tensorflow:global_step/sec: 3.14253
2021-12-31 02:29:59,331 [INFO] tensorflow: global_step/sec: 3.14253
2021-12-31 02:30:00,907 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.013
INFO:tensorflow:global_step/sec: 3.13629
2021-12-31 02:30:02,201 [INFO] tensorflow: global_step/sec: 3.13629
2021-12-31 02:30:03,150 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 86/120: loss: 0.00021 learning rate: 0.00075 Time taken: 0:00:30.691654 ETA: 0:17:23.516227
INFO:tensorflow:epoch = 86.02083333333333, lear

INFO:tensorflow:epoch = 88.32291666666666, learning_rate = 0.0005292853, loss = 0.00020119053, step = 8479 (5.475 sec)
2021-12-31 02:31:14,628 [INFO] tensorflow: epoch = 88.32291666666666, learning_rate = 0.0005292853, loss = 0.00020119053, step = 8479 (5.475 sec)
INFO:tensorflow:global_step/sec: 3.10034
2021-12-31 02:31:17,214 [INFO] tensorflow: global_step/sec: 3.10034
INFO:tensorflow:epoch = 88.5, learning_rate = 0.000515669, loss = 0.000196107, step = 8496 (5.481 sec)
2021-12-31 02:31:20,109 [INFO] tensorflow: epoch = 88.5, learning_rate = 0.000515669, loss = 0.000196107, step = 8496 (5.481 sec)
INFO:tensorflow:global_step/sec: 3.10811
2021-12-31 02:31:20,109 [INFO] tensorflow: global_step/sec: 3.10811
2021-12-31 02:31:21,048 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.936
INFO:tensorflow:global_step/sec: 3.12467
2021-12-31 02:31:22,990 [INFO] tensorflow: global_step/sec: 3.12467
INFO:tensorflow:epoch = 88.67708333333333, learning_rate = 0.000502403, loss = 0.

2021-12-31 02:32:32,624 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.112
INFO:tensorflow:global_step/sec: 3.104
2021-12-31 02:32:33,917 [INFO] tensorflow: global_step/sec: 3.104
INFO:tensorflow:epoch = 90.70833333333333, learning_rate = 0.00037258043, loss = 0.00018155035, step = 8708 (5.485 sec)
2021-12-31 02:32:35,528 [INFO] tensorflow: epoch = 90.70833333333333, learning_rate = 0.00037258043, loss = 0.00018155035, step = 8708 (5.485 sec)
INFO:tensorflow:global_step/sec: 3.09252
2021-12-31 02:32:36,827 [INFO] tensorflow: global_step/sec: 3.09252
INFO:tensorflow:global_step/sec: 3.1574
2021-12-31 02:32:39,678 [INFO] tensorflow: global_step/sec: 3.1574
2021-12-31 02:32:40,656 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.901
INFO:tensorflow:epoch = 90.88541666666666, learning_rate = 0.0003629951, loss = 0.00016156193, step = 8725 (5.425 sec)
2021-12-31 02:32:40,953 [INFO] tensorflow: epoch = 90.88541666666666, learning_rate = 0.0003629951, loss 

2021-12-31 02:33:52,560 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.811
INFO:tensorflow:global_step/sec: 3.13823
2021-12-31 02:33:54,448 [INFO] tensorflow: global_step/sec: 3.13823
INFO:tensorflow:epoch = 93.375, learning_rate = 0.00025163597, loss = 0.00015438857, step = 8964 (5.406 sec)
2021-12-31 02:33:57,315 [INFO] tensorflow: epoch = 93.375, learning_rate = 0.00025163597, loss = 0.00015438857, step = 8964 (5.406 sec)
INFO:tensorflow:global_step/sec: 3.13883
2021-12-31 02:33:57,315 [INFO] tensorflow: global_step/sec: 3.13883
INFO:tensorflow:global_step/sec: 3.14008
2021-12-31 02:34:00,182 [INFO] tensorflow: global_step/sec: 3.14008
2021-12-31 02:34:00,499 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.191
INFO:tensorflow:epoch = 93.55208333333333, learning_rate = 0.00024516255, loss = 0.00014850884, step = 8981 (5.435 sec)
2021-12-31 02:34:02,750 [INFO] tensorflow: epoch = 93.55208333333333, learning_rate = 0.00024516255, loss = 0.0001485088

INFO:tensorflow:global_step/sec: 3.13144
2021-12-31 02:35:15,189 [INFO] tensorflow: global_step/sec: 3.13144
INFO:tensorflow:global_step/sec: 3.16175
2021-12-31 02:35:18,036 [INFO] tensorflow: global_step/sec: 3.16175
2021-12-31 02:35:18,036 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 96/120: loss: 0.00017 learning rate: 0.00017 Time taken: 0:00:30.862694 ETA: 0:12:20.704662
INFO:tensorflow:epoch = 96.03125, learning_rate = 0.00017021279, loss = 0.00017275798, step = 9219 (5.375 sec)
2021-12-31 02:35:18,946 [INFO] tensorflow: epoch = 96.03125, learning_rate = 0.00017021279, loss = 0.00017275798, step = 9219 (5.375 sec)
2021-12-31 02:35:20,533 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.328
INFO:tensorflow:global_step/sec: 3.1987
2021-12-31 02:35:20,849 [INFO] tensorflow: global_step/sec: 3.1987
INFO:tensorflow:global_step/sec: 3.11133
2021-12-31 02:35:23,742 [INFO] tensorflow: global_step/sec: 3.11133
INFO:tensorflow:epoch = 96.20833333333333

INFO:tensorflow:global_step/sec: 3.0515
2021-12-31 02:36:35,996 [INFO] tensorflow: global_step/sec: 3.0515
INFO:tensorflow:global_step/sec: 3.12682
2021-12-31 02:36:38,874 [INFO] tensorflow: global_step/sec: 3.12682
INFO:tensorflow:epoch = 98.6875, learning_rate = 0.00011513606, loss = 0.00016195548, step = 9474 (5.484 sec)
2021-12-31 02:36:40,820 [INFO] tensorflow: epoch = 98.6875, learning_rate = 0.00011513606, loss = 0.00016195548, step = 9474 (5.484 sec)
2021-12-31 02:36:40,820 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.686
INFO:tensorflow:global_step/sec: 3.09286
2021-12-31 02:36:41,784 [INFO] tensorflow: global_step/sec: 3.09286
INFO:tensorflow:global_step/sec: 3.17436
2021-12-31 02:36:44,620 [INFO] tensorflow: global_step/sec: 3.17436
INFO:tensorflow:epoch = 98.86458333333333, learning_rate = 0.000112174144, loss = 0.00015241868, step = 9491 (5.377 sec)
2021-12-31 02:36:46,197 [INFO] tensorflow: epoch = 98.86458333333333, learning_rate = 0.000112174144, lo

INFO:tensorflow:global_step/sec: 3.09833
2021-12-31 02:37:55,878 [INFO] tensorflow: global_step/sec: 3.09833
INFO:tensorflow:epoch = 100.88541666666666, learning_rate = 8.331552e-05, loss = 0.0001583534, step = 9685 (5.544 sec)
2021-12-31 02:37:56,210 [INFO] tensorflow: epoch = 100.88541666666666, learning_rate = 8.331552e-05, loss = 0.0001583534, step = 9685 (5.544 sec)
INFO:tensorflow:global_step/sec: 3.07079
2021-12-31 02:37:58,809 [INFO] tensorflow: global_step/sec: 3.07079
2021-12-31 02:37:59,780 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 101/120: loss: 0.00018 learning rate: 0.00008 Time taken: 0:00:31.065064 ETA: 0:09:50.236215
2021-12-31 02:38:00,763 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.616
INFO:tensorflow:epoch = 101.0625, learning_rate = 8.11722e-05, loss = 0.0001411916, step = 9702 (5.538 sec)
2021-12-31 02:38:01,748 [INFO] tensorflow: epoch = 101.0625, learning_rate = 8.11722e-05, loss = 0.0001411916, step = 9702 (5.538 se

2021-12-31 02:39:12,960 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.976
INFO:tensorflow:global_step/sec: 3.08258
2021-12-31 02:39:13,932 [INFO] tensorflow: global_step/sec: 3.08258
INFO:tensorflow:global_step/sec: 3.1031
2021-12-31 02:39:16,832 [INFO] tensorflow: global_step/sec: 3.1031
INFO:tensorflow:epoch = 103.54166666666666, learning_rate = 5.6356632e-05, loss = 0.00013988047, step = 9940 (5.492 sec)
2021-12-31 02:39:18,124 [INFO] tensorflow: epoch = 103.54166666666666, learning_rate = 5.6356632e-05, loss = 0.00013988047, step = 9940 (5.492 sec)
INFO:tensorflow:global_step/sec: 3.11806
2021-12-31 02:39:19,719 [INFO] tensorflow: global_step/sec: 3.11806
2021-12-31 02:39:20,985 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.922
INFO:tensorflow:global_step/sec: 3.1404
2021-12-31 02:39:22,585 [INFO] tensorflow: global_step/sec: 3.1404
INFO:tensorflow:epoch = 103.71875, learning_rate = 5.490684e-05, loss = 0.00015159059, step = 9957 (5.406 sec)


INFO:tensorflow:epoch = 106.02083333333333, learning_rate = 3.9127593e-05, loss = 0.00016553479, step = 10178 (5.472 sec)
2021-12-31 02:40:34,167 [INFO] tensorflow: epoch = 106.02083333333333, learning_rate = 3.9127593e-05, loss = 0.00016553479, step = 10178 (5.472 sec)
INFO:tensorflow:global_step/sec: 3.08729
2021-12-31 02:40:34,496 [INFO] tensorflow: global_step/sec: 3.08729
INFO:tensorflow:global_step/sec: 3.1182
2021-12-31 02:40:37,382 [INFO] tensorflow: global_step/sec: 3.1182
INFO:tensorflow:epoch = 106.19791666666666, learning_rate = 3.8121023e-05, loss = 0.00013836674, step = 10195 (5.445 sec)
2021-12-31 02:40:39,613 [INFO] tensorflow: epoch = 106.19791666666666, learning_rate = 3.8121023e-05, loss = 0.00013836674, step = 10195 (5.445 sec)
INFO:tensorflow:global_step/sec: 3.12123
2021-12-31 02:40:40,266 [INFO] tensorflow: global_step/sec: 3.12123
2021-12-31 02:40:40,916 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.853
INFO:tensorflow:global_step/sec: 3.1067

2021-12-31 02:41:53,041 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.951
INFO:tensorflow:global_step/sec: 3.11986
2021-12-31 02:41:55,284 [INFO] tensorflow: global_step/sec: 3.11986
INFO:tensorflow:epoch = 108.67708333333333, learning_rate = 2.646685e-05, loss = 0.00013514244, step = 10433 (5.449 sec)
2021-12-31 02:41:55,904 [INFO] tensorflow: epoch = 108.67708333333333, learning_rate = 2.646685e-05, loss = 0.00013514244, step = 10433 (5.449 sec)
INFO:tensorflow:global_step/sec: 3.14825
2021-12-31 02:41:58,143 [INFO] tensorflow: global_step/sec: 3.14825
INFO:tensorflow:global_step/sec: 3.13563
2021-12-31 02:42:01,013 [INFO] tensorflow: global_step/sec: 3.13563
2021-12-31 02:42:01,014 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.086
INFO:tensorflow:epoch = 108.85416666666666, learning_rate = 2.5785983e-05, loss = 0.00013020061, step = 10450 (5.438 sec)
2021-12-31 02:42:01,342 [INFO] tensorflow: epoch = 108.85416666666666, learning_rate = 2.57859

INFO:tensorflow:epoch = 110.88541666666666, learning_rate = 1.912279e-05, loss = 0.00013265897, step = 10645 (5.434 sec)
2021-12-31 02:43:11,241 [INFO] tensorflow: epoch = 110.88541666666666, learning_rate = 1.912279e-05, loss = 0.00013265897, step = 10645 (5.434 sec)
INFO:tensorflow:global_step/sec: 3.15653
2021-12-31 02:43:11,875 [INFO] tensorflow: global_step/sec: 3.15653
2021-12-31 02:43:12,500 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.088
INFO:tensorflow:global_step/sec: 3.08985
2021-12-31 02:43:14,788 [INFO] tensorflow: global_step/sec: 3.08985
2021-12-31 02:43:14,789 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 111/120: loss: 0.00016 learning rate: 0.00002 Time taken: 0:00:30.926534 ETA: 0:04:38.338803
INFO:tensorflow:epoch = 111.0625, learning_rate = 1.863085e-05, loss = 0.00014225669, step = 10662 (5.450 sec)
2021-12-31 02:43:16,691 [INFO] tensorflow: epoch = 111.0625, learning_rate = 1.863085e-05, loss = 0.00014225669, step = 10662

INFO:tensorflow:global_step/sec: 3.10294
2021-12-31 02:44:29,891 [INFO] tensorflow: global_step/sec: 3.10294
INFO:tensorflow:global_step/sec: 3.06893
2021-12-31 02:44:32,823 [INFO] tensorflow: global_step/sec: 3.06893
2021-12-31 02:44:32,824 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.655
INFO:tensorflow:epoch = 113.54166666666666, learning_rate = 1.2935117e-05, loss = 0.00013780282, step = 10900 (5.493 sec)
2021-12-31 02:44:33,132 [INFO] tensorflow: epoch = 113.54166666666666, learning_rate = 1.2935117e-05, loss = 0.00013780282, step = 10900 (5.493 sec)
INFO:tensorflow:global_step/sec: 3.19901
2021-12-31 02:44:35,637 [INFO] tensorflow: global_step/sec: 3.19901
INFO:tensorflow:epoch = 113.71875, learning_rate = 1.2602345e-05, loss = 0.0001215701, step = 10917 (5.429 sec)
2021-12-31 02:44:38,560 [INFO] tensorflow: epoch = 113.71875, learning_rate = 1.2602345e-05, loss = 0.0001215701, step = 10917 (5.429 sec)
INFO:tensorflow:global_step/sec: 3.07774
2021-12-31 02:44

INFO:tensorflow:epoch = 116.02083333333333, learning_rate = 8.980656e-06, loss = 0.00011833891, step = 11138 (5.373 sec)
2021-12-31 02:45:49,241 [INFO] tensorflow: epoch = 116.02083333333333, learning_rate = 8.980656e-06, loss = 0.00011833891, step = 11138 (5.373 sec)
INFO:tensorflow:global_step/sec: 3.11869
2021-12-31 02:45:50,512 [INFO] tensorflow: global_step/sec: 3.11869
2021-12-31 02:45:52,728 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 25.266
INFO:tensorflow:global_step/sec: 3.16797
2021-12-31 02:45:53,353 [INFO] tensorflow: global_step/sec: 3.16797
INFO:tensorflow:epoch = 116.19791666666666, learning_rate = 8.749617e-06, loss = 0.00015627059, step = 11155 (5.384 sec)
2021-12-31 02:45:54,625 [INFO] tensorflow: epoch = 116.19791666666666, learning_rate = 8.749617e-06, loss = 0.00015627059, step = 11155 (5.384 sec)
INFO:tensorflow:global_step/sec: 3.14015
2021-12-31 02:45:56,219 [INFO] tensorflow: global_step/sec: 3.14015
INFO:tensorflow:global_step/sec: 3.13151


INFO:tensorflow:epoch = 118.67708333333333, learning_rate = 6.0747384e-06, loss = 0.00011625141, step = 11393 (5.486 sec)
2021-12-31 02:47:11,002 [INFO] tensorflow: epoch = 118.67708333333333, learning_rate = 6.0747384e-06, loss = 0.00011625141, step = 11393 (5.486 sec)
INFO:tensorflow:global_step/sec: 3.14265
2021-12-31 02:47:11,309 [INFO] tensorflow: global_step/sec: 3.14265
2021-12-31 02:47:12,900 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 24.995
INFO:tensorflow:global_step/sec: 3.0798
2021-12-31 02:47:14,231 [INFO] tensorflow: global_step/sec: 3.0798
INFO:tensorflow:epoch = 118.85416666666666, learning_rate = 5.9184586e-06, loss = 0.00016136347, step = 11410 (5.511 sec)
2021-12-31 02:47:16,514 [INFO] tensorflow: epoch = 118.85416666666666, learning_rate = 5.9184586e-06, loss = 0.00016136347, step = 11410 (5.511 sec)
INFO:tensorflow:global_step/sec: 3.09302
2021-12-31 02:47:17,141 [INFO] tensorflow: global_step/sec: 3.09302
INFO:tensorflow:global_step/sec: 3.1024

In [20]:
# Listing the newly retrained model.
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights

total 45472
-rw-r--r-- 1 guest guest 46562000 Dec 31 10:47 resnet18_detector_pruned.tlt


## 8. Evaluate the retrained model 

This section evaluates the pruned and retrained model, using the `evaluate` command.

In [8]:
!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
 -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
 -k $KEY

2022-01-20 10:24:05,449 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-9f7b8ln9 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.

2022-01-20 02:24:11,405 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/specs/detectnet_v2_retrain_resnet18_kitti.txt




















2022-01-20 02:24:13,642 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
2022-01-20 02:24:13,752 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-01-20 02:24:13,752 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding e

2022-01-20 02:24:14,040 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2022-01-20 02:24:14,045 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2022-01-20 02:24:14,046 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000


2022-01-20 02:24:14,263 [INFO] iva.detectnet_v2.evaluation.build_evaluator: Found 190 samples in validation set












__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to 
input_1 (InputLayer) (None, 3, 544, 960) 0 
__________________________________________________________________________________________________
input_1_qdq (QDQ) (None, 3, 544, 960) 1 input_1[0][0] 
__________________________________________________________________________________________________
conv1 (QuantizedConv2D) (None, 64, 272, 480) 9472 input_1_qdq[0

INFO:tensorflow:Graph was finalized.
2022-01-20 02:24:15,215 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2022-01-20 02:24:15,846 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2022-01-20 02:24:16,087 [INFO] tensorflow: Done running local_init_op.
2022-01-20 02:24:16,673 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 24, 0.00s/step
2022-01-20 02:24:22,679 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 24, 0.60s/step
2022-01-20 02:24:24,087 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 24, 0.14s/step
Matching predictions to ground truth, class 1/1.: 100%|█| 990/990 [00:00<00:00, 15854.94it/s]





Validation cost: 0.001136
Mean average_precision (in %): 93.0563

class name average precision (in %)
------------ --------------------------
car 93.0563

Median Inference Time: 0.014622
2022-01-20 02:24:24,875 [INFO] __main__: Evaluation complete.
Time taken to run __main__:main: 0:00:13.471

## 9. Visualize inferences 
In this section, we run the `inference` tool to generate inferences on the trained models. To render bboxes from more classes, please edit the spec file `detectnet_v2_inference_kitti_tlt.txt` to include all the classes you would like to visualize and edit the rest of the file accordingly.

In [9]:
# Running inference for detection on n images
!tao detectnet_v2 inference -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
 -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
 -i $DATA_DOWNLOAD_DIR/training/image_t1 \
 -k $KEY

2022-01-21 18:04:52,062 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-o5bd143n because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.
2022-01-21 10:04:57,722 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/specs/detectnet_v2_inference_kitti_tlt.txt
2022-01-21 10:04:57,723 [INFO] __main__: Overlain images will be saved in the output path.
2022-01-21 10:04:57,723 [INFO] iva.detectnet_v2.inferencer.build_inferencer: Constructing inferencer




2022-01-21 10:04:58,004 [INFO] iva.detectnet_v2.inferencer.tlt_inferencer: Loading model from /workspace/tao-experiments/experiment/experiment_dir_retrain/weights/resnet18_detecto











_________________________________________________________________
Layer (type) Output Shape Param # 
input_1 (InputLayer) (None, 3, 544, 960) 0 
_________________________________________________________________
model_1 (Model) [(None, 1, 34, 60), (None 11550895 
Total params: 11,550,895
Trainable params: 11,539,205
Non-trainable params: 11,690
_________________________________________________________________
2022-01-21 10:05:00,234 [INFO] __main__: Initialized model
2022-01-21 10:05:00,235 [INFO] __main__: Commencing inference
100%|███████████████████████████████████████████| 11/11 [00:23<00:00, 2.13s/it]
2022-01-21 10:05:23,635 [INFO] __main__: Inference complete
2022-01-21 18:05:24,755 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


The `inference` tool produces two outputs. 
1. Overlain images in `$USER_EXPERIMENT_DIR/tlt_infer_testing/images_annotated`
2. Frame by frame bbox labels in kitti format located in `$USER_EXPERIMENT_DIR/tlt_infer_testing/labels`

*Note: To run inferences for a single image, simply replace the path to the -i flag in `inference` command with the path to the image.*

In [19]:
pip3 list

Package Version
------------------- ---------
argon2-cffi 21.1.0
async-generator 1.10
attrs 21.2.0
backcall 0.2.0
bleach 4.1.0
cached-property 1.5.2
certifi 2020.6.20
cffi 1.15.0
chardet 3.0.4
cycler 0.11.0
Cython 0.29.24
decorator 5.1.0
defusedxml 0.7.1
docker 4.3.1
docker-pycreds 0.4.0
entrypoints 0.3
h5py 3.1.0
idna 2.10
importlib-metadata 4.8.2
ipykernel 5.5.6
ipython 7.16.1
ipython-genutils 0.2.0
ipywidgets 7.6.5
jedi 0.18.1
Jinja2 3.0.3
joblib 1.0.1
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 7.0.6
jupyter-console 6.4.0
jupyter-core 4.9.1
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.2
kiwisolver 1.3.1
MarkupSafe 2.0.1
matplotlib 3.3.3
mistune 0.8.4
nbclient 0.5.8
nbconvert 6.0.7
nbformat 5.1.3
nest-asyncio 1.5.1
notebook 6.4.6
numpy 1.17.0
nvidia-pyindex 1.0.9
nvidia-tao 0.1.19
opencv-python 3.4.0.12
packaging 21.3
pandocfilters 1.5.0
parso 0.8.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.1.0
pip 21.2.2
prometheus-client 

In [22]:
!pip install matplotlib==3.3.3

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [25]:
import matplotlib

ModuleNotFoundError: No module named 'matplotlib'

In [16]:
# Simple grid visualizer
# !pip3 install matplotlib==3.3.3
# %matplotlib inline
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
 output_path = os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'], image_dir)
 num_rows = int(ceil(float(num_images) / float(num_cols)))
 f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30])
 f.tight_layout()
 a = [os.path.join(output_path, image) for image in os.listdir(output_path) 
 if os.path.splitext(image)[1].lower() in valid_image_ext]
 for idx, img_path in enumerate(a[:num_images]):
 col_id = idx % num_cols
 row_id = idx // num_cols
 img = plt.imread(img_path)
 axarr[row_id, col_id].imshow(img) 

ModuleNotFoundError: No module named 'matplotlib'

In [11]:
!pip install matplotlib


Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [12]:
# Visualizing the first 12 images.
OUTPUT_PATH = 'tlt_infer_testing/images_annotated' # relative path from $USER_EXPERIMENT_DIR.
COLS = 4 # number of columns in the visualizer grid.
IMAGES = 12 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

NameError: name 'visualize_images' is not defined

## 10. Model Export 

In [21]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_final
# Removing a pre-existing copy of the etlt if there has been any.
import os
output_file=os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'],
 "experiment_dir_final/resnet18_detector.etlt")
if os.path.exists(output_file):
 os.system("rm {}".format(output_file))
!tao detectnet_v2 export \
 -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
 -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
 -k $KEY

2021-12-24 10:26:16,544 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-9d8ski3x because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.
2021-12-24 02:26:23,902 [INFO] iva.common.export.keras_exporter: Using input nodes: ['input_1']
2021-12-24 02:26:23,902 [INFO] iva.common.export.keras_exporter: Using output nodes: ['output_cov/Sigmoid', 'output_bbox/BiasAdd']
NOTE: UFF has been tested with TensorFlow 1.14.0.
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['output_cov/Sigmoid', 'output_bbox/BiasAdd'] as outputs
2021-12-24 10:26:53,646 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


In [22]:
print('Exported model:')
print('------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_final

Exported model:
------------
total 244M
-rw-r--r-- 1 guest guest 200M Dec 22 16:27 calibration.tensor
-rw-r--r-- 1 guest guest 45M Dec 24 10:26 resnet18_detector.etlt


### A. Int8 Optimization 
DetectNet_v2 model supports int8 inference mode in TensorRT. 
In order to use int8 mode, we must calibrate the model to run 8-bit inferences -

* Generate calibration tensorfile from the training data using detectnet_v2 calibration_tensorfile
* Use tao export to generate int8 calibration table.

*Note: For this example, we generate a calibration tensorfile containing 10 batches of training data.
Ideally, it is best to use atleast 10-20% of the training data to do so. The more data provided during calibration, the closer int8 inferences are to fp32 inferences.*

*Note: If the model was trained with QAT nodes available, please refrain from using the post training int8 optimization as mentioned below. Please export the model in int8 mode (using the arg `--data_type int8`) with just the path to the calibration cache file (using the argument `--cal_cache_file`)*

In [23]:
# !tao detectnet_v2 calibration_tensorfile -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti_car.txt \
# -m 10 \
# -o $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor
!tao detectnet_v2 calibration_tensorfile -h

2021-12-24 10:30:21,031 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ow0gmh5m because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
usage: detectnet_v2 calibration_tensorfile [-h]
 [--num_processes NUM_PROCESSES]
 [--gpus GPUS]
 [--gpu_index GPU_INDEX [GPU_INDEX ...]]
 [--use_amp] [--log_file LOG_FILE]
 [-e EXPERIMENT_SPEC_FILE]
 [-o OUTPUT_PATH] [-m MAX_BATCHES]
 [-v] [--use_validation_set]
 {calibration_tensorfile,dataset_convert,evaluate,export,inference,prune,train}
 ...

optional arguments:
 -h, --help show this help message and exit
 --num_processes NUM_PROCESSES, -np NUM_PROCESSES
 The number of horovod child processes to be spawned.
 Default is -1(equal to --gpus).
 --gpus GPUS The numbe

In [37]:
# !rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt
# !rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
!tao detectnet_v2 export \
 -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
 -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
 -k $KEY \
 --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
 --data_type int8 \
 --batches 10 \
 --batch_size 8 \
 --max_batch_size 8 \
 --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
 --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
 --verbose

2021-12-24 12:07:24,957 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-sctq5k5h because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.
2021-12-24 04:07:32,210 [INFO] iva.common.export.keras_exporter: Using input nodes: ['input_1']
2021-12-24 04:07:32,210 [INFO] iva.common.export.keras_exporter: Using output nodes: ['output_cov/Sigmoid', 'output_bbox/BiasAdd']
2021-12-24 04:07:33,684 [DEBUG] iva.common.export.keras_exporter: Saving etlt model file at: /workspace/tao-experiments/detectnet_v2_car/experiment_dir_final/resnet18_detector.etlt.
2021-12-24 04:07:34,644 [DEBUG] modulus.export._uff: Patching keras BatchNormalization...
2021-12-24 04:07:34,644 [DEBUG] modulus.export.

### B. Generate TensorRT engine 
Verify engine generation using the `tao-converter` utility included with the docker.

The `tao-converter` produces optimized tensorrt engines for the platform that it resides on. Therefore, to get maximum performance, please instantiate this docker and execute the `tao-converter` command, with the exported `.etlt` file and calibration cache (for int8 mode) on your target device. The tao-converter utility included in this docker only works for x86 devices, with discrete NVIDIA GPU's. 

For the jetson devices, please download the tao-converter for jetson from the dev zone link [here](https://developer.nvidia.com/tao-converter). 

If you choose to integrate your model into deepstream directly, you may do so by simply copying the exported `.etlt` file along with the calibration cache to the target device and updating the spec file that configures the `gst-nvinfer` element to point to this newly exported model. Usually this file is called `config_infer_primary.txt` for detection models and `config_infer_secondary_*.txt` for classification models.

In [None]:
!tao converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
 -k $KEY \
 -c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
 -o output_cov/Sigmoid,output_bbox/BiasAdd \
 -d 3,384,1248 \
 -i nchw \
 -m 64 \
 -t int8 \
 -e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
 -b 4

## 11. Verify Deployed Model 
Verify the exported model by visualizing inferences on TensorRT.
In addition to running inference on a `.tlt` model in [step 9](#head-9), the `inference` tool is also capable of consuming the converted `TensorRT engine` from [step 10.B](#head-10-2).

*If after int-8 calibration the accuracy of the int-8 inferences seem to degrade, it could be because the there wasn't enough data in the calibration tensorfile used to calibrate thee model or, the training data is not entirely representative of your test images, and the calibration maybe incorrect. Therefore, you may either regenerate the calibration tensorfile with more batches of the training data and recalibrate the model, or calibrate the model on a few images from the test set. This may be done using `--cal_image_dir` flag in the `export` tool. For more information, please follow the instructions in the USER GUIDE.

### A. Inference using TensorRT engine 

In [None]:
!tao detectnet_v2 inference -e $SPECS_DIR/detectnet_v2_inference_kitti_etlt.txt \
 -o $USER_EXPERIMENT_DIR/etlt_infer_testing \
 -i $DATA_DOWNLOAD_DIR/testing/image_2 \
 -k $KEY

In [None]:
# visualize the first 12 inferenced images.
OUTPUT_PATH = 'etlt_infer_testing/images_annotated' # relative path from $USER_EXPERIMENT_DIR.
COLS = 4 # number of columns in the visualizer grid.
IMAGES = 12 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 11. QAT workflow 
This section delves into the newly enabled Quantization Aware Training feature with DetectNet_v2. The workflow defined below converts a pruned model from section [5](#head-5) to enable QAT and retrain this model to while accounting the noise introduced due to quantization in the forward pass. 

### A. Convert pruned model to QAT and retrain 
All detectnet models, unpruned and pruned models can be converted to QAT models by setting the `enable_qat` parameter in the `training_config` component of the spec file to `true`.

In [35]:
# Printing the retrain experiment file. 
# Note: We have updated the experiment file to convert the
# pretrained model to qat mode by setting the enable_qat
# parameter.
!cat $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt

random_seed: 42
dataset_config {
 data_sources {
 tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
 image_directory_path: "/workspace/tao-experiments/data/training/"
 }
 image_extension: "png"
 target_class_mapping{
 key:"car"
 value:"car"
 }
 validation_fold: 0
}
augmentation_config {
 preprocessing {
 output_image_width: 960
 output_image_height: 544
 min_bbox_width: 1.0
 min_bbox_height: 1.0
 output_image_channel: 3
 enable_auto_resize: true
 }
 spatial_augmentation {
 hflip_probability: 0.5
 vflip_probability: 0.0
 zoom_min: 1.0
 zoom_max: 1.0
 translate_max_x: 8.0
 translate_max_y: 8.0
 }
 color_augmentation {
 hue_rotation_max: 25.0
 saturation_shift_max: 0.20000000298
 contrast_scale_max: 0.10000000149
 contrast_center: 0.5
 }
}

postprocessing_config {
 target_class_config {
 key: "car"
 value {
 clustering_config {
 clustering_algorithm: DBSCAN
 coverage_threshold: 0.005
 dbscan_eps: 0.15
 dbscan_min_sam

In [36]:
!tao detectnet_v2 train -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
 -r $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat \
 -k $KEY \
 -n resnet18_detector_pruned_qat \
 --gpus $NUM_GPUS

2022-01-06 16:16:17,824 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-c0erns2p because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.








2022-01-06 08:16:23,959 [INFO] iva.common.logging.logging: Log file already exists at /workspace/tao-experiments/experiment/experiment_dir_retrain_qat/status.json
2022-01-06 08:16:23,959 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/specs/detectnet_v2_retrain_resnet18_kitti_qat.txt.
2022-01-06 08:16:23,961 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/specs/detectnet_v2_retrain_resnet18_kitti_qat.txt
2022-01-06 08:16:24,074 [INFO] __main__: Cannot ite

2022-01-06 08:16:42,071 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to 
input_1 (InputLayer) (None, 3, 544, 960) 0 
__________________________________________________________________________________________________
input_1_qdq (QDQ) (None, 3, 544, 960) 1 input_1[0][0] 
__________________________________________________________________________________________________
conv1 (QuantizedConv2D) (None, 64, 272, 480) 9472 input_1_qdq[0][0] 
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 272, 480) 256 conv1[0][0] 
__________________________________________________________________________________________________
activation_1 (ReLU) (None, 64, 272, 480) 0 bn_conv1[0][0] 
________________________________________________



2022-01-06 08:16:44,420 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2022-01-06 08:16:44,626 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2022-01-06 08:16:44,631 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2022-01-06 08:16:44,631 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000


2022-01-06 08:16:44,943 [INFO] __main__: Found 761 samples in training set






Traceback (most recent call last):
 File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1607, in _create_c_op
 c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot reshape a tensor with 261120 elements to shape [8,1,4,34,60] (65280 elements) f

2022-01-06 16:16:46,566 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


In [25]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights

total 45472
-rw-r--r-- 1 guest guest 46562000 Dec 31 12:04 resnet18_detector_pruned_qat.tlt


### B. Evaluate QAT converted model 
This section evaluates a QAT enabled pruned retrained model. The mAP of this model should be comparable to that of the pruned retrained model without QAT. However, due to quantization, it is possible sometimes to see a drop in the mAP value for certain datasets.

In [38]:
!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
 -m $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights/resnet18_detector_pruned_qat.tlt \
 -k $KEY \
 -f tlt

2022-01-06 16:20:00,401 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-wx4hs8w2 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.

2022-01-06 08:20:06,043 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/specs/detectnet_v2_retrain_resnet18_kitti_qat.txt




















2022-01-06 08:20:08,144 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
2022-01-06 08:20:08,243 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-01-06 08:20:08,243 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo shardi

2022-01-06 08:20:08,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2022-01-06 08:20:08,530 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2022-01-06 08:20:08,530 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000


2022-01-06 08:20:08,742 [INFO] iva.detectnet_v2.evaluation.build_evaluator: Found 190 samples in validation set












__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to 
input_1 (InputLayer) (None, 3, 544, 960) 0 
__________________________________________________________________________________________________
input_1_qdq (QDQ) (None, 3, 544, 960) 1 input_1[0][0] 
__________________________________________________________________________________________________
conv1 (QuantizedConv2D) (None, 64, 272, 480) 9472 input_1_qdq[0

INFO:tensorflow:Graph was finalized.
2022-01-06 08:20:09,661 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2022-01-06 08:20:10,289 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2022-01-06 08:20:10,529 [INFO] tensorflow: Done running local_init_op.
2022-01-06 08:20:11,113 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 24, 0.00s/step
2022-01-06 08:20:17,140 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 24, 0.60s/step
2022-01-06 08:20:18,702 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 24, 0.16s/step
Matching predictions to ground truth, class 1/1.: 100%|█| 1015/1015 [00:00<00:00, 15749.92it/s]





Validation cost: 0.001126
Mean average_precision (in %): 92.8748

class name average precision (in %)
------------ --------------------------
car 92.8748

Median Inference Time: 0.016298
2022-01-06 08:20:19,454 [INFO] __main__: Evaluation complete.
Time taken to run __main__:main: 0:00:13.4

### C. Export QAT trained model to int8 
Export a QAT trained model to TensorRT parsable model. This command generates an .etlt file from the trained model and the serializes corresponding int8 scales as a TRT readable calibration cache file.

In [39]:
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.etlt
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.bin
!tao detectnet_v2 export \
 -m $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights/resnet18_detector_pruned_qat.tlt \
 -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.etlt \
 -k $KEY \
 --data_type int8 \
 --batch_size 8 \
 --max_batch_size 16 \
 --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.trt.int8 \
 --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.bin \
 --verbose

2022-01-06 16:20:39,163 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-iliiqnbj because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
Using TensorFlow backend.
2022-01-06 08:20:46,924 [INFO] iva.common.export.keras_exporter: Using input nodes: ['input_1']
2022-01-06 08:20:46,924 [INFO] iva.common.export.keras_exporter: Using output nodes: ['output_cov/Sigmoid', 'output_bbox/BiasAdd']
2022-01-06 08:21:02,948 [DEBUG] iva.common.export.keras_exporter: Saving etlt model file at: /workspace/tao-experiments/experiment/experiment_dir_final/resnet18_detector_qat.etlt.
2022-01-06 08:21:03,134 [DEBUG] modulus.export._uff: Patching keras BatchNormalization...
2022-01-06 08:21:03,134 [DEBUG] modulus.export._u

### D. Evaluate a QAT trained model using the exported TensorRT engine 
This section evaluates a QAT enabled pruned retrained model using the TensorRT int8 engine that was exported in [Section C](#head-12-3). Please note that there maybe a slight difference (~0.1-0.5%) in the mAP from [Section B](#head-12-2), oweing to some differences in the implementation of quantization in TensorRT.

*Note: The TensorRT evaluator might be slightly slower than the TAO evaluator here, because the evaluation dataloader is pinned to the CPU to avoid any clashes between TensorRT and TAO instances in the GPU. Please note that this tool was not intended and has not been developed for profiling the model. It is just a means to qualitatively analyse the model.*

*Please use native TensorRT or DeepStream for the most optimized inferences.*

In [None]:
!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
 -m $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.trt.int8 \
 -f tensorrt

### E. Inference using QAT engine 
Run inference and visualize detections on test images, using the exported TensorRT engine from [Section C](#head-12-3).

In [None]:
!tao detectnet_v2 inference -e $SPECS_DIR/detectnet_v2_inference_kitti_etlt_qat.txt \
 -o $USER_EXPERIMENT_DIR/tlt_infer_testing_qat \
 -i $DATA_DOWNLOAD_DIR/testing/image_2 \
 -k $KEY

In [None]:
# visualize the first 12 inferenced images.
OUTPUT_PATH = 'tlt_infer_testing_qat/images_annotated' # relative path from $USER_EXPERIMENT_DIR.
COLS = 4 # number of columns in the visualizer grid.
IMAGES = 12 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)