Training DetectNet_v2 with ResNet-18 on the KITTI dataset

Hi all,

Description

I want to train DetectNet_v2 with a ResNet-18 backbone on the KITTI dataset. I ran the Docker container on the host system, loaded the KITTI dataset offline, converted it to TFRecords, and trained for only one epoch, following the TLT documentation. This is the output:

Matching predictions to ground truth, class 1/3
Epoch 1/1
=========================

Validation cost: 0.000191
Mean average_precision (in %): 0.0006

class name      average precision (in %)
------------  --------------------------
car                           0.00165432
cyclist                       0
pedestrian                    0

Median Inference Time: 0.015159
INFO:tensorflow:Saving checkpoints for step-562.
2020-05-29 09:06:21,303 [INFO] tensorflow: Saving checkpoints for step-562.
Time taken to run iva.detectnet_v2.scripts.train:main: 0:06:15.723442.

Questions:

1- The loss starts at about 0.1 and drops to 0.001. That seems strange for a single epoch; is it correct?
2- When I use resnet18.hdf5 as the pretrained model, are its weights for the whole network or only for the feature extractor?
3- The validation loss is very small, but the average precision of every class is essentially zero. Why?

Environment

TensorRT Version : 7.1.0 [Developer Preview]
GPU Type : Jetson Nano
Nvidia Driver Version : JetPack-4.4 DP (L4T R32.4.2)
CUDA Version : 10.2
cuDNN Version : 8.0.0 [Developer Preview]
Operating System + Version : Ubuntu 18.04, Linux kernel 4.9.140
Python Version (if applicable) : 3.6.9
Baremetal or Container (if container which image + tag) : Baremetal

I also ran the same training, for 1 epoch in total.

  1. The loss decreases steadily, which is expected:
    0.12829292 → 0.11809954 → 0.06464981 → 0.011154257 → 0.0026569464 → 0.0019884342 → …

  2. See https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#model
    This parameter (pretrained_model_file) defines the path to a pretrained TLT model file. If the load_graph flag is set to False, it is assumed that only the weights of the pretrained model file are to be used. In this case, TLT train constructs the feature extractor graph in the experiment and loads the weights from the pretrained model file whose layer names match. Thus, transfer learning across different resolutions and domains is supported.

    For layers that may be absent in the pretrained model, the tool initializes them with random weights and skips the import for those layers.

  3. My results are below; the AP is not zero:

Epoch 1/1
=========================

Validation cost: 0.000119
Mean average_precision (in %): 13.3151

class name      average precision (in %)
------------  --------------------------
car                             12.5306
cyclist                          6.73058
pedestrian                      20.6842
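As a quick consistency check on the two evaluation logs in this thread, the reported "Mean average_precision" matches the unweighted mean of the three per-class APs. The Python sketch below recomputes it from the printed values; the helper mean_ap is hypothetical (not a TLT function), and the unweighted-mean interpretation is inferred from the numbers rather than taken from the TLT source.

```python
def mean_ap(per_class_ap):
    """Return the unweighted mean of per-class AP values (in %)."""
    if not per_class_ap:
        raise ValueError("per_class_ap must contain at least one class")
    return sum(per_class_ap.values()) / len(per_class_ap)


# Per-class AP values (in %) copied from the two logs in this thread.
first_run = {"car": 0.00165432, "cyclist": 0.0, "pedestrian": 0.0}
second_run = {"car": 12.5306, "cyclist": 6.73058, "pedestrian": 20.6842}

for label, run in (("first run", first_run), ("second run", second_run)):
    # Format to 4 decimals, like the "Mean average_precision" line in the log.
    print(f"{label}: mean AP = {mean_ap(run):.4f} %")
# first run: mean AP = 0.0006 %
# second run: mean AP = 13.3151 %
```

Because the mean is unweighted, one class with near-zero AP pulls the summary down regardless of how many objects of that class the validation set contains.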

1- Did you train DetectNet_v2 + ResNet-18 on KITTI with an input size of 1248×348?
2- Did you resize all images before creating the TFRecords?
3- Do I need to set the load_graph flag to True if I specify the path of a pretrained model?

Please refer to the default Jupyter notebook inside the Docker container.

  1. It should be 1248×384.
  2. No resizing is needed, because the KITTI images are already close to 1248×384 (most are 1242×375).
  3. For training, it is not needed. For retraining (for example, after pruning), it is needed. See the TLT user guide for details.
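The loading rule from the TLT documentation excerpt quoted earlier in this thread (with load_graph set to False, build the graph from the experiment spec, copy weights only into layers whose names match the pretrained file, and randomly initialize the rest) can be sketched in plain Python. This is a hypothetical illustration, not TLT's implementation: the function load_matching_weights, the layer names, and the dict-of-lists weight format are assumptions, and the size check is an extra safeguard added here.

```python
import random


def load_matching_weights(model_layers, pretrained, seed=0):
    """Hypothetical sketch of name-based weight transfer.

    model_layers: dict, layer name -> parameter count in the newly built graph.
    pretrained:   dict, layer name -> list of floats read from a pretrained file.
    Returns (weights, loaded, skipped).
    """
    rng = random.Random(seed)
    weights, loaded, skipped = {}, [], []
    for name, size in model_layers.items():
        src = pretrained.get(name)
        # Extra assumption: only reuse weights whose size also matches.
        if src is not None and len(src) == size:
            weights[name] = list(src)  # layer name matches: copy pretrained values
            loaded.append(name)
        else:
            # Layer absent from the pretrained file: random init, import skipped.
            # The std of 0.01 is arbitrary, chosen only for illustration.
            weights[name] = [rng.gauss(0.0, 0.01) for _ in range(size)]
            skipped.append(name)
    return weights, loaded, skipped


# Hypothetical example: the pretrained file holds only backbone layers,
# so the detection heads are not found by name.
pretrained = {"conv1": [0.1] * 4, "block_1a_conv_1": [0.2] * 8}
model = {"conv1": 4, "block_1a_conv_1": 8, "output_cov": 3, "output_bbox": 12}

weights, loaded, skipped = load_matching_weights(model, pretrained)
print("loaded from pretrained:", loaded)   # ['conv1', 'block_1a_conv_1']
print("randomly initialized:", skipped)    # ['output_cov', 'output_bbox']
```

Under these assumptions, only the layers present in resnet18.hdf5 (the feature extractor) receive pretrained weights, while the DetectNet_v2 output heads start from random values.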

@Morganh, I am training on a custom dataset and I face exactly the same problem. Any ideas?

@beefshepherd
For your question, let's track it in the topic you created: Tlt detectnet training focusing on a particular class? (#5 by beefshepherd)