Faster RCNN ROI issue

When training Faster R-CNN on my own custom dataset, I keep running into this message:

No positive ROIs.
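Some context on what the message usually means. In the standard Faster R-CNN training scheme, which TLT's implementation presumably follows, each RPN proposal is matched to the ground-truth box it overlaps most, measured by IoU. Proposals at or above a positive threshold become foreground ROIs, and the rest become background. If no proposal in an image clears that threshold, the box regressor has nothing to train on. That would be consistent with `detector_regr: 0.0000e+00`, "Classifier accuracy ... 1.0" (every ROI is background), and "Mean number of bounding boxes from RPN overlapping ground truth boxes: 0.0" in the log below.

The NumPy sketch below illustrates that rule. It is not TLT's actual code. The names `iou` and `label_rois` and the thresholds 0.5 / 0.0 are assumptions chosen for illustration. Boxes are assumed to be `(x1, y1, x2, y2)` with positive area.

```python
import numpy as np


def iou(boxes, gt):
    """Pairwise IoU between boxes (N, 4) and ground-truth boxes (M, 4).

    Boxes are (x1, y1, x2, y2) and are assumed to have positive area.
    Returns an (N, M) array.
    """
    # Corners of each pairwise intersection, computed by broadcasting.
    x1 = np.maximum(boxes[:, None, 0], gt[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gt[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gt[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gt[None, :, 3])
    # Non-overlapping pairs get zero width or height, so zero intersection.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_b[:, None] + area_g[None, :] - inter
    return inter / union


def label_rois(rois, gt, pos_thresh=0.5, neg_thresh=0.0):
    """Split proposals into positive and negative ROI indices.

    Uses each proposal's best IoU with any ground-truth box.
    The thresholds are illustrative assumptions, not TLT defaults.
    """
    best = iou(rois, gt).max(axis=1)
    pos = np.where(best >= pos_thresh)[0]
    neg = np.where((best >= neg_thresh) & (best < pos_thresh))[0]
    if len(pos) == 0:
        print("No positive ROIs.")
    return pos, neg


# A small ground-truth box (10x10 px) and two proposals that are much larger or offset.
gt = np.array([[10, 10, 20, 20]], dtype=float)
rois = np.array([[100, 100, 150, 150], [12, 12, 40, 40]], dtype=float)
pos, neg = label_rois(rois, gt)  # prints "No positive ROIs."; best IoU is only 64/820
```

In this toy case the second proposal does cover part of the small object, but its IoU (about 0.08) is far below the positive threshold. This is one plausible way small objects can end up with no positive ROIs. The log alone does not confirm that this is the cause here.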

Training then crashes at a random epoch. Here is the console output:

.
.
.
TF_reshape_3_regr (TFReshape)   (1, 256, 4)          0           dense_regress[0][0]              
==================================================================================================
Total params: 42,947,507
Trainable params: 42,516,403
Non-trainable params: 431,104
__________________________________________________________________________________________________
2019-10-28 20:15:43,162 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loading pretrained weights from /workspace/tlt-experiments/pretrained_models/tlt_resnet50_faster_rcnn_v1/resnet50.h5
2019-10-28 20:15:45,221 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Pretrained weights loaded!
2019-10-28 20:15:45,387 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: training example num: 47
2019-10-28 20:15:45,679 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Starting training
2019-10-28 20:15:45,679 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 1/12
Found 47 examples in training dataset, valid image extension isjpg, jpeg and png(case sensitive)

Compressed_class_mapping: {u'paint': 1, u'pore': 0}

Name mapping:{u'paint': u'paint', u'pore': u'pore'}

Training dataset stats(compressed via class mapping):

{u'paint': 65, u'pore': 73}

No positive ROIs.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-10-28 20:15:52,185 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-10-28 20:16:04.393678: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-10-28 20:16:05.004549: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.49G (2674714112 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-10-28 20:16:05.368621: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:05.368663: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:05.456343: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:05.456368: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:05.870316: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:05.870359: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:05.955358: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.35GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:05.955379: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.35GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:06.740765: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-28 20:16:06.740790: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
 1/47 [..............................] - ETA: 17:20 - rpn_cls: 0.8178 - rpn_regr: 0.0731 - detector_cls: 0.5741 - detector_regr: 0.0000e+00No positive ROIs.
 2/47 [>.............................] - ETA: 8:39 - rpn_cls: 0.7998 - rpn_regr: 0.0612 - detector_cls: 0.4603 - detector_regr: 0.0000e+00 No positive ROIs.
 3/47 [>.............................] - ETA: 5:45 - rpn_cls: 0.7892 - rpn_regr: 0.0592 - detector_cls: 0.3998 - detector_regr: 0.0000e+00No positive ROIs.
 4/47 [=>............................] - ETA: 4:18 - rpn_cls: 0.7805 - rpn_regr: 0.0563 - detector_cls: 0.3576 - detector_regr: 0.0000e+00No positive ROIs.
 5/47 [==>...........................] - ETA: 3:25 - rpn_cls: 0.7711 - rpn_regr: 0.0537 - detector_cls: 0.3252 - detector_regr: 0.0000e+00No positive ROIs.
 6/47 [==>...........................] - ETA: 2:50 - rpn_cls: 0.7611 - rpn_regr: 0.0515 - detector_cls: 0.3032 - detector_regr: 0.0000e+00No positive ROIs.
 7/47 [===>..........................] - ETA: 2:25 - rpn_cls: 0.7518 - rpn_regr: 0.0500 - detector_cls: 0.2849 - detector_regr: 0.0000e+00No positive ROIs.
 8/47 [====>.........................] - ETA: 2:06 - rpn_cls: 0.7430 - rpn_regr: 0.0487 - detector_cls: 0.2689 - detector_regr: 0.0000e+00No positive ROIs.
 9/47 [====>.........................] - ETA: 1:51 - rpn_cls: 0.7344 - rpn_regr: 0.0484 - detector_cls: 0.2546 - detector_regr: 0.0000e+00No positive ROIs.
10/47 [=====>........................] - ETA: 1:39 - rpn_cls: 0.7260 - rpn_regr: 0.0481 - detector_cls: 0.2419 - detector_regr: 0.0000e+00No positive ROIs.
11/47 [======>.......................] - ETA: 1:29 - rpn_cls: 0.7176 - rpn_regr: 0.0476 - detector_cls: 0.2305 - detector_regr: 0.0000e+00No positive ROIs.
12/47 [======>.......................] - ETA: 1:20 - rpn_cls: 0.7094 - rpn_regr: 0.0471 - detector_cls: 0.2202 - detector_regr: 0.0000e+00No positive ROIs.
13/47 [=======>......................] - ETA: 1:13 - rpn_cls: 0.7015 - rpn_regr: 0.0467 - detector_cls: 0.2108 - detector_regr: 0.0000e+00No positive ROIs.
14/47 [=======>......................] - ETA: 1:07 - rpn_cls: 0.6938 - rpn_regr: 0.0463 - detector_cls: 0.2023 - detector_regr: 0.0000e+00No positive ROIs.
15/47 [========>.....................] - ETA: 1:02 - rpn_cls: 0.6861 - rpn_regr: 0.0459 - detector_cls: 0.1945 - detector_regr: 0.0000e+00No positive ROIs.
16/47 [=========>....................] - ETA: 57s - rpn_cls: 0.6785 - rpn_regr: 0.0455 - detector_cls: 0.1874 - detector_regr: 0.0000e+00 No positive ROIs.
17/47 [=========>....................] - ETA: 52s - rpn_cls: 0.6710 - rpn_regr: 0.0451 - detector_cls: 0.1808 - detector_regr: 0.0000e+00No positive ROIs.
18/47 [==========>...................] - ETA: 49s - rpn_cls: 0.6635 - rpn_regr: 0.0447 - detector_cls: 0.1747 - detector_regr: 0.0000e+00No positive ROIs.
19/47 [===========>..................] - ETA: 45s - rpn_cls: 0.6561 - rpn_regr: 0.0443 - detector_cls: 0.1691 - detector_regr: 0.0000e+00No positive ROIs.
20/47 [===========>..................] - ETA: 42s - rpn_cls: 0.6487 - rpn_regr: 0.0440 - detector_cls: 0.1639 - detector_regr: 0.0000e+00No positive ROIs.
21/47 [============>.................] - ETA: 39s - rpn_cls: 0.6414 - rpn_regr: 0.0437 - detector_cls: 0.1590 - detector_regr: 0.0000e+00No positive ROIs.
22/47 [=============>................] - ETA: 36s - rpn_cls: 0.6342 - rpn_regr: 0.0435 - detector_cls: 0.1544 - detector_regr: 0.0000e+00No positive ROIs.
23/47 [=============>................] - ETA: 34s - rpn_cls: 0.6271 - rpn_regr: 0.0433 - detector_cls: 0.1501 - detector_regr: 0.0000e+00No positive ROIs.
24/47 [==============>...............] - ETA: 31s - rpn_cls: 0.6201 - rpn_regr: 0.0431 - detector_cls: 0.1461 - detector_regr: 0.0000e+00No positive ROIs.
25/47 [==============>...............] - ETA: 29s - rpn_cls: 0.6133 - rpn_regr: 0.0430 - detector_cls: 0.1423 - detector_regr: 0.0000e+00No positive ROIs.
26/47 [===============>..............] - ETA: 27s - rpn_cls: 0.6065 - rpn_regr: 0.0428 - detector_cls: 0.1387 - detector_regr: 0.0000e+00No positive ROIs.
27/47 [================>.............] - ETA: 25s - rpn_cls: 0.5999 - rpn_regr: 0.0426 - detector_cls: 0.1353 - detector_regr: 0.0000e+00No positive ROIs.
28/47 [================>.............] - ETA: 23s - rpn_cls: 0.5933 - rpn_regr: 0.0425 - detector_cls: 0.1322 - detector_regr: 0.0000e+00No positive ROIs.
29/47 [=================>............] - ETA: 22s - rpn_cls: 0.5869 - rpn_regr: 0.0424 - detector_cls: 0.1291 - detector_regr: 0.0000e+00No positive ROIs.
30/47 [==================>...........] - ETA: 20s - rpn_cls: 0.5805 - rpn_regr: 0.0423 - detector_cls: 0.1263 - detector_regr: 0.0000e+00No positive ROIs.
31/47 [==================>...........] - ETA: 18s - rpn_cls: 0.5743 - rpn_regr: 0.0421 - detector_cls: 0.1235 - detector_regr: 0.0000e+00No positive ROIs.
32/47 [===================>..........] - ETA: 17s - rpn_cls: 0.5681 - rpn_regr: 0.0420 - detector_cls: 0.1209 - detector_regr: 0.0000e+00No positive ROIs.
33/47 [====================>.........] - ETA: 15s - rpn_cls: 0.5621 - rpn_regr: 0.0419 - detector_cls: 0.1185 - detector_regr: 0.0000e+00No positive ROIs.
34/47 [====================>.........] - ETA: 14s - rpn_cls: 0.5561 - rpn_regr: 0.0418 - detector_cls: 0.1161 - detector_regr: 0.0000e+00No positive ROIs.
35/47 [=====================>........] - ETA: 13s - rpn_cls: 0.5502 - rpn_regr: 0.0417 - detector_cls: 0.1138 - detector_regr: 0.0000e+00No positive ROIs.
36/47 [=====================>........] - ETA: 11s - rpn_cls: 0.5445 - rpn_regr: 0.0415 - detector_cls: 0.1117 - detector_regr: 0.0000e+00No positive ROIs.
37/47 [======================>.......] - ETA: 10s - rpn_cls: 0.5389 - rpn_regr: 0.0415 - detector_cls: 0.1096 - detector_regr: 0.0000e+00No positive ROIs.
38/47 [=======================>......] - ETA: 9s - rpn_cls: 0.5333 - rpn_regr: 0.0414 - detector_cls: 0.1076 - detector_regr: 0.0000e+00 No positive ROIs.
39/47 [=======================>......] - ETA: 8s - rpn_cls: 0.5279 - rpn_regr: 0.0413 - detector_cls: 0.1057 - detector_regr: 0.0000e+00No positive ROIs.
40/47 [========================>.....] - ETA: 7s - rpn_cls: 0.5226 - rpn_regr: 0.0412 - detector_cls: 0.1039 - detector_regr: 0.0000e+00No positive ROIs.
41/47 [=========================>....] - ETA: 6s - rpn_cls: 0.5174 - rpn_regr: 0.0412 - detector_cls: 0.1021 - detector_regr: 0.0000e+00No positive ROIs.
42/47 [=========================>....] - ETA: 4s - rpn_cls: 0.5123 - rpn_regr: 0.0411 - detector_cls: 0.1004 - detector_regr: 0.0000e+00No positive ROIs.
43/47 [==========================>...] - ETA: 3s - rpn_cls: 0.5073 - rpn_regr: 0.0410 - detector_cls: 0.0988 - detector_regr: 0.0000e+00No positive ROIs.
44/47 [===========================>..] - ETA: 2s - rpn_cls: 0.5024 - rpn_regr: 0.0410 - detector_cls: 0.0972 - detector_regr: 0.0000e+00No positive ROIs.
45/47 [===========================>..] - ETA: 1s - rpn_cls: 0.4976 - rpn_regr: 0.0409 - detector_cls: 0.0957 - detector_regr: 0.0000e+00No positive ROIs.
46/47 [============================>.] - ETA: 0s - rpn_cls: 0.4929 - rpn_regr: 0.0409 - detector_cls: 0.0942 - detector_regr: 0.0000e+00No positive ROIs.
47/47 [==============================] - 44s 936ms/step - rpn_cls: 0.4883 - rpn_regr: 0.0408 - detector_cls: 0.0928 - detector_regr: 0.0000e+00
2019-10-28 20:16:29,661 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Mean number of bounding boxes from RPN overlapping ground truth boxes: 0.0
2019-10-28 20:16:29,662 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Classifier accuracy for bounding boxes from RPN: 1.0
2019-10-28 20:16:29,662 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN classifier: 0.275080920098
2019-10-28 20:16:29,662 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN regression: 0.0379338169748
2019-10-28 20:16:29,662 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector classifier: 0.0275653046019
2019-10-28 20:16:29,662 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector regression: 0.0
2019-10-28 20:16:29,662 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Elapsed time: 43.9829471111
2019-10-28 20:16:29,662 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Total loss changed from inf to 0.340580041675, saving weights
2019-10-28 20:16:50,733 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 2/12
No positive ROIs.
 1/47 [..............................] - ETA: 22s - rpn_cls: 0.0448 - rpn_regr: 0.0442 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 2/47 [>.............................] - ETA: 21s - rpn_cls: 0.0463 - rpn_regr: 0.0441 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 3/47 [>.............................] - ETA: 20s - rpn_cls: 0.0457 - rpn_regr: 0.0409 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 4/47 [=>............................] - ETA: 19s - rpn_cls: 0.0449 - rpn_regr: 0.0394 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 5/47 [==>...........................] - ETA: 19s - rpn_cls: 0.0444 - rpn_regr: 0.0387 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 6/47 [==>...........................] - ETA: 18s - rpn_cls: 0.0442 - rpn_regr: 0.0380 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 7/47 [===>..........................] - ETA: 18s - rpn_cls: 0.0441 - rpn_regr: 0.0372 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 8/47 [====>.........................] - ETA: 18s - rpn_cls: 0.0438 - rpn_regr: 0.0364 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 9/47 [====>.........................] - ETA: 17s - rpn_cls: 0.0437 - rpn_regr: 0.0358 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
10/47 [=====>........................] - ETA: 17s - rpn_cls: 0.0435 - rpn_regr: 0.0355 - detector_cls: 1.3113e-07 - detector_regr: 0.0000e+00No positive ROIs.
11/47 [======>.......................] - ETA: 16s - rpn_cls: 0.0432 - rpn_regr: 0.0351 - detector_cls: 1.3990e-07 - detector_regr: 0.0000e+00No positive ROIs.
12/47 [======>.......................] - ETA: 16s - rpn_cls: 0.0428 - rpn_regr: 0.0348 - detector_cls: 5.5294e-07 - detector_regr: 0.0000e+00No positive ROIs.
13/47 [=======>......................] - ETA: 15s - rpn_cls: 0.0424 - rpn_regr: 0.0346 - detector_cls: 8.7298e-07 - detector_regr: 0.0000e+00No positive ROIs.
14/47 [=======>......................] - ETA: 15s - rpn_cls: 0.0421 - rpn_regr: 0.0344 - detector_cls: 1.1962e-06 - detector_regr: 0.0000e+00No positive ROIs.
15/47 [========>.....................] - ETA: 14s - rpn_cls: 0.0417 - rpn_regr: 0.0341 - detector_cls: 1.4529e-06 - detector_regr: 0.0000e+00No positive ROIs.
16/47 [=========>....................] - ETA: 14s - rpn_cls: 0.0413 - rpn_regr: 0.0338 - detector_cls: 1.6592e-06 - detector_regr: 0.0000e+00No positive ROIs.
17/47 [=========>....................] - ETA: 13s - rpn_cls: 0.0409 - rpn_regr: 0.0335 - detector_cls: 1.8252e-06 - detector_regr: 0.0000e+00No positive ROIs.
18/47 [==========>...................] - ETA: 13s - rpn_cls: 0.0405 - rpn_regr: 0.0333 - detector_cls: 1.9593e-06 - detector_regr: 0.0000e+00No positive ROIs.
19/47 [===========>..................] - ETA: 12s - rpn_cls: 0.0402 - rpn_regr: 0.0330 - detector_cls: 2.0678e-06 - detector_regr: 0.0000e+00No positive ROIs.
20/47 [===========>..................] - ETA: 12s - rpn_cls: 0.0399 - rpn_regr: 0.0328 - detector_cls: 2.1558e-06 - detector_regr: 0.0000e+00No positive ROIs.
21/47 [============>.................] - ETA: 12s - rpn_cls: 0.0396 - rpn_regr: 0.0326 - detector_cls: 2.2269e-06 - detector_regr: 0.0000e+00No positive ROIs.
22/47 [=============>................] - ETA: 11s - rpn_cls: 0.0394 - rpn_regr: 0.0325 - detector_cls: 2.2843e-06 - detector_regr: 0.0000e+00No positive ROIs.
23/47 [=============>................] - ETA: 11s - rpn_cls: 0.0391 - rpn_regr: 0.0323 - detector_cls: 2.3304e-06 - detector_regr: 0.0000e+00No positive ROIs.
24/47 [==============>...............] - ETA: 10s - rpn_cls: 0.0389 - rpn_regr: 0.0321 - detector_cls: 2.3670e-06 - detector_regr: 0.0000e+00No positive ROIs.
25/47 [==============>...............] - ETA: 10s - rpn_cls: 0.0386 - rpn_regr: 0.0319 - detector_cls: 2.3957e-06 - detector_regr: 0.0000e+00No positive ROIs.
26/47 [===============>..............] - ETA: 9s - rpn_cls: 0.0384 - rpn_regr: 0.0317 - detector_cls: 2.4178e-06 - detector_regr: 0.0000e+00 No positive ROIs.
27/47 [================>.............] - ETA: 9s - rpn_cls: 0.0382 - rpn_regr: 0.0316 - detector_cls: 2.4344e-06 - detector_regr: 0.0000e+00No positive ROIs.
28/47 [================>.............] - ETA: 8s - rpn_cls: 0.0380 - rpn_regr: 0.0315 - detector_cls: 2.4463e-06 - detector_regr: 0.0000e+00No positive ROIs.
29/47 [=================>............] - ETA: 8s - rpn_cls: 0.0378 - rpn_regr: 0.0314 - detector_cls: 2.4542e-06 - detector_regr: 0.0000e+00No positive ROIs.
30/47 [==================>...........] - ETA: 7s - rpn_cls: 0.0376 - rpn_regr: 0.0313 - detector_cls: 2.4588e-06 - detector_regr: 0.0000e+00No positive ROIs.
31/47 [==================>...........] - ETA: 7s - rpn_cls: 0.0375 - rpn_regr: 0.0312 - detector_cls: 2.4605e-06 - detector_regr: 0.0000e+00No positive ROIs.
32/47 [===================>..........] - ETA: 6s - rpn_cls: 0.0373 - rpn_regr: 0.0311 - detector_cls: 2.4597e-06 - detector_regr: 0.0000e+00No positive ROIs.
33/47 [====================>.........] - ETA: 6s - rpn_cls: 0.0372 - rpn_regr: 0.0310 - detector_cls: 2.4569e-06 - detector_regr: 0.0000e+00No positive ROIs.
34/47 [====================>.........] - ETA: 6s - rpn_cls: 0.0370 - rpn_regr: 0.0309 - detector_cls: 2.4523e-06 - detector_regr: 0.0000e+00No positive ROIs.
35/47 [=====================>........] - ETA: 5s - rpn_cls: 0.0369 - rpn_regr: 0.0308 - detector_cls: 2.4462e-06 - detector_regr: 0.0000e+00No positive ROIs.
36/47 [=====================>........] - ETA: 5s - rpn_cls: 0.0367 - rpn_regr: 0.0307 - detector_cls: 2.4387e-06 - detector_regr: 0.0000e+00No positive ROIs.
37/47 [======================>.......] - ETA: 4s - rpn_cls: 0.0366 - rpn_regr: 0.0306 - detector_cls: 2.4302e-06 - detector_regr: 0.0000e+00No positive ROIs.
38/47 [=======================>......] - ETA: 4s - rpn_cls: 0.0364 - rpn_regr: 0.0305 - detector_cls: 2.4207e-06 - detector_regr: 0.0000e+00No positive ROIs.
39/47 [=======================>......] - ETA: 3s - rpn_cls: 0.0363 - rpn_regr: 0.0304 - detector_cls: 2.4105e-06 - detector_regr: 0.0000e+00No positive ROIs.
40/47 [========================>.....] - ETA: 3s - rpn_cls: 0.0361 - rpn_regr: 0.0304 - detector_cls: 2.6760e-06 - detector_regr: 0.0000e+00No positive ROIs.
41/47 [=========================>....] - ETA: 2s - rpn_cls: 0.0360 - rpn_regr: 0.0303 - detector_cls: 2.9209e-06 - detector_regr: 0.0000e+00No positive ROIs.
42/47 [=========================>....] - ETA: 2s - rpn_cls: 0.0359 - rpn_regr: 0.0302 - detector_cls: 3.1470e-06 - detector_regr: 0.0000e+00No positive ROIs.
43/47 [==========================>...] - ETA: 1s - rpn_cls: 0.0358 - rpn_regr: 0.0301 - detector_cls: 3.3560e-06 - detector_regr: 0.0000e+00No positive ROIs.
44/47 [===========================>..] - ETA: 1s - rpn_cls: 0.0356 - rpn_regr: 0.0300 - detector_cls: 3.5492e-06 - detector_regr: 0.0000e+00No positive ROIs.
45/47 [===========================>..] - ETA: 0s - rpn_cls: 0.0355 - rpn_regr: 0.0300 - detector_cls: 3.7280e-06 - detector_regr: 0.0000e+00No positive ROIs.
46/47 [============================>.] - ETA: 0s - rpn_cls: 0.0354 - rpn_regr: 0.0299 - detector_cls: 3.8937e-06 - detector_regr: 0.0000e+00No positive ROIs.
47/47 [==============================] - 22s 465ms/step - rpn_cls: 0.0353 - rpn_regr: 0.0298 - detector_cls: 4.0472e-06 - detector_regr: 0.0000e+00
2019-10-28 20:17:12,582 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Mean number of bounding boxes from RPN overlapping ground truth boxes: 0.0
2019-10-28 20:17:12,582 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Classifier accuracy for bounding boxes from RPN: 1.0
2019-10-28 20:17:12,582 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN classifier: 0.029723406908
2019-10-28 20:17:12,583 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN regression: 0.0265477568981
2019-10-28 20:17:12,583 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector classifier: 1.11088755552e-05
2019-10-28 20:17:12,583 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector regression: 0.0
2019-10-28 20:17:12,583 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Elapsed time: 42.9208798409
2019-10-28 20:17:12,583 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Total loss changed from 0.340580041675 to 0.0562822726817, saving weights
2019-10-28 20:17:16,472 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 3/12
No positive ROIs.
 1/47 [..............................] - ETA: 22s - rpn_cls: 0.0206 - rpn_regr: 0.0165 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 2/47 [>.............................] - ETA: 21s - rpn_cls: 0.0227 - rpn_regr: 0.0174 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 3/47 [>.............................] - ETA: 20s - rpn_cls: 0.0218 - rpn_regr: 0.0184 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 4/47 [=>............................] - ETA: 20s - rpn_cls: 0.0219 - rpn_regr: 0.0192 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 5/47 [==>...........................] - ETA: 19s - rpn_cls: 0.0220 - rpn_regr: 0.0194 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 6/47 [==>...........................] - ETA: 19s - rpn_cls: 0.0220 - rpn_regr: 0.0194 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 7/47 [===>..........................] - ETA: 18s - rpn_cls: 0.0219 - rpn_regr: 0.0192 - detector_cls: 1.0401e-06 - detector_regr: 0.0000e+00No positive ROIs.
 8/47 [====>.........................] - ETA: 18s - rpn_cls: 0.0218 - rpn_regr: 0.0190 - detector_cls: 1.6300e-06 - detector_regr: 0.0000e+00No positive ROIs.
 9/47 [====>.........................] - ETA: 17s - rpn_cls: 0.0216 - rpn_regr: 0.0187 - detector_cls: 2.0192e-06 - detector_regr: 0.0000e+00No positive ROIs.
10/47 [=====>........................] - ETA: 17s - rpn_cls: 0.0214 - rpn_regr: 0.0187 - detector_cls: 2.2804e-06 - detector_regr: 0.0000e+00No positive ROIs.
11/47 [======>.......................] - ETA: 16s - rpn_cls: 0.0211 - rpn_regr: 0.0186 - detector_cls: 2.4568e-06 - detector_regr: 0.0000e+00No positive ROIs.
12/47 [======>.......................] - ETA: 16s - rpn_cls: 0.0209 - rpn_regr: 0.0184 - detector_cls: 2.5754e-06 - detector_regr: 0.0000e+00No positive ROIs.
13/47 [=======>......................] - ETA: 15s - rpn_cls: 0.0207 - rpn_regr: 0.0183 - detector_cls: 2.8114e-06 - detector_regr: 0.0000e+00No positive ROIs.
14/47 [=======>......................] - ETA: 15s - rpn_cls: 0.0206 - rpn_regr: 0.0181 - detector_cls: 2.9856e-06 - detector_regr: 0.0000e+00No positive ROIs.
15/47 [========>.....................] - ETA: 15s - rpn_cls: 0.0205 - rpn_regr: 0.0180 - detector_cls: 3.1137e-06 - detector_regr: 0.0000e+00No positive ROIs.
16/47 [=========>....................] - ETA: 14s - rpn_cls: 0.0205 - rpn_regr: 0.0179 - detector_cls: 3.2071e-06 - detector_regr: 0.0000e+00No positive ROIs.
17/47 [=========>....................] - ETA: 14s - rpn_cls: 0.0205 - rpn_regr: 0.0178 - detector_cls: 3.2740e-06 - detector_regr: 0.0000e+00No positive ROIs.
18/47 [==========>...................] - ETA: 13s - rpn_cls: 0.0205 - rpn_regr: 0.0178 - detector_cls: 3.3204e-06 - detector_regr: 0.0000e+00No positive ROIs.
19/47 [===========>..................] - ETA: 13s - rpn_cls: 0.0205 - rpn_regr: 0.0177 - detector_cls: 3.3509e-06 - detector_regr: 0.0000e+00No positive ROIs.
20/47 [===========>..................] - ETA: 12s - rpn_cls: 0.0204 - rpn_regr: 0.0176 - detector_cls: 3.3689e-06 - detector_regr: 0.0000e+00No positive ROIs.
21/47 [============>.................] - ETA: 12s - rpn_cls: 0.0204 - rpn_regr: 0.0176 - detector_cls: 3.3770e-06 - detector_regr: 0.0000e+00No positive ROIs.
22/47 [=============>................] - ETA: 11s - rpn_cls: 0.0203 - rpn_regr: 0.0176 - detector_cls: 3.3773e-06 - detector_regr: 0.0000e+00No positive ROIs.
23/47 [=============>................] - ETA: 11s - rpn_cls: 0.0202 - rpn_regr: 0.0175 - detector_cls: 3.3714e-06 - detector_regr: 0.0000e+00No positive ROIs.
24/47 [==============>...............] - ETA: 10s - rpn_cls: 0.0201 - rpn_regr: 0.0175 - detector_cls: 3.3606e-06 - detector_regr: 0.0000e+00No positive ROIs.
25/47 [==============>...............] - ETA: 10s - rpn_cls: 0.0201 - rpn_regr: 0.0175 - detector_cls: 3.3459e-06 - detector_regr: 0.0000e+00No positive ROIs.
26/47 [===============>..............] - ETA: 10s - rpn_cls: 0.0200 - rpn_regr: 0.0175 - detector_cls: 3.3280e-06 - detector_regr: 0.0000e+00No positive ROIs.
27/47 [================>.............] - ETA: 9s - rpn_cls: 0.0200 - rpn_regr: 0.0175 - detector_cls: 3.3077e-06 - detector_regr: 0.0000e+00 No positive ROIs.
28/47 [================>.............] - ETA: 9s - rpn_cls: 0.0200 - rpn_regr: 0.0175 - detector_cls: 3.2861e-06 - detector_regr: 0.0000e+00No positive ROIs.
29/47 [=================>............] - ETA: 8s - rpn_cls: 0.0199 - rpn_regr: 0.0175 - detector_cls: 3.2628e-06 - detector_regr: 0.0000e+00No positive ROIs.
30/47 [==================>...........] - ETA: 8s - rpn_cls: 0.0199 - rpn_regr: 0.0175 - detector_cls: 3.2384e-06 - detector_regr: 0.0000e+00No positive ROIs.
31/47 [==================>...........] - ETA: 7s - rpn_cls: 0.0198 - rpn_regr: 0.0176 - detector_cls: 3.2130e-06 - detector_regr: 0.0000e+00No positive ROIs.
32/47 [===================>..........] - ETA: 7s - rpn_cls: 0.0198 - rpn_regr: 0.0176 - detector_cls: 3.1869e-06 - detector_regr: 0.0000e+00No positive ROIs.
33/47 [====================>.........] - ETA: 6s - rpn_cls: 0.0197 - rpn_regr: 0.0176 - detector_cls: 3.1604e-06 - detector_regr: 0.0000e+00No positive ROIs.
34/47 [====================>.........] - ETA: 6s - rpn_cls: 0.0197 - rpn_regr: 0.0176 - detector_cls: 3.1335e-06 - detector_regr: 0.0000e+00No positive ROIs.
35/47 [=====================>........] - ETA: 5s - rpn_cls: 0.0197 - rpn_regr: 0.0176 - detector_cls: 3.1064e-06 - detector_regr: 0.0000e+00No positive ROIs.
36/47 [=====================>........] - ETA: 5s - rpn_cls: 0.0197 - rpn_regr: 0.0176 - detector_cls: 3.0792e-06 - detector_regr: 0.0000e+00No positive ROIs.
37/47 [======================>.......] - ETA: 4s - rpn_cls: 0.0196 - rpn_regr: 0.0176 - detector_cls: 3.0520e-06 - detector_regr: 0.0000e+00No positive ROIs.
38/47 [=======================>......] - ETA: 4s - rpn_cls: 0.0196 - rpn_regr: 0.0176 - detector_cls: 3.3027e-06 - detector_regr: 0.0000e+00No positive ROIs.
39/47 [=======================>......] - ETA: 3s - rpn_cls: 0.0196 - rpn_regr: 0.0177 - detector_cls: 3.5323e-06 - detector_regr: 0.0000e+00No positive ROIs.
40/47 [========================>.....] - ETA: 3s - rpn_cls: 0.0195 - rpn_regr: 0.0177 - detector_cls: 3.7429e-06 - detector_regr: 0.0000e+00No positive ROIs.
41/47 [=========================>....] - ETA: 2s - rpn_cls: 0.0195 - rpn_regr: 0.0177 - detector_cls: 3.9362e-06 - detector_regr: 0.0000e+00No positive ROIs.
42/47 [=========================>....] - ETA: 2s - rpn_cls: 0.0195 - rpn_regr: 0.0177 - detector_cls: 4.1137e-06 - detector_regr: 0.0000e+00No positive ROIs.
43/47 [==========================>...] - ETA: 1s - rpn_cls: 0.0194 - rpn_regr: 0.0177 - detector_cls: 4.2769e-06 - detector_regr: 0.0000e+00No positive ROIs.
44/47 [===========================>..] - ETA: 1s - rpn_cls: 0.0194 - rpn_regr: 0.0177 - detector_cls: 4.4270e-06 - detector_regr: 0.0000e+00No positive ROIs.
45/47 [===========================>..] - ETA: 0s - rpn_cls: 0.0194 - rpn_regr: 0.0177 - detector_cls: 4.5650e-06 - detector_regr: 0.0000e+00No positive ROIs.
46/47 [============================>.] - ETA: 0s - rpn_cls: 0.0193 - rpn_regr: 0.0178 - detector_cls: 4.6921e-06 - detector_regr: 0.0000e+00No positive ROIs.
47/47 [==============================] - 23s 480ms/step - rpn_cls: 0.0193 - rpn_regr: 0.0178 - detector_cls: 4.8092e-06 - detector_regr: 0.0000e+00
2019-10-28 20:17:39,033 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Mean number of bounding boxes from RPN overlapping ground truth boxes: 0.0
2019-10-28 20:17:39,033 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Classifier accuracy for bounding boxes from RPN: 1.0
2019-10-28 20:17:39,033 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN classifier: 0.0184327360698
2019-10-28 20:17:39,033 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN regression: 0.0185074357098
2019-10-28 20:17:39,034 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector classifier: 1.01928689521e-05
2019-10-28 20:17:39,034 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector regression: 0.0
2019-10-28 20:17:39,034 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Elapsed time: 26.4510099888
2019-10-28 20:17:39,034 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Total loss changed from 0.0562822726817 to 0.0369503646486, saving weights
2019-10-28 20:17:42,882 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 4/12
No positive ROIs.
 1/47 [..............................] - ETA: 21s - rpn_cls: 0.0166 - rpn_regr: 0.0110 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 2/47 [>.............................] - ETA: 20s - rpn_cls: 0.0146 - rpn_regr: 0.0104 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 3/47 [>.............................] - ETA: 20s - rpn_cls: 0.0138 - rpn_regr: 0.0115 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 4/47 [=>............................] - ETA: 20s - rpn_cls: 0.0142 - rpn_regr: 0.0134 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 5/47 [==>...........................] - ETA: 19s - rpn_cls: 0.0143 - rpn_regr: 0.0142 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 6/47 [==>...........................] - ETA: 19s - rpn_cls: 0.0145 - rpn_regr: 0.0144 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00No positive ROIs.
 7/47 [===>..........................] - ETA: 18s - rpn_cls: 0.0146 - rpn_regr: 0.0144 - detector_cls: 1.6520e-06 - detector_regr: 0.0000e+00No positive ROIs.
 8/47 [====>.........................] - ETA: 18s - rpn_cls: 0.0149 - rpn_regr: 0.0147 - detector_cls: 2.6339e-06 - detector_regr: 0.0000e+00No positive ROIs.
 9/47 [====>.........................] - ETA: 17s - rpn_cls: 0.0151 - rpn_regr: 0.0149 - detector_cls: 3.2817e-06 - detector_regr: 0.0000e+00No positive ROIs.
10/47 [=====>........................] - ETA: 17s - rpn_cls: 0.0152 - rpn_regr: 0.0152 - detector_cls: 3.7165e-06 - detector_regr: 0.0000e+00No positive ROIs.
11/47 [======>.......................] - ETA: 16s - rpn_cls: 0.0152 - rpn_regr: 0.0153 - detector_cls: 4.0102e-06 - detector_regr: 0.0000e+00No positive ROIs.
12/47 [======>.......................] - ETA: 16s - rpn_cls: 0.0152 - rpn_regr: 0.0154 - detector_cls: 4.2075e-06 - detector_regr: 0.0000e+00No positive ROIs.
13/47 [=======>......................] - ETA: 16s - rpn_cls: 0.0154 - rpn_regr: 0.0156 - detector_cls: 4.3374e-06 - detector_regr: 0.0000e+00No positive ROIs.
14/47 [=======>......................] - ETA: 15s - rpn_cls: 0.0154 - rpn_regr: 0.0157 - detector_cls: 4.4193e-06 - detector_regr: 0.0000e+00No positive ROIs.
15/47 [========>.....................] - ETA: 15s - rpn_cls: 0.0154 - rpn_regr: 0.0158 - detector_cls: 4.4664e-06 - detector_regr: 0.0000e+00No positive ROIs.
16/47 [=========>....................] - ETA: 14s - rpn_cls: 0.0153 - rpn_regr: 0.0159 - detector_cls: 4.4881e-06 - detector_regr: 0.0000e+00No positive ROIs.
17/47 [=========>....................] - ETA: 14s - rpn_cls: 0.0152 - rpn_regr: 0.0160 - detector_cls: 4.4910e-06 - detector_regr: 0.0000e+00No positive ROIs.
18/47 [==========>...................] - ETA: 13s - rpn_cls: 0.0152 - rpn_regr: 0.0160 - detector_cls: 4.4799e-06 - detector_regr: 0.0000e+00No positive ROIs.
19/47 [===========>..................] - ETA: 13s - rpn_cls: 0.0151 - rpn_regr: 0.0160 - detector_cls: 4.4584e-06 - detector_regr: 0.0000e+00No positive ROIs.
20/47 [===========>..................] - ETA: 12s - rpn_cls: 0.0150 - rpn_regr: 0.0160 - detector_cls: 4.4292e-06 - detector_regr: 0.0000e+00No positive ROIs.
21/47 [============>.................] - ETA: 12s - rpn_cls: 0.0150 - rpn_regr: 0.0160 - detector_cls: 4.3943e-06 - detector_regr: 0.0000e+00No positive ROIs.
22/47 [=============>................] - ETA: 11s - rpn_cls: 0.0149 - rpn_regr: 0.0160 - detector_cls: 4.3552e-06 - detector_regr: 0.0000e+00No positive ROIs.
23/47 [=============>................] - ETA: 11s - rpn_cls: 0.0148 - rpn_regr: 0.0160 - detector_cls: 4.3130e-06 - detector_regr: 0.0000e+00No positive ROIs.
24/47 [==============>...............] - ETA: 10s - rpn_cls: 0.0148 - rpn_regr: 0.0159 - detector_cls: 4.2686e-06 - detector_regr: 0.0000e+00No positive ROIs.
25/47 [==============>...............] - ETA: 10s - rpn_cls: 0.0147 - rpn_regr: 0.0159 - detector_cls: 4.2228e-06 - detector_regr: 0.0000e+00No positive ROIs.
26/47 [===============>..............] - ETA: 9s - rpn_cls: 0.0147 - rpn_regr: 0.0159 - detector_cls: 4.1761e-06 - detector_regr: 0.0000e+00 No positive ROIs.
27/47 [================>.............] - ETA: 9s - rpn_cls: 0.0147 - rpn_regr: 0.0159 - detector_cls: 4.1288e-06 - detector_regr: 0.0000e+00Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 287, in main
  File "./faster_rcnn/utils/roi_helpers.py", line 76, in calc_iou_np
IndexError: index 1 is out of bounds for axis 1 with size 1
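The "No positive ROIs." messages and the calc_iou_np frame in the traceback appear to come from the step that matches RPN proposals against ground-truth boxes before training the RCNN head. The epoch summary above also reports "Mean number of bounding boxes from RPN overlapping ground truth boxes: 0.0". The sketch below is a hypothetical, simplified version of that labeling step. It is modeled on the open-source keras-frcnn calc_iou logic, not on TLT's own roi_helpers code. label_rois is an assumed helper, and its min_overlap/max_overlap arguments stand in for classifier_min_overlap and classifier_max_overlap from the spec.

```python
# Hypothetical RoI labeling sketch (keras-frcnn style), NOT TLT's roi_helpers.


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in pixels."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_rois(proposals, gt_boxes, min_overlap=0.0, max_overlap=0.5):
    """Split RPN proposals into positive and negative RoIs for the RCNN head.

    Defaults mirror classifier_min_overlap: 0.0 and
    classifier_max_overlap: 0.5 from the spec above.
    """
    positives, negatives = [], []
    for roi in proposals:
        best = max((iou(roi, gt) for gt in gt_boxes), default=0.0)
        if best >= max_overlap:
            positives.append(roi)   # trains class scores and box regression
        elif best >= min_overlap:
            negatives.append(roi)   # trains the background class only
        # proposals below min_overlap are dropped entirely
    if not positives:
        print("No positive ROIs.")
    return positives, negatives
```

For example, a 64 x 64 proposal (300, 452, 364, 516) centered on the 12 x 11 pore box (326, 479, 338, 490) from the sample annotation reaches an IoU of only about 0.03. It is therefore labeled negative, and the same message is printed. A proposal that equals the ground-truth box is labeled positive.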

Please let me know if this is a new issue or something I have not configured properly.

Here is my spec:

random_seed: 42
enc_key: 'API'
verbose: True
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
    size_min {
min:700
}
    image_channel_mean {
        key: 'b'
        value: 103.939
}
    image_channel_mean {
        key: 'g'
        value: 116.779
}
    image_channel_mean {
        key: 'r'
        value: 123.68
}
    image_scaling_factor: 1.0
}
feature_extractor: "resnet:50"
anchor_box_config {
scale: 64.0
scale: 128.0
scale: 256.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: False
}
all_projections: True
use_pooling:False
}
training_config {
kitti_data_config {
images_dir : '/workspace/tlt-experiments/data/cam/training_images'
labels_dir: '/workspace/tlt-experiments/data/cam/training_labels'
}
training_data_parser: 'raw_kitti'
data_augmentation {
use_augmentation: True
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
num_epochs: 12
class_mapping {
key: 'pore'
value: 0
}
class_mapping {
key: "paint"
value: 1
}

pretrained_weights: "/workspace/tlt-experiments/pretrained_models/tlt_resnet50_faster_rcnn_v1/resnet50.h5"
pretrained_model: ""
output_weights: "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.tltw"
output_model: "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}

rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_overlap_threshold: 0.7

reg_config {
reg_type: 'L2'
weight_decay: 1e-4
}

optimizer {
adam {
lr: 0.00001
beta_1: 0.9
beta_2: 0.999
decay: 0.0
}
}

lr_scheduler {
step {
base_lr: 0.00001
gamma: 1.0
step_size: 30
}
}

lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0

inference_config {
images_dir: '/workspace/tlt-experiments/data/cam/inference'
model: '/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch12.tlt'
detection_image_output_dir: '/workspace/tlt-experiments/data/cam/inference_results_imgs'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/inference_results_imgs'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}

evaluation_config {
dataset {
images_dir : '/workspace/tlt-experiments/data/cam/validation_images'
labels_dir: '/workspace/tlt-experiments/data/cam/validation_labels'
}
data_parser: 'raw_kitti'
model: '/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch12.tlt'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/validation_images'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
}

}

Here is a sample annotation:

pore 0.00 0 0.00 326.00 479.00 338.00 490.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
paint 0.00 0 0.00 267.00 465.00 312.00 544.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
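The annotation lines above use the KITTI label layout. Each line starts with the class name, truncation, occlusion and alpha. Next comes the 2D box as left, top, right, bottom in pixels, followed by seven 3D fields that are zero here. As a small sketch, the box sizes can be extracted to compare against the anchor scales in the spec. parse_kitti_line and box_sizes are hypothetical helpers. Note that the spec resizes images (size_min: 700), so these pixel sizes only match what the network sees if the source images already have that size.

```python
# Minimal KITTI label parser (sketch). A KITTI line has 15 fields:
# type, truncated, occluded, alpha, left, top, right, bottom,
# height, width, length (3D size), x, y, z (3D location), rotation_y.


def parse_kitti_line(line):
    fields = line.split()
    if len(fields) < 15:
        raise ValueError("expected 15 KITTI fields, got %d" % len(fields))
    left, top, right, bottom = map(float, fields[4:8])
    return fields[0], (left, top, right, bottom)


def box_sizes(lines):
    """Return (class, width, height) for each non-empty label line."""
    sizes = []
    for line in lines:
        if not line.strip():
            continue
        cls, (x1, y1, x2, y2) = parse_kitti_line(line)
        sizes.append((cls, x2 - x1, y2 - y1))
    return sizes


sample = """pore 0.00 0 0.00 326.00 479.00 338.00 490.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
paint 0.00 0 0.00 267.00 465.00 312.00 544.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"""

for cls, w, h in box_sizes(sample.splitlines()):
    print(cls, w, h)   # pore 12.0 11.0, then paint 45.0 79.0
```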

Thank you in advance for your help!

Hi Martin,
I found one culprit: your spec file is missing rpn_nms_max_boxes between lines 109 and 110.

rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7

Hi Martin,
Please consider the following as the root cause: the class_mapping should always include a background class.

See section 5.3 of the TLT documentation:
For FasterRCNN, the class mapped to the largest number is always the ‘background’ class due to the implementation. Also, if you want to ignore some classes in the dataset, simply map them to -1. In the previous example, there are 5 classes in the dataset: ‘Car’, ‘Van’, ‘Person’, ‘Cyclist’ and ‘Truck’. You want to group ‘Car’ and ‘Van’, so map them to 0. You also want to exclude ‘Truck’, so map it to -1. Finally, add a dummy ‘background’ class that is mapped to the largest number (3).
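The rule quoted above can be checked before launching a run. check_class_mapping below is a hypothetical helper and is not part of TLT. It ignores classes mapped to -1 and requires a 'background' entry that holds the largest id, with no other class sharing that id. The ids for 'Person' and 'Cyclist' in the last call are assumed, because the quoted example only fixes 'Car', 'Van', 'Truck' and 'background'.

```python
# Sketch of a class_mapping sanity check following the quoted TLT rule:
# the largest id must belong to 'background'; -1 means "ignore this class".


def check_class_mapping(mapping, background="background"):
    used = {k: v for k, v in mapping.items() if v != -1}  # drop ignored classes
    if background not in used:
        return "missing a '%s' entry" % background
    top = max(used.values())
    if used[background] != top:
        return "'%s' must map to the largest id (%d)" % (background, top)
    if any(v == top for k, v in used.items() if k != background):
        return "another class shares the background id %d" % top
    return "ok"


# Original spec in this thread: no background entry.
print(check_class_mapping({"pore": 0, "paint": 1}))
# Fixed spec: background added with the largest id.
print(check_class_mapping({"pore": 0, "paint": 1, "background": 2}))
# Docs example (Person/Cyclist ids assumed): Car+Van grouped, Truck ignored.
print(check_class_mapping({"Car": 0, "Van": 0, "Person": 1, "Cyclist": 2,
                           "Truck": -1, "background": 3}))
```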

Hi Morgan,

Thank you for the help. These changes have helped me train a model successfully.

A few questions:

  1. I still get the “No positive ROIs” message, so I am not sure how it affects the result, since I do not understand which ROIs it refers to.
2019-10-30 17:49:36,579 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 8/200
No positive ROIs.
 1/47 [..............................] - ETA: 28s - rpn_cls: 0.0088 - rpn_regr: 0.0102 - detector_cls: 1.1672e-04 - detector_regr: 0.0000e+00No positive ROIs.
 6/47 [==>...........................] - ETA: 25s - rpn_cls: 0.0130 - rpn_regr: 0.0113 - detector_cls: 0.0056 - detector_regr: 0.0069No positive ROIs.
 9/47 [====>.........................] - ETA: 23s - rpn_cls: 0.0122 - rpn_regr: 0.0110 - detector_cls: 0.0067 - detector_regr: 0.0080No positive ROIs.
16/47 [=========>....................] - ETA: 19s - rpn_cls: 0.0109 - rpn_regr: 0.0106 - detector_cls: 0.0090 - detector_regr: 0.0101No positive ROIs.
18/47 [==========>...................] - ETA: 17s - rpn_cls: 0.0107 - rpn_regr: 0.0106 - detector_cls: 0.0094 - detector_regr: 0.0104No positive ROIs.
20/47 [===========>..................] - ETA: 16s - rpn_cls: 0.0105 - rpn_regr: 0.0106 - detector_cls: 0.0096 - detector_regr: 0.0106No positive ROIs.
23/47 [=============>................] - ETA: 14s - rpn_cls: 0.0103 - rpn_regr: 0.0106 - detector_cls: 0.0098 - detector_regr: 0.0109No positive ROIs.
25/47 [==============>...............] - ETA: 13s - rpn_cls: 0.0102 - rpn_regr: 0.0106 - detector_cls: 0.0099 - detector_regr: 0.0111No positive ROIs.
27/47 [================>.............] - ETA: 12s - rpn_cls: 0.0101 - rpn_regr: 0.0106 - detector_cls: 0.0100 - detector_regr: 0.0111No positive ROIs.
34/47 [====================>.........] - ETA: 7s - rpn_cls: 0.0097 - rpn_regr: 0.0105 - detector_cls: 0.0100 - detector_regr: 0.0112No positive ROIs.
41/47 [=========================>....] - ETA: 3s - rpn_cls: 0.0094 - rpn_regr: 0.0104 - detector_cls: 0.0101 - detector_regr: 0.0113No positive ROIs.
42/47 [=========================>....] - ETA: 3s - rpn_cls: 0.0094 - rpn_regr: 0.0104 - detector_cls: 0.0101 - detector_regr: 0.0113No positive ROIs.
43/47 [==========================>...] - ETA: 2s - rpn_cls: 0.0094 - rpn_regr: 0.0104 - detector_cls: 0.0102 - detector_regr: 0.0113No positive ROIs.
47/47 [==============================] - 29s 613ms/step - rpn_cls: 0.0093 - rpn_regr: 0.0103 - detector_cls: 0.0103 - detector_regr: 0.0114
2019-10-30 17:50:05,407 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Mean number of bounding boxes from RPN overlapping ground truth boxes: 1.51063829787
2019-10-30 17:50:05,407 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Classifier accuracy for bounding boxes from RPN: 0.995013297872
2019-10-30 17:50:05,407 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN classifier: 0.00845463235585
2019-10-30 17:50:05,407 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN regression: 0.00978827798997
2019-10-30 17:50:05,407 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector classifier: 0.0122700189432
2019-10-30 17:50:05,407 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector regression: 0.0122018686045
2019-10-30 17:50:05,408 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Elapsed time: 32.6928730011
2019-10-30 17:50:05,408 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Total loss changed from 0.0417879491276 to 0.0427147978935, saving weights
  2. Is there a way to see a graph of the model's training? I have experimented with various numbers of epochs to guess when overfitting may happen, but it would be much easier to understand how the learning rate affects the effectiveness of each epoch.

  3. I saw in another forum thread that you talked about ways to improve results on small objects for DetectNet. Are there any tips and tricks I can use for FasterRCNN to detect smaller objects? So far, training for 200 epochs helps a lot, but I am wondering if there are any other config parameters that I am overlooking.

  4. When using the Jetson Xavier, I will first export this model on my host computer and then use the TLT converter on the Jetson to create a TensorRT engine specific to that hardware. To use the model, I assumed I could use either DeepStream or TensorRT 6. Since FasterRCNN in DeepStream is a beta feature at the moment, how can I load the model and run inference with Python and TensorRT?
    Can I use the TensorRT 6 Python API to load the newly exported TLT model on my Xavier? (https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/coreConcepts.html)

If I cannot do that and have to use DeepStream, can you point me to how to build an app on top of DeepStream? I have a lot of custom Python code that needs to wrap around the DeepStream pipeline. Using DeepStream is fine, as long as I can take the results it provides and use them.

Please point me in the right direction on this.

Thank you in advance!

Hi Martin,
For 1) It means that no positive RoI was found in this image for training the RCNN, i.e. no RPN proposal overlapped a ground-truth box enough to be labeled positive. The message is only for debugging, and you can ignore it. It is common at the start of training and occurs less frequently as training converges.
For 2) The learning rate value is not available in the log. See also https://devtalk.nvidia.com/default/topic/1065795/transfer-learning-toolkit/learning-rate-monitoring/.
For 3) To improve accuracy on small objects, the most common trick is to use a smaller set of anchors. The anchor sizes should be similar to the sizes of the small objects. Anchor ratios can be kept unchanged.
For 4) See section 11 of the TLT documentation.
A DeepStream sample with documentation on how to run inference using the trained FasterRCNN models from TLT is provided on github at: https://github.com/NVIDIA-AI-IOT/deepstream_4.x_apps.
Also refer to https://devtalk.nvidia.com/default/topic/1065558/transfer-learning-toolkit/trt-engine-deployment/
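Although the learning rate is not logged, the per-epoch loss summaries are logged, so they can still be graphed. This sketch collects the summary lines, in the format that appears in the logs in this thread, into a per-epoch dictionary. parse_training_log is a hypothetical helper, and the sample lines are abbreviated copies of the epoch 8 log with the path shortened. Plotting (for example with matplotlib) is left out so that the block has no third-party dependencies.

```python
import re

# Matches the "[INFO] ... Epoch N/M" line that starts each epoch.
EPOCH_RE = re.compile(r"\[INFO\].*Epoch (\d+)/(\d+)")
# Matches the four loss summary lines printed at the end of each epoch.
METRIC_RE = re.compile(
    r"(Loss RPN classifier|Loss RPN regression|"
    r"Loss Detector classifier|Loss Detector regression): ([0-9.eE+-]+)")


def parse_training_log(lines):
    """Return {epoch: {loss name: value}} from train.pyc log lines."""
    history = {}
    epoch = None
    for line in lines:
        m = EPOCH_RE.search(line)
        if m:
            epoch = int(m.group(1))
            history.setdefault(epoch, {})
            continue
        m = METRIC_RE.search(line)
        if m and epoch is not None:
            history[epoch][m.group(1)] = float(m.group(2))
    return history


# Abbreviated copies of lines from the epoch 8 log (path shortened).
sample = [
    "2019-10-30 17:49:36,579 [INFO] train.pyc: Epoch 8/200",
    "2019-10-30 17:50:05,407 [INFO] train.pyc: Loss RPN classifier: 0.00845463235585",
    "2019-10-30 17:50:05,407 [INFO] train.pyc: Loss Detector regression: 0.0122018686045",
]
history = parse_training_log(sample)
for epoch, losses in sorted(history.items()):
    print(epoch, losses)
```

When run on a full log file (for example, by passing the result of open(path).readlines()), the sorted epochs and their loss values can be passed directly to any plotting tool.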

Hello Morgan,

I have tested out the new smaller set of anchors with decent success.

Class   AP      precision  recall
paint   0.9333  1.0000     0.9333
pore    0.3464  0.6364     0.4118

mAP = 0.6399
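One way to sanity-check the anchor change is to compute, for each annotated box size, the best IoU that any anchor shape could reach when it is centered on the box. This sketch assumes that ratio means height/width and that each anchor keeps an area of scale squared. TLT's exact anchor parameterization may differ. The box sizes (pore 12 x 11, paint 45 x 79) come from the sample annotation and do not account for the image resize. anchor_shapes, centered_iou and best_anchor_iou are hypothetical helpers.

```python
import math

# ASSUMPTIONS: ratio = height / width, anchor area = scale ** 2.
# TLT's exact anchor parameterization may differ.


def anchor_shapes(scales, ratios):
    """Return (scale, ratio, width, height) for every anchor shape."""
    return [(s, r, s / math.sqrt(r), s * math.sqrt(r))
            for s in scales for r in ratios]


def centered_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same center."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def best_anchor_iou(box_w, box_h, scales, ratios):
    return max(centered_iou(w, h, box_w, box_h)
               for _, _, w, h in anchor_shapes(scales, ratios))


ratios = [1.0, 0.5, 2.0]
for name, (bw, bh) in {"pore": (12, 11), "paint": (45, 79)}.items():
    old = best_anchor_iou(bw, bh, [64.0, 128.0, 256.0], ratios)
    new = best_anchor_iou(bw, bh, [8.0, 16.0, 32.0], ratios)
    print("%-5s old anchors: %.2f  new anchors: %.2f" % (name, old, new))
```

Under these assumptions, the best pore IoU rises from about 0.03 with the 64/128/256 anchors to about 0.52 with 8/16/32. The larger paint boxes are covered less tightly by the smaller set (about 0.29 versus 0.87), although the RPN's box regression can partly compensate for this.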

After I prune that same model and retrain it, here are the results:

Class   AP      precision  recall
paint   0.4370  0.3333     0.5333
pore    0.0000  0.0000     0.0000

mAP = 0.2185

Why does pruning completely kill the pore class? Is it because the pores are smaller, so they are weighted less during pruning?

For reference, I used the following command. Are there more customized pruning metrics to use for FasterRCNN?

tlt-prune -pm /workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch100.tlt \
              -o /workspace/tlt-experiments/faster_rcnn/pruned \
              -eq union \
              -pth 0.7 -k $KEY

This command gives the model the following pruning ratio:
Pruning ratio (pruned model / original model): 0.116337931277

Thank you for all your hard work!
Martin

Hi Martin,
Could you please attach your latest spec for retraining?

Sure thing:

Here is the first training spec:

random_seed: 42
enc_key: 'API'
verbose: True
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
    size_min {
min:700
}
    image_channel_mean {
        key: 'b'
        value: 103.939
}
    image_channel_mean {
        key: 'g'
        value: 116.779
}
    image_channel_mean {
        key: 'r'
        value: 123.68
}
    image_scaling_factor: 1.0
}
feature_extractor: "resnet:50"
anchor_box_config {
scale: 8.0
scale: 16.0
scale: 32.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: False
}
all_projections: True
use_pooling:False
}
training_config {
kitti_data_config {
images_dir : '/workspace/tlt-experiments/data/cam/training_images'
labels_dir: '/workspace/tlt-experiments/data/cam/training_labels'
}
training_data_parser: 'raw_kitti'
data_augmentation {
use_augmentation: True
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
num_epochs: 100
class_mapping {
key: 'pore'
value: 0
}
class_mapping {
key: "paint"
value: 1
}
class_mapping {
key: "background"
value: 2
}
pretrained_weights: ""
pretrained_model: "/workspace/tlt-experiments/faster_rcnn/pruned/model_2_pruned.tlt"
output_weights: "/workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.tltw"
output_model: "/workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}

rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7

reg_config {
reg_type: 'L2'
weight_decay: 1e-4
}

optimizer {
adam {
lr: 0.00001
beta_1: 0.9
beta_2: 0.999
decay: 0.0
}
}

lr_scheduler {
step {
base_lr: 0.00001
gamma: 1.0
step_size: 30
}
}

lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0

inference_config {
images_dir: '/workspace/tlt-experiments/data/cam/inference'
model: '/workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.epoch100.tlt'
detection_image_output_dir: '/workspace/tlt-experiments/data/cam/retrained/inference_results_imgs_retrain'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/retrained/inference_dump_labels_retrain'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}

evaluation_config {
dataset {
images_dir : '/workspace/tlt-experiments/data/cam/validation_images'
labels_dir: '/workspace/tlt-experiments/data/cam/validation_labels'
}
data_parser: 'raw_kitti'
model: '/workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.epoch100.tlt'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/retrained/test_dump_labels_retrain'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
}

}

Here is the re-training spec:

random_seed: 42
enc_key: 'API'
verbose: True
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
    size_min {
min:700
}
    image_channel_mean {
        key: 'b'
        value: 103.939
}
    image_channel_mean {
        key: 'g'
        value: 116.779
}
    image_channel_mean {
        key: 'r'
        value: 123.68
}
    image_scaling_factor: 1.0
}
feature_extractor: "resnet:18"
anchor_box_config {
scale: 8.0
scale: 16.0
scale: 32.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: False
}
all_projections: True
use_pooling:False
}
training_config {
kitti_data_config {
images_dir : '/workspace/tlt-experiments/data/cam/training_images'
labels_dir: '/workspace/tlt-experiments/data/cam/training_labels'
}
training_data_parser: 'raw_kitti'
data_augmentation {
use_augmentation: True
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
num_epochs: 100
class_mapping {
key: 'pore'
value: 0
}
class_mapping {
key: "paint"
value: 1
}
class_mapping {
key: "background"
value: 2
}
pretrained_weights: ""
pretrained_model: "/workspace/tlt-experiments/faster_rcnn/pruned/model_2_pruned.tlt"
output_weights: "/workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.tltw"
output_model: "/workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}

rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7

reg_config {
reg_type: 'L2'
weight_decay: 1e-4
}

optimizer {
adam {
lr: 0.00001
beta_1: 0.9
beta_2: 0.999
decay: 0.0
}
}

lr_scheduler {
step {
base_lr: 0.00001
gamma: 1.0
step_size: 30
}
}

lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0

inference_config {
images_dir: '/workspace/tlt-experiments/data/cam/inference'
model: '/workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.epoch100.tlt'
detection_image_output_dir: '/workspace/tlt-experiments/data/cam/retrained/inference_results_imgs_retrain'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/retrained/inference_dump_labels_retrain'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}

evaluation_config {
dataset {
images_dir : '/workspace/tlt-experiments/data/cam/validation_images'
labels_dir: '/workspace/tlt-experiments/data/cam/validation_labels'
}
data_parser: 'raw_kitti'
model: '/workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.epoch100.tlt'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/retrained/test_dump_labels_retrain'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
}

}

Hi Martin,

  1. As the notebook mentions:
    Usually, you just need to adjust -pth (threshold) to trade off accuracy against model size. A higher pth gives you a smaller model (and thus higher inference speed) but worse accuracy. The threshold to use depends on the dataset. A pth value of 0.5 is just a starting point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.
  2. I went through your train spec and retrain spec.
    • The feature extractor does not match.
      In the train spec file:
      feature_extractor: "resnet:50"
      In the retrain spec file:
      feature_extractor: "resnet:18"
    • In the train spec file, why do you set
      pretrained_model: "/workspace/tlt-experiments/faster_rcnn/pruned/model_2_pruned.tlt"
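Since the right pth depends on the dataset, one way to explore it is to generate a small sweep of tlt-prune commands, then retrain and evaluate each result. The sketch below (Python 3) uses only the flags already shown in this thread (-pm, -o, -eq union, -pth, -k). prune_commands and the per-threshold output directories are hypothetical. The commands are printed rather than executed.

```python
import shlex

# Build one tlt-prune command per candidate threshold (sketch only).
# Output directory naming (pth_0_5, ...) is a hypothetical convention.


def prune_commands(model, out_root, thresholds, key_var="$KEY"):
    cmds = []
    for pth in thresholds:
        out_dir = "%s/pth_%s" % (out_root, str(pth).replace(".", "_"))
        cmds.append(" ".join([
            "tlt-prune",
            "-pm", shlex.quote(model),
            "-o", shlex.quote(out_dir),
            "-eq", "union",
            "-pth", str(pth),
            "-k", key_var,  # left unquoted so the shell expands $KEY
        ]))
    return cmds


for cmd in prune_commands(
        "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch100.tlt",
        "/workspace/tlt-experiments/faster_rcnn/pruned",
        [0.1, 0.3, 0.5, 0.7]):
    print(cmd)
```

Each pruned model still needs its own retraining run before the accuracy and model size can be compared, as the notebook advice above describes.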

Hi Morgan,

  1. I will experiment with multiple thresholds. I guess 0.7 is too high…?

  2. I just retrained with the spec corrected to resnet:50 and still got the same result.

Also, the model_2_pruned in the retrain spec file is the pruned model I obtained with the following command:

tlt-prune -pm /workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch100.tlt \
              -o /workspace/tlt-experiments/faster_rcnn/pruned \
              -eq union \
              -pth 0.7 -k $KEY

Hi Martin,

  1. Yes, more experiments are necessary before settling on a pth value, because accuracy and model size trade off against each other. If the retrain accuracy is good, you can increase the pth value to get smaller models. Otherwise, lower this value to get better accuracy.

  2. Can you share the exact train spec you used when you got mAP 0.6399?
    Why did you set the pretrained model to your model_2_pruned model? I’m asking because in the default train spec it should be the TLT pretrained model.

Hi Morgan,

  1. I will certainly experiment. Thanks for the heads up.

  2. I apologize. I accidentally pasted the same re-train spec twice. Here is the initial train spec before the pruning.

random_seed: 42
enc_key: 'API'
verbose: True
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
    size_min {
min:700
}
    image_channel_mean {
        key: 'b'
        value: 103.939
}
    image_channel_mean {
        key: 'g'
        value: 116.779
}
    image_channel_mean {
        key: 'r'
        value: 123.68
}
    image_scaling_factor: 1.0
}
feature_extractor: "resnet:50"
anchor_box_config {
scale: 8.0
scale: 16.0
scale: 32.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: False
}
all_projections: True
use_pooling:False
}
training_config {
kitti_data_config {
images_dir : '/workspace/tlt-experiments/data/cam/training_images'
labels_dir: '/workspace/tlt-experiments/data/cam/training_labels'
}
training_data_parser: 'raw_kitti'
data_augmentation {
use_augmentation: True
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
num_epochs: 100
class_mapping {
key: 'pore'
value: 0
}
class_mapping {
key: "paint"
value: 1
}
class_mapping {
key: "background"
value: 2
}

pretrained_weights: "/workspace/tlt-experiments/pretrained_models/tlt_resnet50_faster_rcnn_v1/resnet50.h5"
pretrained_model: ""
output_weights: "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.tltw"
output_model: "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}

rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7

reg_config {
reg_type: 'L2'
weight_decay: 1e-4
}

optimizer {
adam {
lr: 0.00001
beta_1: 0.9
beta_2: 0.999
decay: 0.0
}
}

lr_scheduler {
step {
base_lr: 0.00001
gamma: 1.0
step_size: 30
}
}

lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0

inference_config {
images_dir : '/workspace/tlt-experiments/data/cam/training_images'
#images_dir: '/workspace/tlt-experiments/data/cam/inference'
model: '/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch100.tlt'
detection_image_output_dir: '/workspace/tlt-experiments/data/cam/inference_results_imgs'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/inference_results_imgs'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}

evaluation_config {
dataset {
images_dir : '/workspace/tlt-experiments/data/cam/validation_images'
labels_dir: '/workspace/tlt-experiments/data/cam/validation_labels'
}
data_parser: 'raw_kitti'
model: '/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch100.tlt'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/validation_images'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
}

}

Hi Morgan,

I reduced the pth and got better results. I still lost quite a bit of accuracy, but I think I can solve this by adding more high-quality data.

I am now trying to export this FasterRCNN model using the following command:

tlt-export /workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.epoch100.tlt \
          -o /workspace/tlt-experiments/faster_rcnn/exported/frcnn_kitti_retrain.int8.etlt \
          --outputs dense_class/Softmax,dense_regress/BiasAdd,proposal \
          -e /workspace/tlt-experiments/faster_rcnn/specs/frcnn_kitti_retrain_spec.txt \
          --enc_key API \
          --input_dims 3,700,800 \
          --export_module faster_rcnn \
          --cal_image_dir /workspace/tlt-experiments/data/cam/calibration_images \
          --data_type int8 \
          --cal_batch_size 8 \
          --batches 10 \
          --generate_tensorfile \
          --cal_cache_file /workspace/tlt-experiments/faster_rcnn/exported/cal.bin

This crashes. Here is the console output:

Using TensorFlow backend.
2019-11-19 20:46:08,209 [INFO] iva.common.magnet_export: Loading model from /workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.epoch100.tlt
2019-11-19 20:46:08.209636: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-19 20:46:08.255664: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-19 20:46:08.256021: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6b41fc0 executing computations on platform CUDA. Devices:
2019-11-19 20:46:08.256038: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2019-11-19 20:46:08.274303: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-11-19 20:46:08.276714: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6bacb90 executing computations on platform Host. Devices:
2019-11-19 20:46:08.276793: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-19 20:46:08.277121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
totalMemory: 7.79GiB freeMemory: 7.24GiB
2019-11-19 20:46:08.277193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-19 20:46:08.435703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-19 20:46:08.435734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-19 20:46:08.435741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-19 20:46:08.435812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6980 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-19 20:46:10,268 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-19 20:46:20,288 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/spec_loader/spec_loader.pyc: Loading experiment spec at /workspace/tlt-experiments/faster_rcnn/specs/frcnn_kitti_retrain_spec.txt.
2019-11-19 20:46:33.218880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-19 20:46:33.218932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-19 20:46:33.218938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-19 20:46:33.218942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-19 20:46:33.219047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6980 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
2019-11-19 20:46:35,675 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-11-19 20:46:37,168 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-11-19 20:46:37.834746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-19 20:46:37.834798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-19 20:46:37.834804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-19 20:46:37.834808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-19 20:46:37.834897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6980 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from /tmp/tmpEgFSVC.ckpt
2019-11-19 20:46:38,080 [INFO] tensorflow: Restoring parameters from /tmp/tmpEgFSVC.ckpt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
2019-11-19 20:46:38,599 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
2019-11-19 20:46:38,599 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
INFO:tensorflow:Froze 400 variables.
2019-11-19 20:46:38,799 [INFO] tensorflow: Froze 400 variables.
INFO:tensorflow:Converted 400 variables to const ops.
2019-11-19 20:46:38,865 [INFO] tensorflow: Converted 400 variables to const ops.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: Proposal yet.
Converting proposal as custom op: Proposal
DEBUG: convert reshape to flatten node
Warning: No conversion function registered for layer: CropAndResize yet.
Converting roi_pooling_conv_1/CropAndResize_new as custom op: CropAndResize
2019-11-19 20:46:40,631 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/spec_loader/spec_loader.pyc: Loading experiment spec at /workspace/tlt-experiments/faster_rcnn/specs/frcnn_kitti_retrain_spec.txt.
10it [00:12,  1.23s/it]
2019-11-19 20:46:53,020 [INFO] iva.common.magnet_export: Calibrating the exported model. Please don't panic as this may take a while.
#assertion/home/TRT/plugin/proposalPlugin/proposalPlugin.cpp,374
Aborted (core dumped)

Perhaps I am not using the exporter correctly… Maybe I am not allocating enough memory, or my batch size is too big? Would you mind looking this over and giving me some pointers to solve this? Also note that I am exporting this model on the same machine that trained it (which has an RTX 2080 GPU).

Thank you!

Hi martin,
FasterRCNN depends on the cropAndResize plugin and the proposal plugin.
Please compile a new libnvinfer_plugin.so.5.x.x and replace the original one (i.e., /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.5.x.x).
Refer to https://devtalk.nvidia.com/default/topic/1066456/transfer-learning-toolkit/deepstream-inference-on-tx1-using-faster-rcnn-resnet-18-trained-using-tlt/post/5402162/#5402162

See section 11 of the TLT documentation for more information:

FasterRCNN requires two TensorRT plugins to run: the cropAndResizePlugin and the proposalPlugin. Currently, these plugins are not included in the TensorRT 5.1 GA (5.1.5.0) installation package, but they can be obtained from the TensorRT Open Source Software (OSS) repository on GitHub by checking out the release/5.1 branch. Please follow the installation guide here, compile the open-source plugins, and replace the libnvinfer_plugin.* in the installation directory with the one built from TensorRT OSS.
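The last step in the quoted instructions, replacing the installed libnvinfer_plugin.* with the one built from TensorRT OSS, can be sketched in Python as below. This is a minimal, hypothetical helper (not part of TLT or TensorRT): the function names, the ".bak" backup convention, and the example paths in the comments are assumptions. It follows symlinks such as libnvinfer_plugin.so.5, so the versioned file behind them is the one that gets overwritten and the links keep working.

```python
import glob
import os
import shutil


def find_plugin_libs(lib_dir):
    """Return libnvinfer_plugin.so* entries (files or symlinks) in lib_dir, skipping backups."""
    paths = glob.glob(os.path.join(lib_dir, "libnvinfer_plugin.so*"))
    return sorted(p for p in paths if not p.endswith(".bak"))


def replace_plugin_lib(installed_path, built_path):
    """Overwrite the real file behind installed_path with built_path.

    Symlinks are resolved first, so the versioned library is replaced.
    The original is copied once to "<file>.bak" (hypothetical convention).
    Returns the path of the file that was overwritten.
    """
    target = os.path.realpath(installed_path)
    backup = target + ".bak"
    if not os.path.exists(backup):
        shutil.copy2(target, backup)  # keep the first original only
    shutil.copy2(built_path, target)
    return target


# Hypothetical usage (both paths are assumptions; adjust to your install):
# for p in find_plugin_libs("/usr/lib/x86_64-linux-gnu"):
#     print(p)
# replace_plugin_lib("/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.5",
#                    "/path/to/TensorRT/build/out/libnvinfer_plugin.so.5.1.5")
```

Listing the candidates first makes it easy to confirm which directory actually holds the library before anything is overwritten, and the backup allows the original plugin library to be restored.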

Hi Morgan,

I understand. What I don't get is how to do this, since I am using the TLT Docker container. My desktop does not have an aarch64-linux-gnu folder. I am not trying to export for Jetson yet; I am only testing on a regular GPU for now.

How does one recompile the plugins and then update the Docker container with these files? I assumed they would already be included in the latest image.

Please let me know what you think is the best setup for exporting these models.

Thanks!

Sorry, martin, I misunderstood your error log. You actually ran into the error during “tlt-export”, not during deployment.

To narrow down the issue, could you please check whether export succeeds in FP32 mode?
The error you reported occurs when exporting in INT8 mode.

Hi Morgan,

Converting to FP32 works.

Command:

tlt-export /workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.epoch100.tlt \
          -o /workspace/tlt-experiments/faster_rcnn/exported/frcnn_kitti_retrain.int32.etlt \
          --outputs dense_class/Softmax,dense_regress/BiasAdd,proposal \
          -e /workspace/tlt-experiments/faster_rcnn/specs/frcnn_kitti_retrain_spec.txt \
          --enc_key API \
          --input_dims 3,700,800 \
          --export_module faster_rcnn \
          --data_type fp32 \
          --batches 10 \
          --generate_tensorfile

Console:

Using TensorFlow backend.
2019-11-21 13:04:44,174 [INFO] iva.common.magnet_export: Loading model from /workspace/tlt-experiments/faster_rcnn/pruned/retrain/frcnn_kitti_retrain.epoch100.tlt
2019-11-21 13:04:44.174906: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-21 13:04:44.217883: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-21 13:04:44.218222: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6250480 executing computations on platform CUDA. Devices:
2019-11-21 13:04:44.218239: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2019-11-21 13:04:44.238103: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-11-21 13:04:44.239121: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x62bafe0 executing computations on platform Host. Devices:
2019-11-21 13:04:44.239135: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-21 13:04:44.239249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
totalMemory: 7.79GiB freeMemory: 7.25GiB
2019-11-21 13:04:44.239261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-21 13:04:44.388811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-21 13:04:44.388836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-21 13:04:44.388842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-21 13:04:44.388907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6988 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-21 13:04:45,940 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-21 13:04:55,604 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/spec_loader/spec_loader.pyc: Loading experiment spec at /workspace/tlt-experiments/faster_rcnn/specs/frcnn_kitti_retrain_spec.txt.
2019-11-21 13:05:08.090853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-21 13:05:08.090905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-21 13:05:08.090912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-21 13:05:08.090917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-21 13:05:08.091023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6988 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
2019-11-21 13:05:10,472 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-11-21 13:05:11,973 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-11-21 13:05:12.617492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-21 13:05:12.617543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-21 13:05:12.617549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-21 13:05:12.617553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-21 13:05:12.617644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6988 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from /tmp/tmpE1pheX.ckpt
2019-11-21 13:05:12,863 [INFO] tensorflow: Restoring parameters from /tmp/tmpE1pheX.ckpt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
2019-11-21 13:05:13,346 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
2019-11-21 13:05:13,346 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
INFO:tensorflow:Froze 400 variables.
2019-11-21 13:05:13,544 [INFO] tensorflow: Froze 400 variables.
INFO:tensorflow:Converted 400 variables to const ops.
2019-11-21 13:05:13,613 [INFO] tensorflow: Converted 400 variables to const ops.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: Proposal yet.
Converting proposal as custom op: Proposal
DEBUG: convert reshape to flatten node
Warning: No conversion function registered for layer: CropAndResize yet.
Converting roi_pooling_conv_1/CropAndResize_new as custom op: CropAndResize
2019-11-21 13:05:15,221 [INFO] iva.common.magnet_export: Converted model was saved into /workspace/tlt-experiments/faster_rcnn/exported/frcnn_kitti_retrain.int32.etlt
2019-11-21 13:05:15,221 [INFO] iva.common.magnet_export: Input node: input_1
2019-11-21 13:05:15,221 [INFO] iva.common.magnet_export: Output node(s): ['dense_class/Softmax', 'dense_regress/BiasAdd', 'proposal']

But I cannot convert that FP32 .etlt model to a TensorRT engine to run on my workstation.
Command:

tlt-converter -k API \
              -o dense_class/Softmax,dense_regress/BiasAdd,proposal \
              -d 3,700,800 \
              -e /workspace/tlt-experiments/faster_rcnn/exported/frcnn_res50_cam.engine \
              /workspace/tlt-experiments/faster_rcnn/exported/frcnn_kitti_retrain.int32.etlt

Console (I can only include the final lines, because the full output is too large to post):

.
.
.
[INFO] UFFParser: parsing rpn_conv1/Relu
[INFO] UFFParser: Applying order forwarding to: rpn_conv1/Relu
[INFO] UFFParser: parsing rpn_out_class/kernel
[INFO] UFFParser: Applying order forwarding to: rpn_out_class/kernel
[INFO] UFFParser: parsing rpn_out_class/convolution
[INFO] UFFParser: Applying order forwarding to: rpn_out_class/convolution
[INFO] UFFParser: parsing rpn_out_class/bias
[INFO] UFFParser: Applying order forwarding to: rpn_out_class/bias
[INFO] UFFParser: parsing rpn_out_class/BiasAdd
[INFO] UFFParser: Applying order forwarding to: rpn_out_class/BiasAdd
[INFO] UFFParser: parsing rpn_out_class/Sigmoid
[INFO] UFFParser: Applying order forwarding to: rpn_out_class/Sigmoid
[INFO] UFFParser: parsing rpn_out_regress/kernel
[INFO] UFFParser: Applying order forwarding to: rpn_out_regress/kernel
[INFO] UFFParser: parsing rpn_out_regress/convolution
[INFO] UFFParser: Applying order forwarding to: rpn_out_regress/convolution
[INFO] UFFParser: parsing rpn_out_regress/bias
[INFO] UFFParser: Applying order forwarding to: rpn_out_regress/bias
[INFO] UFFParser: parsing rpn_out_regress/BiasAdd
[INFO] UFFParser: Applying order forwarding to: rpn_out_regress/BiasAdd
[INFO] UFFParser: parsing proposal
#assertion/home/TRT/plugin/proposalPlugin/proposalPlugin.cpp,380
Aborted (core dumped)

Hi martin,
From your tlt-export log, I assume you were training with width 800 and height 700.
But according to the TLT documentation, the input-size requirement for FasterRCNN is:

FasterRCNN
 • Input size: C * W * H (where C = 1 or 3; W >= 480; H >= 272; and W, H are multiples of 32)

Could this be the culprit?
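The input-size rule quoted above is easy to check before exporting. The sketch below is a hypothetical Python helper (not a TLT function); following Morgan's reading of the log, it assumes the --input_dims string "3,700,800" is ordered C,H,W.

```python
def check_frcnn_input_dims(c, w, h):
    """Return a list of violations of the FasterRCNN input-size rule (empty if OK)."""
    problems = []
    if c not in (1, 3):
        problems.append("C must be 1 or 3, got %d" % c)
    if w < 480:
        problems.append("W must be >= 480, got %d" % w)
    if h < 272:
        problems.append("H must be >= 272, got %d" % h)
    if w % 32:
        problems.append("W must be a multiple of 32, got %d" % w)
    if h % 32:
        problems.append("H must be a multiple of 32, got %d" % h)
    return problems


# Assumption: the dims string is ordered C,H,W, as in the tlt-export command above.
c, h, w = (int(x) for x in "3,700,800".split(","))
print(check_frcnn_input_dims(c, w, h))  # ['H must be a multiple of 32, got 700']
```

Only the height fails: 800 = 25 * 32, but 700 leaves a remainder of 28 when divided by 32.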

Oh shoot! This is probably the culprit. Good catch!

My training config had this:

size_min {
  min: 700
}

I will have to change this so that both dimensions are multiples of 32, something like:

size_height_width {
  height: 704
  width: 800
}
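Rounding each side up to the next multiple of 32 produces the values above (700 becomes 704, 800 stays 800). Below is a minimal Python sketch; the helper names are hypothetical, and it renders only the size_height_width block shown in the post, not the rest of the spec file (whose surrounding structure is not shown in the thread).

```python
def round_up(value, multiple=32):
    """Smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple


def size_height_width_block(height, width, multiple=32):
    """Render a size_height_width block with both sides rounded up to `multiple`."""
    return "size_height_width {\n  height: %d\n  width: %d\n}" % (
        round_up(height, multiple), round_up(width, multiple))


print(size_height_width_block(700, 800))
# size_height_width {
#   height: 704
#   width: 800
# }
```

Rounding down (700 to 672) would also satisfy the rule. Rounding up is the choice here because it keeps at least the original resolution.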

Then I will re-prune, retrain, and re-export.

Please stay tuned. Thank you so much for your quick response!

Hi Morgan,

I can confirm that all of the issues above with exporting and converting to INT8 were solved by fixing the image size. Using multiples of 32 allowed me to do it all.

I now have 2 options:

  1. Follow the procedure for using the models on the Jetson.
  2. Use my new TensorRT engine on the desktop PC.

If I go with option 2, is DeepStream the only way to use this model? Or is there a way to use the same model directly with TensorRT?

Thank you for all your help with this process!