W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at pack_op.cc:88 : Resource exhausted: OOM when allocating tensor with shape[32,3,2160,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Yes. The OOM is resolved after resizing the images as you suggested. Please find the updated spec file ‘detectnet_config.txt’ in question #1.
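
As a rough check on why resizing helped: the tensor in the OOM message has shape [32, 3, 2160, 4096] and type float (4 bytes per element), so that single allocation alone is about 3240 MiB. Below is a minimal sketch of the arithmetic. The helper name tensor_bytes and the 960x544 resized resolution are assumptions made for illustration only; they are not taken from the spec file.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Return the bytes needed for a dense tensor of the given shape.

    bytes_per_element defaults to 4, the size of a 32-bit float.
    """
    return reduce(mul, shape, 1) * bytes_per_element


# Shape reported in the OOM message: batch 32, 3 channels, 2160 x 4096 pixels.
full_res = tensor_bytes([32, 3, 2160, 4096])
print(full_res / 2**20, "MiB")  # 3240.0 MiB

# Hypothetical resized input (960 x 544), chosen only to show the scaling.
resized = tensor_bytes([32, 3, 544, 960])
print(resized / 2**20, "MiB")
```

The size grows linearly with the pixel count, so halving both image dimensions cuts this allocation to a quarter.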

Have you generated new tfrecords files based on the resized images/labels?
Also, can you paste the AP results for all the classes?

Yes, I have generated the tfrecords using the resized images and labels.

Epoch 101/120
=========================

Validation cost: -0.000010
Mean average_precision (in %): 0.0000

class name                    average precision (in %)
--------------------------  --------------------------
Closed-column-tip                                    0
Flare-tip                                            0
Ladders                                              0
Platforms                                            0
Stack-diameter-change-zone                           0
Stack-shells                                         0
Stack-tip                                            0

Can you tell me the total number of images and the number of images for each class?

I have 1952 images.

Using TensorFlow backend.
2019-12-05 07:54:06,902 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2019-12-05 07:54:06,908 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in
Train: 1562	Val: 390
2019-12-05 07:54:06,908 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2019-12-05 07:54:06,909 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 0
/usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:266: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
2019-12-05 07:54:06,964 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1
2019-12-05 07:54:07,005 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 2
2019-12-05 07:54:07,045 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 3
2019-12-05 07:54:07,091 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 4
2019-12-05 07:54:07,135 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 5
2019-12-05 07:54:07,176 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 6
2019-12-05 07:54:07,226 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 7
2019-12-05 07:54:07,280 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 8
2019-12-05 07:54:07,323 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 9
2019-12-05 07:54:07,366 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - 
Wrote the following numbers of objects:
closed-column-tip: 61
stack-diameter-change-zone: 42
platforms: 992
stack-shells: 752
ladders: 637
stack-tip: 130
flare-tip: 27

2019-12-05 07:54:07,366 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 0
2019-12-05 07:54:07,542 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 1
2019-12-05 07:54:07,714 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 2
2019-12-05 07:54:07,880 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 3
2019-12-05 07:54:08,053 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 4
2019-12-05 07:54:08,223 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 5
2019-12-05 07:54:08,395 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 6
2019-12-05 07:54:08,563 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 7
2019-12-05 07:54:08,744 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 8
2019-12-05 07:54:08,925 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 9
2019-12-05 07:54:09,100 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - 
Wrote the following numbers of objects:
closed-column-tip: 177
stack-diameter-change-zone: 154
stack-tip: 555
stack-shells: 2952
ladders: 2404
platforms: 2852
flare-tip: 59

2019-12-05 07:54:09,100 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Cumulative object statistics
2019-12-05 07:54:09,100 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - 
Wrote the following numbers of objects:
closed-column-tip: 238
stack-diameter-change-zone: 196
platforms: 3844
stack-shells: 3704
ladders: 3041
stack-tip: 685
flare-tip: 86

2019-12-05 07:54:09,100 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map. 
Label in GT: Label in tfrecords file 
Closed-column-tip: closed-column-tip
Stack-diameter-change-zone: stack-diameter-change-zone
Platforms: platforms
Stack-shells: stack-shells
Ladders: ladders
Stack-tip: stack-tip
Flare-tip: flare-tip
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

2019-12-05 07:54:09,100 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Tfrecords generation complete.

Hi samjith888,
Please see line 67
“For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.”

And also https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#dataloader
The class names key in the target_class_mapping must be identical to the one shown in the dataset converter log, so that the correct classes are picked up for training.

Can you modify all the classes in your spec file? For example,

target_class_mapping {
    key: "Platforms"
    value: "Platforms"
  }

change to

target_class_mapping {
    key: "platforms"
    value: "platforms"
  }
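
To check every class at once rather than one by one, the keys in the spec can be compared against the labels printed in the dataset converter log. This is only a sketch: the function name find_mapping_mismatches is hypothetical, and the two lists are typed in by hand from the log and the spec rather than parsed from the files.

```python
def find_mapping_mismatches(spec_keys, tfrecord_labels):
    """Return {spec_key: suggested_label} for keys missing from the tfrecords labels.

    The suggestion is the tfrecords label that matches when case is ignored,
    or None if no label matches at all.
    """
    labels = set(tfrecord_labels)
    mismatches = {}
    for key in spec_keys:
        if key in labels:
            continue  # exact match, nothing to fix
        candidates = sorted(l for l in labels if l.lower() == key.lower())
        mismatches[key] = candidates[0] if candidates else None
    return mismatches


# Labels as written in the tfrecords, copied from the converter log.
tfrecord_labels = [
    "closed-column-tip", "stack-diameter-change-zone", "platforms",
    "stack-shells", "ladders", "stack-tip", "flare-tip",
]

# Keys as they might appear in the spec before the change (capitalized).
spec_keys = ["Platforms", "Ladders", "stack-tip"]

for key, suggestion in find_mapping_mismatches(spec_keys, tfrecord_labels).items():
    print(key, "->", suggestion)
```

An empty result means every key matches a label in the tfrecords exactly.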

I have done that for target_class_mapping, but I got the following error:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 37, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 632, in main
  File "./detectnet_v2/scripts/train.py", line 556, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 490, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 136, in run_training_loop
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 676, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: bboxes class ID out of range [0, 7[, got-1
	 [[node BboxRasterizer_1/RasterizeBbox (defined at <string>:159) ]]
	 [[node resnet18_nopool_bn_detectnet_v2/block_4b_bn_1/AssignMovingAvg (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:186) ]]

Caused by op u'BboxRasterizer_1/RasterizeBbox', defined at:
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 37, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 632, in main
  File "./detectnet_v2/scripts/train.py", line 556, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 466, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 308, in build_training_graph
  File "./detectnet_v2/scripts/train.py", line 215, in rasterize_tensors
  File "./detectnet_v2/model/detectnet_model.py", line 557, in generate_ground_truth_tensors
  File "./detectnet_v2/objectives/objective_set.py", line 256, in generate_ground_truth_tensors
  File "./detectnet_v2/rasterizers/bbox_rasterizer.py", line 377, in rasterize_labels
  File "./modulus/processors/processors.py", line 227, in __call__
  File "./modulus/processors/bbox_rasterizer.py", line 190, in call
  File "<string>", line 159, in rasterize_bbox
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): bboxes class ID out of range [0, 7[, got-1
	 [[node BboxRasterizer_1/RasterizeBbox (defined at <string>:159) ]]
	 [[node resnet18_nopool_bn_detectnet_v2/block_4b_bn_1/AssignMovingAvg (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:186) ]]

The first letter of every class name is capitalized in my label.txt file.

Hi samjith888,
Please paste your latest spec here.

Hi samjith888,
Is your issue fixed? Can we close this topic? Thanks.

The issue was not solved, so I moved to Faster_rcnn.

Please paste your latest spec here.

The first letter of every class name in your labels is capitalized. In the tfrecords, all class names are written in lowercase.
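
Since the spec was never posted, it is not confirmed which entry still carried a capitalized name. One way to look for leftovers is to scan the whole spec text for quoted strings that match a tfrecords label only when case is ignored. This is a sketch under assumptions: find_miscased_names is a hypothetical helper, and the sample text is a made-up fragment in the target_class_mapping style shown earlier in the thread, not the actual spec.

```python
import re


def find_miscased_names(spec_text, tfrecord_labels):
    """Return (line_number, found, expected) for quoted names with the wrong case."""
    by_lower = {label.lower(): label for label in tfrecord_labels}
    hits = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        for quoted in re.findall(r'"([^"]*)"', line):
            expected = by_lower.get(quoted.lower())
            if expected is not None and quoted != expected:
                hits.append((lineno, quoted, expected))
    return hits


# Illustrative fragment: the key was lowercased but the value was not.
sample_spec = '''target_class_mapping {
    key: "platforms"
    value: "Platforms"
  }'''

labels = ["platforms", "ladders", "stack-tip"]
for lineno, found, expected in find_miscased_names(sample_spec, labels):
    print("line %d: %r should be %r" % (lineno, found, expected))
```

On a real spec, the text could be read from detectnet_config.txt and the labels taken from the converter log.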