EfficientDet training with custom data results in 0 mAP in evaluation

Description

I am training the EfficientDet model with TAO on custom data.

PROBLEM: I am getting 0 mAP in evaluation. During training, I get good results at each checkpoint. But when I run the evaluation after training, on the last epoch, the problem appears: the AP is zero for all classes.

As someone suggested, I switched to normalized bounding-box coordinates, but the problem remains.
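Standard COCO annotations store each box as [x, y, width, height] in absolute pixels, so it may be worth confirming which convention the annotation file currently uses. The following is a minimal sketch of such a check, not part of TAO; the function name `bbox_coordinate_style` and the sample data are hypothetical, and the "any value above 1 means pixels" rule is only a heuristic.

```python
def bbox_coordinate_style(coco: dict) -> str:
    """Guess whether COCO-style boxes are stored in pixels or normalized to [0, 1]."""
    boxes = [ann["bbox"] for ann in coco.get("annotations", []) if "bbox" in ann]
    if not boxes:
        return "no boxes"
    # COCO boxes are [x, y, width, height]; any value above 1 implies pixel units.
    largest = max(max(box) for box in boxes)
    return "normalized" if largest <= 1.0 else "pixels"


# Tiny hand-made examples (hypothetical data, not taken from this thread).
pixel_example = {"annotations": [{"bbox": [34.0, 50.0, 120.0, 80.0]}]}
normalized_example = {"annotations": [{"bbox": [0.07, 0.10, 0.23, 0.16]}]}

print(bbox_coordinate_style(pixel_example))       # pixels
print(bbox_coordinate_style(normalized_example))  # normalized
```

On a real file, the dictionary would come from `json.load` on the annotations JSON. If the training and validation files report different styles, that mismatch would be a natural first thing to look at.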

Environment

+----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.119 Driver Version: 553.09 CUDA Version: 12.4 |
|-----------------------------------------+-----------------------+---------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX A2000 12GB On | 00000000:01:00.0 On | Off |
| 44% 74C P2 66W / 70W | 7983MiB / 12282MiB | 49% Default |
| | | N/A |
+----------------------------------------+-----------------------+---------------------+

+----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 312 C /python3.10 N/A |
+----------------------------------------------------------------------------------------+

INFO: Python version:
INFO: pip3 found.
INFO: Pip version: pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)
INFO: Docker found. Checking additional requirements for docker.
INFO: Checking nvidia-docker2 installation
INFO: NGC CLI found.
INFO: NGC CLI 3.54.0

Please solve my issue

I am waiting for the reply…

I am waiting for the reply

Hi @korinetharunkumarpalli, please can you share some code that shows the issue? That way we can help debug - thanks!


Please refer to the attached images.

The training script clearly shows the values at the checkpoints. But when I ran the evaluation script after training, it did not show any results; every value is zero.

My spec files are below.

Specs file for training:
dataset_convert:
  image_dir: '/workspace/tao-experiments/data/raw-data/train2024/'
  annotations_file: '/workspace/tao-experiments/data/raw-data/annotations/instances_train.json'
  results_dir: '/workspace/tao-experiments/data/efficientdet_tf2/tfrecords'
  tag: 'train'
  num_shards: 64
  include_masks: True
dataset:
  augmentation:
    rand_hflip: True
    random_crop_min_scale: 0.8
    random_crop_max_scale: 1.0
  loader:
    prefetch_size: 4
    shuffle_file: False
    shuffle_buffer: 10000
    cycle_length: 32
    block_length: 16
  max_instances_per_image: 100
  skip_crowd_during_training: True
  num_classes: 8
  train_tfrecords:
    - '/workspace/tao-experiments/data/efficientdet_tf2/tfrecords/train-
  val_tfrecords:
    - '/workspace/tao-experiments/data/efficientdet_tf2/tfrecords/val-

  val_json_file: '/workspace/tao-experiments/data/raw-data/annotations/instances_val.json'
train:
  optimizer:
    name: 'adam'
  lr_schedule:
    name: 'cosine'
    warmup_epoch: 5
    warmup_init: 0.00005
    learning_rate: 0.0005
  amp: True
  checkpoint: "/workspace/tao-experiments/efficientdet_tf2/pretrained_efficientdet_tf2_vefficientnet_b0"
  num_examples_per_epoch: 3500
  moving_average_decay: 0.99
  batch_size: 8
  checkpoint_interval: 5
  l2_weight_decay: 0.0001
  l1_weight_decay: 0.0
  clip_gradients_norm: 5.0
  image_preview: True
  qat: False
  random_seed: 42
  pruned_model_path: ''
  num_epochs: 30
model:
  name: 'efficientdet-d0'
  input_width: 512
  input_height: 512
  aspect_ratios: '[(1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]'
  anchor_scale: 3
  min_level: 3
  max_level: 7
  num_scales: 3
  freeze_bn: False
  freeze_blocks:
evaluate:
  batch_size: 1
  num_samples: 588
  max_detections_per_image: 100
  label_map: "/workspace/tao-experiments/efficientdet_tf2/specs/coco_labels.yaml"
  trt_engine: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.int8.engine"
  checkpoint: "/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned/train/efficientdet-d0_030.tlt"
prune:
  checkpoint: "/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned/train/efficientdet-d0_030.tlt"
  normalizer: 'max'
  equalization_criterion: 'union'
  granularity: 1
  threshold: 0.0
  min_num_filters: 1000
export:
  batch_size: 1
  dynamic_batch_size: True
  min_score_thresh: 0.25
  checkpoint: "/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unprunned/train/efficientdet-d0_030.tlt"
  onnx_file: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.onnx"
gen_trt_engine:
  onnx_file: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.onnx"
  trt_engine: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.int8.engine"
  tensorrt:
    data_type: "int8"
    max_workspace_size: 2 # in Gb
    calibration:
      cal_image_dir: "/workspace/tao-experiments/data/raw-data/val2024"
      cal_cache_file: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.cal"
      cal_batch_size: 16
      cal_batches: 10
inference:
  checkpoint: "/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned/train/efficientdet-d0_030.tlt"
  trt_engine: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.int8.engine"
  image_dir: "/workspace/tao-experiments/data/raw-data/test_samples"
  dump_label: True
  batch_size: 1
  min_score_thresh: 0.25
  label_map: "/workspace/tao-experiments/efficientdet_tf2/specs/coco_labels.yaml"
results_dir: '/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned'
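The spec above sets num_classes: 8 and num_samples: 588, so one sanity check is whether the validation annotations agree with those numbers. The following is a minimal sketch, not a TAO utility: the helper `summarize_annotations` is hypothetical, and the rule that category IDs should stay below num_classes is an assumption that should be verified against the TAO documentation.

```python
from collections import Counter


def summarize_annotations(coco: dict, num_classes: int, num_samples: int) -> dict:
    """Compare a COCO-style annotation dict against values taken from a spec file."""
    declared = {cat["id"] for cat in coco.get("categories", [])}
    used = Counter(ann["category_id"] for ann in coco.get("annotations", []))
    return {
        # Does the number of images match evaluate.num_samples?
        "image_count_matches": len(coco.get("images", [])) == num_samples,
        # Category IDs used by annotations but never declared in "categories".
        "undeclared_ids": sorted(set(used) - declared),
        # ASSUMPTION: IDs are expected to be smaller than num_classes.
        "ids_at_or_above_num_classes": sorted(i for i in used if i >= num_classes),
    }


# Hypothetical miniature dataset, only to show the shape of the report.
sample = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 2, "category_id": 2},
        {"image_id": 2, "category_id": 9},
    ],
}

report = summarize_annotations(sample, num_classes=8, num_samples=2)
print(report)
```

For the real data, `coco` would be the result of `json.load` on the val_json_file, with num_classes=8 and num_samples=588 as in the spec. A non-empty list in either ID field would point to a labeling mismatch rather than a training problem.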

Specs file for retraining after pruning:

dataset:
  augmentation:
    rand_hflip: True
    random_crop_min_scale: 0.8
    random_crop_max_scale: 1.0
  loader:
    prefetch_size: 4
    shuffle_file: False
    shuffle_buffer: 10000
    cycle_length: 32
    block_length: 16
  max_instances_per_image: 100
  skip_crowd_during_training: True
  num_classes: 8
  train_tfrecords:
    - '/workspace/tao-experiments/data/efficientdet_tf2/tfrecords/train-
  val_tfrecords:
    - '/workspace/tao-experiments/data/efficientdet_tf2/tfrecords/val-

  val_json_file: '/workspace/tao-experiments/data/raw-data/annotations/instances_val.json'
train:
  optimizer:
    name: 'adam'
  lr_schedule:
    name: 'cosine'
    warmup_epoch: 5
    warmup_init: 0.00005
    learning_rate: 0.0005
  amp: True
  checkpoint: "/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned/prune/model_th=0.0_eq=union.tlt"
  num_examples_per_epoch: 3500
  moving_average_decay: 0.99
  batch_size: 8
  checkpoint_interval: 5
  l2_weight_decay: 0.0001
  l1_weight_decay: 0.0
  clip_gradients_norm: 5.0
  image_preview: True
  qat: False
  random_seed: 42
  pruned_model_path: '/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned/prune/model_th=0.0_eq=union.tlt'
  num_epochs: 50
model:
  name: 'efficientdet-d0'
  input_width: 512
  input_height: 512
  aspect_ratios: '[(1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]'
  anchor_scale: 3
  min_level: 3
  max_level: 7
  num_scales: 3
  freeze_bn: False
  freeze_blocks:
evaluate:
  batch_size: 1
  num_samples: 588
  max_detections_per_image: 100
  label_map: "/workspace/tao-experiments/efficientdet_tf2/specs/coco_labels.yaml"
  checkpoint: '/workspace/tao-experiments/efficientdet_tf2/experiment_dir_retrain/train/efficientdet-d0_050.tlt'
export:
  batch_size: 1
  dynamic_batch_size: True
  min_score_thresh: 0.25
  checkpoint: "/workspace/tao-experiments/efficientdet_tf2/experiment_dir_retrain/train/efficientdet-d0_050.tlt"
  onnx_file: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.onnx"
gen_trt_engine:
  onnx_file: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.onnx"
  trt_engine: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.int8.engine"
  tensorrt:
    data_type: "int8"
    max_workspace_size: 2 # in Gb
    calibration:
      cal_image_dir: "/workspace/tao-experiments/data/raw-data/val2024"
      cal_cache_file: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.cal"
      cal_batch_size: 16
      cal_batches: 10
inference:
  checkpoint: "/workspace/tao-experiments/efficientdet_tf2/experiment_dir_retrain/train/efficientdet-d0_050.tlt"
  trt_engine: "/workspace/tao-experiments/efficientdet_tf2/export/efficientdet-d0.int8.engine"
  image_dir: "/workspace/tao-experiments/data/raw-data/test_samples"
  dump_label: True
  batch_size: 1
  min_score_thresh: 0.25
  label_map: "/workspace/tao-experiments/efficientdet_tf2/specs/coco_labels.yaml"
results_dir: '/workspace/tao-experiments/efficientdet_tf2/experiment_dir_retrain'

One more note:

I pruned the model I had trained and then retrained it.
After retraining, the evaluation works fine. You can see this in the attached image taken after pruning.

But I want the evaluation results from before pruning, and those show only zeroes.
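Because evaluation works with the retrain spec but not with the training spec, one quick check is to diff the two evaluate sections. The following is a minimal sketch with the values copied from the two spec files posted above; the helper `diff_section` is hypothetical, and in practice the dictionaries would come from loading the YAML files, for example with PyYAML's `yaml.safe_load` (assumed to be installed).

```python
def diff_section(first: dict, second: dict) -> dict:
    """Return {key: (value_in_first, value_in_second)} for keys whose values differ."""
    missing = "<absent>"
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key, missing), second.get(key, missing))
        for key in keys
        if first.get(key, missing) != second.get(key, missing)
    }


base = "/workspace/tao-experiments/efficientdet_tf2"

# evaluate section of the training spec (evaluation reports zero AP).
train_evaluate = {
    "batch_size": 1,
    "num_samples": 588,
    "max_detections_per_image": 100,
    "label_map": f"{base}/specs/coco_labels.yaml",
    "trt_engine": f"{base}/export/efficientdet-d0.int8.engine",
    "checkpoint": f"{base}/experiment_dir_unpruned/train/efficientdet-d0_030.tlt",
}

# evaluate section of the retrain spec (evaluation works).
retrain_evaluate = {
    "batch_size": 1,
    "num_samples": 588,
    "max_detections_per_image": 100,
    "label_map": f"{base}/specs/coco_labels.yaml",
    "checkpoint": f"{base}/experiment_dir_retrain/train/efficientdet-d0_050.tlt",
}

differences = diff_section(train_evaluate, retrain_evaluate)
for key, (in_train, in_retrain) in differences.items():
    print(f"{key}:\n  train:   {in_train}\n  retrain: {in_retrain}")
```

With these values, only checkpoint and trt_engine are reported: the training spec sets a trt_engine under evaluate while the retrain spec does not. Whether either difference explains the zero AP is not established here, but it narrows down what to test.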

Please reply as soon as possible.

My project deadline is very near…

I am waiting for the reply

I am still waiting for the reply

Hi @korinetharunkumarpalli ,
Apologies for the delay. Could you please share your model with us, along with the repro steps?

Thanks

The model name: EfficientDet TF2 with an EfficientNet backbone.

All steps are the defaults from the GitHub repository, but run on my dataset.

You can also see the screenshots earlier in this thread.