Unable to detect object after training

Pritam · February 3, 2020, 7:06am

Hi,
I have trained my model with ResNet-18 for a single class (person). The training spec file detectnet_v2_train_resnet18_kitti.txt is:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1280
    output_image_height: 720
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/pretrained_resnet18/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.699999988079
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 12
  num_epochs: 3500
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

Logs after training completed:

INFO:tensorflow:Saving checkpoints for step-399000.
2020-02-02 14:23:51,160 [INFO] tensorflow: Saving checkpoints for step-399000.
2020-02-02 14:23:51,496 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 18, 0.00s/step
2020-02-02 14:23:53,434 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 18, 0.19s/step
Matching predictions to ground truth, class 1/1.: 100%|#| 319/319 [00:00<00:00, 9364.71it/s]
Epoch 3500/3500
=========================

Validation cost: 0.000086
Mean average_precision (in %): 82.3062

class name      average precision (in %)
------------  --------------------------
person                           82.3062

Median Inference Time: 0.013935
2020-02-02 14:23:55,099 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 21.359
Time taken to run iva.detectnet_v2.scripts.train:main: 1 day, 23:34:05.290584.

The JSON file detectnet_v2_clusterfile_kitti.json used for inference is:

{
    "dbscan_criterion": "IOU",
    "dbscan_eps": {
        "person": 0.25,
        "default": 0.15
    },
    "dbscan_min_samples": {
        "person": 0.05,
        "default": 0.0
    },
    "min_cov_to_cluster": {
        "person": 0.005,
        "default": 0.005
    },
    "min_obj_height": {
        "person": 4,
        "default": 2
    },
    "target_classes": ["person"],
    "confidence_th": {
        "person": 0.8
    },
    "confidence_model": {
        "person": { "kind": "aggregate_cov"},
        "default": { "kind": "aggregate_cov"}
    },
    "output_map": {
        "person" : "person"
    },
    "color": {
        "person": "green",
        "default": "blue"
    },
    "postproc_classes": ["perosn"],
    "image_height": 720,
    "image_width": 720,
    "stride": 16
}

I am not able to see even a single bounding box on the test images. I have also generated the following files and tested them on DS-3, but I do not get a bounding box on a single frame:
resnet18_detector.etlt
calibration.bin
calibration.tensor
Please help me find where I am going wrong.
I also trained a model with ResNet-18 last time, and I was getting results then, with a training mAP of 58%. This time the mAP is over 82%, but I am not getting any results.

Thanks.

Morganh · February 3, 2020, 7:26am

Hi Pritam,
I see that your training spec has:

output_image_width: 1280
output_image_height: 720

But your JSON file detectnet_v2_clusterfile_kitti.json for inference has:

"image_height": 720,
"image_width": 720,

Could you please make them consistent and check again?
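A mismatch like this can be caught before running inference with a short script. The sketch below is a hypothetical helper, not part of TLT: it pulls output_image_width and output_image_height out of the training spec with a regular expression (assuming each appears once, as in the spec above), compares them with image_width and image_height in the clusterfile JSON, and checks that every entry in postproc_classes also appears in target_classes. Run against the two files posted above, it would report the 720x720 vs 1280x720 mismatch and would also flag that postproc_classes lists "perosn" rather than "person"; this thread does not establish whether that second mismatch affects tlt-infer output. grid_shape shows the output grid implied by the clusterfile's stride of 16: 80 x 45 cells at 1280x720, versus 45 x 45 at 720x720.

```python
import json
import re


def read_spec_dims(spec_text):
    """Return (width, height) from output_image_width/height in a training spec."""
    width = re.search(r"output_image_width:\s*(\d+)", spec_text)
    height = re.search(r"output_image_height:\s*(\d+)", spec_text)
    if width is None or height is None:
        raise ValueError("output_image_width/height not found in spec")
    return int(width.group(1)), int(height.group(1))


def check_clusterfile(spec_text, cluster_json_text):
    """Return a list of problems found (an empty list means none)."""
    cfg = json.loads(cluster_json_text)
    spec_w, spec_h = read_spec_dims(spec_text)
    problems = []
    if (cfg["image_width"], cfg["image_height"]) != (spec_w, spec_h):
        problems.append(
            f"clusterfile is {cfg['image_width']}x{cfg['image_height']}, "
            f"training spec is {spec_w}x{spec_h}"
        )
    targets = set(cfg.get("target_classes", []))
    for name in cfg.get("postproc_classes", []):
        if name not in targets:
            problems.append(f"postproc_classes entry {name!r} not in target_classes")
    return problems


def grid_shape(width, height, stride):
    """Output grid as (columns, rows) for an input size and network stride."""
    return width // stride, height // stride


def check_files(spec_path, cluster_path):
    """Convenience wrapper: read both files from disk and run check_clusterfile."""
    with open(spec_path) as f:
        spec_text = f.read()
    with open(cluster_path) as f:
        cluster_text = f.read()
    return check_clusterfile(spec_text, cluster_text)
```

For example, call check_files with the paths of detectnet_v2_train_resnet18_kitti.txt and detectnet_v2_clusterfile_kitti.json under your $SPECS_DIR, and fix anything it reports before running tlt-infer.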

Pritam · February 3, 2020, 7:31am

Actually, Morganh, I had also tested with 1280x720, but when I was not getting results I changed it to 720x720 and other values.

But setting the JSON file aside, it should work on DS, and it does not work there either: nothing is detected.

Morganh · February 3, 2020, 7:35am

First, you should try the "tlt-infer" command instead of DS.
Run "tlt-infer" to see whether you get bounding boxes for the images.

Pritam · February 3, 2020, 7:38am

Yes, I tried that, but all the images in the tlt_infer_testing folder were without bounding boxes.

Morganh · February 3, 2020, 7:39am

Could you please check your tlt-infer command again?

Morganh · February 3, 2020, 7:42am

Also, could you please change confidence_th and try?
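When trying several thresholds, rewriting the clusterfile with a script avoids hand-editing slips. A minimal sketch, hypothetical and not a TLT tool, that uses only keys appearing in the clusterfile above: it sets the same confidence_th for every class in target_classes and can optionally set image_width and image_height, for example to the 1280x720 training resolution.

```python
import json


def with_threshold(cluster_cfg, threshold, width=None, height=None):
    """Return a copy of a clusterfile dict with confidence_th set for every target class.

    Optionally overrides image_width/image_height (e.g. to match the training spec).
    """
    cfg = json.loads(json.dumps(cluster_cfg))  # deep copy via a JSON round-trip
    cfg["confidence_th"] = {name: threshold for name in cfg["target_classes"]}
    if width is not None:
        cfg["image_width"] = width
    if height is not None:
        cfg["image_height"] = height
    return cfg


def rewrite_clusterfile(src_path, dst_path, threshold, width=None, height=None):
    """Read the clusterfile at src_path, apply with_threshold, write it to dst_path."""
    with open(src_path) as f:
        cfg = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(with_threshold(cfg, threshold, width, height), f, indent=4)
```

For example, rewrite_clusterfile(src, dst, 0.3, width=1280, height=720) writes a new file that can then be passed to tlt-infer with -cp, leaving the original clusterfile untouched.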

Pritam · February 3, 2020, 7:45am

Yes, Morgan, I have tried it now.

# Running inference for detection on n images
!tlt-infer detectnet_v2 -i $USER_EXPERIMENT_DIR/data/testing/image_2 \
                        -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                        -m $USER_EXPERIMENT_DIR/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.tlt \
                        -cp $SPECS_DIR/detectnet_v2_clusterfile_kitti.json \
                        -k $KEY \
                        --kitti_dump \
                        -lw 3 \
                        -g 0 \
                        -bs 64

but I get nothing: no bounding boxes.
Is there a problem with the training, or something else?
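Since the command above passes --kitti_dump, the dumped label files can say more than the rendered images: empty label files suggest that no detection survived post-processing, while boxes with low scores would point at the clusterfile thresholds. A minimal sketch under stated assumptions: each label line is KITTI-style with the class name first, a 16th field (if present) is a confidence score, and label_dir is whichever directory under the -o output folder actually holds the .txt files. These are assumptions about the dump format, not something this thread confirms.

```python
import os
from collections import Counter


def summarize_kitti_dir(label_dir):
    """Summarize KITTI-style label files written by an inference run.

    Assumes one detection per line with the class name first and, when a line
    has at least 16 fields, a confidence score in the 16th field.
    """
    names = sorted(n for n in os.listdir(label_dir) if n.endswith(".txt"))
    per_class = Counter()
    empty_files = 0
    max_score = None
    for name in names:
        with open(os.path.join(label_dir, name)) as f:
            rows = [line.split() for line in f if line.strip()]
        if not rows:
            empty_files += 1
        for row in rows:
            per_class[row[0]] += 1
            if len(row) >= 16:
                score = float(row[15])
                max_score = score if max_score is None else max(max_score, score)
    return {
        "files": len(names),
        "empty_files": empty_files,
        "boxes_per_class": dict(per_class),
        "max_score": max_score,
    }
```

If "empty_files" equals "files", the model produced nothing that passed clustering; if boxes exist but were not drawn, the problem is more likely in thresholds or visualization than in training.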

Pritam · February 3, 2020, 7:49am

Yes, I changed confidence_th from 0.8 to 0.5, but I get the same issue.

Morganh · February 3, 2020, 8:24am

Could you please change the image folder you run inference on?
Just change the folder below to the one you used for training.

$USER_EXPERIMENT_DIR/data/testing/image_2
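To keep that check quick, one option is to point -i at a small folder holding a handful of training images. A minimal sketch; the function and folder names are placeholders, and ".jpg" matches image_extension in the training spec.

```python
import os
import shutil


def copy_sample(src_dir, dst_dir, n=20, ext=".jpg"):
    """Copy the first n images (sorted by name) from src_dir into dst_dir.

    Returns the list of copied file names.
    """
    os.makedirs(dst_dir, exist_ok=True)
    names = sorted(f for f in os.listdir(src_dir) if f.lower().endswith(ext))[:n]
    for name in names:
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return names
```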

Pritam · February 3, 2020, 8:29am

Yes, I did this too, but I get no results even on the training images.

Morganh · February 3, 2020, 8:34am

Also, please check whether your $USER_EXPERIMENT_DIR/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.tlt

is the exact model you just trained, the one that reached 82% mAP.
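Comparing file digests is one quick way to confirm that the file passed to tlt-infer with -m is byte-for-byte the model you think it is, for example against experiment_dir_unpruned/weights/resnet18_detector.tlt. Note that a pruned model is a different file from the model it was pruned from, so this only confirms which copy of a given model you are using. A minimal sketch using Python's hashlib:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True if both files have identical bytes (compared via SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```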

Pritam · February 3, 2020, 8:44am

Yes, it is the same.
I also had doubts about it, so I ran pruning three times and checked, but I get the same issue.

Morganh · February 3, 2020, 8:47am

When you mention 82%, did you run training or retraining?

Pritam · February 3, 2020, 8:50am

It was training.

Morganh · February 3, 2020, 8:58am

OK, so please check your training log and use the exact TLT model you trained to run inference.
By default, the trained TLT model should be experiment_dir_unpruned/weights/resnet18_detector.tlt.

Please try with it again.

If you run tlt-prune, you get $USER_EXPERIMENT_DIR/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.tlt.
But as mentioned in the TLT user guide, it is necessary to retrain the pruned TLT model.

Pritam · February 3, 2020, 9:51am

OK, Morganh, I will retrain the model on the pruned weights as described in the detectnet_v2_retrain_resnet18_kitti.txt file. But I have one concern: last time, when I trained my model, it gave detection results even though I did not retrain it, and now it does not. Why? I don't understand this. If you find any other clue, please let me know.
I am very confused.

Pritam · February 3, 2020, 10:36am

Morganh, I would like to understand one thing. While training was running, I saw an mAP of ~88.0 at epoch 501, ~72.0 at epoch 601, and ~82.0 at epoch 3500. Why? Can I use the checkpoint model.step-41040.tlt (epoch 501 ~ step 41040) as the experiment_dir_unpruned/weights/resnet18_detector.tlt weights, then prune it and test?
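The final training log above gives a way to relate checkpoint steps to epochs, assuming the number of steps per epoch is constant and checkpoint names use the global step: step 399000 at epoch 3500 gives 399000 / 3500 = 114 steps per epoch. Under that assumption, step 41040 corresponds to epoch 41040 / 114 = 360, and epoch 501 would end at step 501 x 114 = 57114, so it is worth confirming which epoch a checkpoint comes from before using it. With checkpoint_interval: 10 there may also be no checkpoint saved at exactly epoch 501.

```python
def steps_per_epoch(final_step, final_epoch):
    """Steps per epoch, assuming the count is constant for the whole run."""
    if final_step % final_epoch:
        raise ValueError("final_step is not a whole multiple of final_epoch")
    return final_step // final_epoch


def epoch_at_step(step, spe):
    """Epoch reached at a given global step (fractional if mid-epoch)."""
    return step / spe


def step_at_epoch(epoch, spe):
    """Global step at the end of a given epoch."""
    return epoch * spe


# Values from the training log above: checkpoint step 399000 at epoch 3500.
SPE = steps_per_epoch(399000, 3500)
```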

Morganh · February 3, 2020, 12:58pm

Hi Pritam,
You can quickly run tlt-infer against the model you have already generated:
experiment_dir_unpruned/weights/resnet18_detector.tlt

It is an unpruned TLT model, and as you mentioned, it reaches 82% mAP.
It is common for mAP to fluctuate during training.
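Since mAP fluctuates, it can help to pull every validation result out of the training console output and see which epochs scored best. A minimal sketch that assumes the format shown in the log above ("Epoch N/M" followed later by "Mean average_precision (in %): X") and attributes each mAP line to the most recent epoch line; the log text is whatever you captured from the training run.

```python
import re

EPOCH_RE = re.compile(r"^Epoch (\d+)/\d+")
MAP_RE = re.compile(r"Mean average_precision \(in %\): ([\d.]+)")


def map_by_epoch(log_text):
    """Map epoch -> mAP (%) parsed from training console output."""
    results = {}
    epoch = None
    for line in log_text.splitlines():
        m = EPOCH_RE.match(line.strip())
        if m:
            epoch = int(m.group(1))
            continue
        m = MAP_RE.search(line)
        if m and epoch is not None:
            results[epoch] = float(m.group(1))
    return results


def best_epochs(results, k=3):
    """Top-k (epoch, mAP) pairs, highest mAP first."""
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

The epochs returned can then be matched to checkpoint files, keeping in mind that checkpoints are saved every checkpoint_interval epochs, not necessarily at every validation epoch.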

Pritam · February 4, 2020, 7:25am

Hi Morganh,
I am now getting good results with tlt-infer for some images, but then I get an error like this:

  File "/usr/local/bin/tlt-infer", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_infer.py", line 35, in main
  File "./detectnet_v2/scripts/inference.py", line 222, in main
  File "./detectnet_v2/scripts/inference.py", line 180, in inference_wrapper_batch
  File "./detectnet_v2/inferencer/tlt_inferencer.py", line 123, in infer_batch
  File "./detectnet_v2/inferencer/base_inferencer.py", line 107, in input_preprocessing
ValueError: axes don't match array
77it [01:39,  1.29s/it]

I have seen your answer at https://devtalk.nvidia.com/default/topic/1067152/transfer-learning-toolkit/valueerror-axes-don-t-match-array/post/5412235/#5412235
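One common way numpy raises "ValueError: axes don't match array" is a transpose with a 3-axis permutation applied to an array that does not have 3 axes, for example a grayscale image decoded as height x width only. Whether that is what happens at base_inferencer.py line 107 is an assumption, but it is cheap to rule out. The sketch below reads each JPEG's frame header (SOF segment) using only the standard library and reports files that do not have 3 color components (1 means grayscale, 4 usually CMYK) or that it cannot parse, such as a PNG saved with a .jpg extension.

```python
import os

# SOF markers that carry frame dimensions (excludes DHT 0xC4, JPG 0xC8, DAC 0xCC).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_frame_info(data):
    """Return (width, height, components) from a JPEG's SOF header, or None."""
    if data[:2] != b"\xff\xd8":          # missing SOI marker: not a JPEG
        return None
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None                  # lost marker sync: treat as unreadable
        marker = data[i + 1]
        if marker == 0xFF:               # fill byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            i += 2                       # standalone markers have no length field
            continue
        if marker in (0xD9, 0xDA):       # EOI or start of scan before any SOF
            return None
        seg_len = int.from_bytes(data[i + 2:i + 4], "big")
        if marker in SOF_MARKERS:
            if i + 10 > len(data):
                return None
            # Segment payload: precision (1), height (2), width (2), components (1).
            height = int.from_bytes(data[i + 5:i + 7], "big")
            width = int.from_bytes(data[i + 7:i + 9], "big")
            return width, height, data[i + 9]
        i += 2 + seg_len                 # skip marker bytes plus the segment
    return None


def find_suspect_images(image_dir, ext=".jpg"):
    """List (file name, info) for images that are unreadable or not 3-channel."""
    suspects = []
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith(ext):
            continue
        with open(os.path.join(image_dir, name), "rb") as f:
            info = jpeg_frame_info(f.read())
        if info is None or info[2] != 3:
            suspects.append((name, info))
    return suspects
```

Any file it lists can be converted to 3-channel RGB or moved out of the input folder before rerunning tlt-infer.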

With batch size 64, I get fewer outputs in the tlt_infer_testing folder:

# Running inference for detection on n images
!tlt-infer detectnet_v2 -i $USER_EXPERIMENT_DIR/data/training/image_2 \
                        -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                        -cp $SPECS_DIR/detectnet_v2_clusterfile_kitti.json \
                        -k $KEY \
                        --kitti_dump \
                        -lw 3 \
                        -g 0 \
                        -bs 64

With batch size 32, I get more output samples than with 64, and then I get the error mentioned above:

# Running inference for detection on n images
!tlt-infer detectnet_v2 -i $USER_EXPERIMENT_DIR/data/training/image_2 \
                        -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                        -cp $SPECS_DIR/detectnet_v2_clusterfile_kitti.json \
                        -k $KEY \
                        --kitti_dump \
                        -lw 3 \
                        -g 0 \
                        -bs 32

With batch size 16, I get more output samples than with 32, and then I get the error mentioned above:

# Running inference for detection on n images
!tlt-infer detectnet_v2 -i $USER_EXPERIMENT_DIR/data/training/image_2 \
                        -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                        -cp $SPECS_DIR/detectnet_v2_clusterfile_kitti.json \
                        -k $KEY \
                        --kitti_dump \
                        -lw 3 \
                        -g 0 \
                        -bs 16

All images are in the same format.
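The pattern of fewer outputs at larger batch sizes fits a run that stops at the first batch containing an image it cannot preprocess. Assuming tlt-infer processes images in a fixed order, writes results one whole batch at a time, and aborts on the first failing batch (assumptions, not confirmed in this thread), a bad image at 0-based position k leaves (k // b) x b images written for batch size b. Intersecting the possible positions across batch sizes narrows down which files to inspect.

```python
def written_before_failure(bad_index, batch_size):
    """Images written before the batch containing bad_index fails.

    Assumes fixed ordering, whole-batch writes, and abort on first failure.
    """
    return (bad_index // batch_size) * batch_size


def candidate_range(observations):
    """Intersect [written, written + batch) over {batch_size: images_written}.

    Returns (lo, hi) such that the first bad image index k satisfies lo <= k < hi.
    """
    lo, hi = 0, float("inf")
    for batch_size, written in observations.items():
        lo = max(lo, written)
        hi = min(hi, written + batch_size)
    if lo >= hi:
        raise ValueError("observations are inconsistent with the assumptions")
    return lo, hi
```

For example, with hypothetical counts of 960, 992 and 1008 annotated images written at batch sizes 64, 32 and 16, candidate_range returns (1008, 1024), pointing at 16 files in processing order.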
