INFO:tensorflow:Saving checkpoints for step-19550.
2019-11-28 10:25:49,337 [INFO] tensorflow: Saving checkpoints for step-19550.
2019-11-28 10:25:50,819 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 97, 0.00s/step
2019-11-28 10:25:53,518 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 97, 0.27s/step
2019-11-28 10:25:56,181 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 97, 0.27s/step
2019-11-28 10:25:58,780 [INFO] iva.detectnet_v2.evaluation.evaluation: step 30 / 97, 0.26s/step
2019-11-28 10:26:01,464 [INFO] iva.detectnet_v2.evaluation.evaluation: step 40 / 97, 0.27s/step
2019-11-28 10:26:04,098 [INFO] iva.detectnet_v2.evaluation.evaluation: step 50 / 97, 0.26s/step
2019-11-28 10:26:06,764 [INFO] iva.detectnet_v2.evaluation.evaluation: step 60 / 97, 0.27s/step
2019-11-28 10:26:09,330 [INFO] iva.detectnet_v2.evaluation.evaluation: step 70 / 97, 0.26s/step
2019-11-28 10:26:11,954 [INFO] iva.detectnet_v2.evaluation.evaluation: step 80 / 97, 0.26s/step
2019-11-28 10:26:14,593 [INFO] iva.detectnet_v2.evaluation.evaluation: step 90 / 97, 0.26s/step
Epoch 50/120
=========================
Validation cost: -0.000009
Mean average_precision (in %): 0.0000
class name average precision (in %)
-------------------------- --------------------------
Cl 0
Fl 0
Ladders 0
Plat 0
Stac 0
Stalls 0
Sp 0
Median Inference Time: 0.065659
Epoch 55/120
=========================
Validation cost: -0.000009
Mean average_precision (in %): 0.0000
class name average precision (in %)
-------------------------- --------------------------
Cl 0
Fl 0
Ladders 0
Plat 0
Stac 0
Stalls 0
Sp 0
Please check the attached train config file.
The images used for training are high resolution (4096 * 2160).
The label.txt looks like the following:
Fl 0.00 0.00 0 901.03808 635.71608 3158.048768 2160.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
augmentation_config {
preprocessing {
output_image_width: 768
output_image_height: 768
min_bbox_width: 2.0
min_bbox_height: 2.0
output_image_channel: 3
}
train_config.txt (9.2 KB)
Morganh
November 28, 2019, 3:19pm
2
Hi
I found several possible culprits.
Your label.txt is not in the expected format. See Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation: each object must have 15 fields in total.
Did you generate the tfrecord files successfully with “tlt-dataset-convert”?
Your attached training config file does not exactly match what you mentioned.
In your attachment,
output_image_width: 768
output_image_height: 768
Could you attach the correct config file?
Also, please see Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation for the setting.
The class names in your config file do not match the class names in your mAP output (Cl, Fl, etc.).
target_classes {
name: "xxx"
Please double check if the config file is the correct one.
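The label format and class name points above can be checked with a short script before converting the dataset. This is only a minimal sketch, assuming KITTI-style label files with space-separated fields; the function name `check_label_line` and the class set passed to it are hypothetical and stand in for whatever your config's target classes actually are.

```python
EXPECTED_FIELDS = 15  # per-object field count mentioned above


def check_label_line(line, known_classes):
    """Return a list of problems found in one KITTI-style label line."""
    problems = []
    fields = line.split()
    if len(fields) != EXPECTED_FIELDS:
        problems.append(f"expected {EXPECTED_FIELDS} fields, got {len(fields)}")
    if fields and fields[0] not in known_classes:
        problems.append(f"class {fields[0]!r} is not among the known classes")
    return problems


# Label line from the question; the class set is a hypothetical example.
line = "Fl 0.00 0.00 0 901.03808 635.71608 3158.048768 2160.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(check_label_line(line, {"Cl", "Fl"}))
```

An empty list means the line has 15 fields and a class name from the given set; it does not validate the numeric values themselves.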
Hello,
Actually, the class names in the mAP output are the same as in the config file (I only shortened the class names while editing this question). The output_image_width: 768 and output_image_height: 768 values are also the same as in the config file. I am still getting 0 for average precision.
Note: I have edited the question. The label.txt used for training does have 15 fields, and the classes have the same names as in the config file.
Morganh
November 29, 2019, 1:42am
4
Hi samjith888,
The output_image_width and output_image_height settings in the training config file should exactly match the resolution of your training dataset.
You mentioned that the images used for training have a resolution of 4096 * 2160.
But your training config file is set as below, which is not expected.
augmentation_config {
preprocessing {
output_image_width: 768
output_image_height: 768
I’m getting the following error when I replace the values in the augmentation config with my input image resolution.
ResourceExhaustedError : OOM when allocating tensor with shape[4,64,1080,2048] and type float on /j…
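For a sense of scale, the size of the single tensor named in this error can be worked out from its shape. The snippet below is a rough sketch, assuming that "float" in the message means 32-bit floats; it says nothing about the total memory the whole network needs.

```python
# Shape copied from the error message: [4, 64, 1080, 2048], type float.
shape = (4, 64, 1080, 2048)
BYTES_PER_FLOAT32 = 4  # assumption: "float" here is a 32-bit float

elements = 1
for dim in shape:
    elements *= dim

size_gib = elements * BYTES_PER_FLOAT32 / 2**30
print(f"{elements} elements, about {size_gib:.2f} GiB")
```

The spatial size 1080 x 2048 is half of 2160 x 4096 in each dimension, so this is one activation tensor for a batch of 4 at the full input resolution, and many such tensors have to be held during training.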
Morganh
November 29, 2019, 3:47am
6
I have pasted it there. Please take a look.
Morganh:
Hi samjith888,
The output_image_width and output_image_height settings in the training config file should exactly match the resolution of your training dataset.
You mentioned that the images used for training have a resolution of 4096 * 2160.
But your training config file is set as below, which is not expected.
augmentation_config {
preprocessing {
output_image_width: 768
output_image_height: 768
Are you sure that the output_image_width and output_image_height values have to be replaced with my original training image resolution (4096 * 2160)?
I have gone through your answer at https://devtalk.nvidia.com/default/topic/1067151/transfer-learning-toolkit/understanding-parameters-of-training-config/post/5405248/#5405248 where you mentioned the same fields with the resized image resolution.
Morganh
November 30, 2019, 12:35am
9
Yes, if you resize the original dataset to 768*768, it is correct to set 768 and 768 in the training spec.
But I saw that your label file has still not been changed accordingly. Its bounding boxes need to be resized too.
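Resizing the bounding boxes together with the images could look roughly like the following. This is a minimal sketch, assuming KITTI-style labels where fields 4 to 7 are the box's left, top, right and bottom pixel coordinates; `scale_kitti_bbox` is a hypothetical helper, not part of the toolkit.

```python
def scale_kitti_bbox(line, src_size, dst_size):
    """Scale the bbox fields (indices 4..7) of one KITTI-style label line.

    src_size and dst_size are (width, height) tuples in pixels.
    """
    fields = line.split()
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    scaled = [xmin * sx, ymin * sy, xmax * sx, ymax * sy]
    fields[4:8] = [f"{v:.2f}" for v in scaled]
    return " ".join(fields)


# Label line from the question, rescaled from 4096x2160 to 768x768.
line = "Fl 0.00 0.00 0 901.03808 635.71608 3158.048768 2160.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(scale_kitti_bbox(line, (4096, 2160), (768, 768)))
```

Note that going from 4096x2160 to 768x768 uses different horizontal and vertical factors, so the boxes (and the objects in the images) change aspect ratio.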
Morganh
November 30, 2019, 3:20am
11
OK, if you do not change the labels, you must set 4096*2160 in the training spec before training.