Please provide the following information when requesting support.
• Hardware (Ubuntu 18.04 + RTX 3060)
• Network Type (Classification)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
Configuration of the TLT Instance

dockers:
    nvidia/tlt-streamanalytics:
        docker_registry: nvcr.io
        docker_tag: v3.0-py3
        tasks:
            1. augment
            2. bpnet
            3. classification
            4. detectnet_v2
            5. dssd
            6. emotionnet
            7. faster_rcnn
            8. fpenet
            9. gazenet
            10. gesturenet
            11. heartratenet
            12. lprnet
            13. mask_rcnn
            14. multitask_classification
            15. retinanet
            16. ssd
            17. unet
            18. yolo_v3
            19. yolo_v4
            20. tlt-converter
    nvidia/tlt-pytorch:
        docker_registry: nvcr.io
        docker_tag: v3.0-py3
        tasks:
            1. speech_to_text
            2. speech_to_text_citrinet
            3. text_classification
            4. question_answering
            5. token_classification
            6. intent_slot_classification
            7. punctuation_and_capitalization
format_version: 1.0
tlt_version: 3.0
published_date: 04/16/2021
• Training spec file (if you have one, please share it here)
model_config {
  arch: "resnet",
  n_layers: 18
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tlt-experiments/data/split/train"
  val_dataset_path: "/workspace/tlt-experiments/data/split/val"
  pretrained_model_path: "/workspace/tlt-experiments/classification/pretrained_resnet18/tlt_pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tlt-experiments/data/split/test"
  model_path: "/workspace/tlt-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}
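To see what the `lr_config` step schedule in the spec implies over the 80 epochs, here is a minimal Python sketch that prints the learning rate at a few epochs. It assumes the usual step-decay form, lr = learning_rate * gamma^k, and it assumes `step_size` is a percentage of the total training progress (so 10 would mean a drop every 8 of the 80 epochs). If this TLT version counts `step_size` in epochs instead, pass `step_in_percent=False`. The `step_lr` helper and its parameters are hypothetical and are not part of TLT.

```python
def step_lr(epoch, base_lr=0.006, step_size=10, gamma=0.1,
            n_epochs=80, step_in_percent=True):
    """Learning rate at a 0-based epoch under plain step decay.

    Assumption: lr = base_lr * gamma ** (number of completed steps).
    Hypothetical helper for illustration, not TLT's internal scheduler.
    """
    if step_in_percent:
        # Interpret step_size as a percentage of the whole run.
        epochs_per_step = n_epochs * step_size / 100.0
    else:
        # Interpret step_size directly as a number of epochs.
        epochs_per_step = float(step_size)
    completed_steps = int(epoch // epochs_per_step)
    return base_lr * gamma ** completed_steps


if __name__ == "__main__":
    for epoch in (0, 8, 16, 24, 40, 79):
        print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.1e}")
```

Under either reading of `step_size`, the rate has already fallen by a factor of 100 by epoch 20, so with these values the later epochs can change the weights only very little.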
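• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
Training fails to converge: the loss and accuracy look abnormal. After about epoch 10 the training loss stays near 1.56 and the validation accuracy stays near 0.535 for the rest of the run. The training results are as follows: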
Epoch 2/80
183/183 [==============================] - 40s 216ms/step - loss: 1.8445 - acc: 0.5090 - val_loss: 1.6074 - val_acc: 0.5347
Epoch 3/80
183/183 [==============================] - 40s 217ms/step - loss: 1.7428 - acc: 0.5356 - val_loss: 1.5872 - val_acc: 0.5395
Epoch 4/80
183/183 [==============================] - 39s 212ms/step - loss: 1.7061 - acc: 0.5494 - val_loss: 1.5709 - val_acc: 0.5407
Epoch 5/80
183/183 [==============================] - 39s 212ms/step - loss: 1.6480 - acc: 0.5672 - val_loss: 1.5713 - val_acc: 0.5371
Epoch 6/80
183/183 [==============================] - 39s 212ms/step - loss: 1.6354 - acc: 0.5654 - val_loss: 1.5768 - val_acc: 0.5437
Epoch 7/80
183/183 [==============================] - 39s 212ms/step - loss: 1.6221 - acc: 0.5700 - val_loss: 1.5685 - val_acc: 0.5467
Epoch 8/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5963 - acc: 0.5726 - val_loss: 1.5776 - val_acc: 0.5389
Epoch 9/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5900 - acc: 0.5823 - val_loss: 1.5743 - val_acc: 0.5431
Epoch 10/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5759 - acc: 0.5778 - val_loss: 1.5734 - val_acc: 0.5365
Epoch 11/80
183/183 [==============================] - 39s 213ms/step - loss: 1.5728 - acc: 0.5849 - val_loss: 1.5721 - val_acc: 0.5341
Epoch 12/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5790 - acc: 0.5814 - val_loss: 1.5749 - val_acc: 0.5431
Epoch 13/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5764 - acc: 0.5892 - val_loss: 1.5766 - val_acc: 0.5383
Epoch 14/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5678 - acc: 0.5870 - val_loss: 1.5762 - val_acc: 0.5353
Epoch 15/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5642 - acc: 0.5874 - val_loss: 1.5730 - val_acc: 0.5347
Epoch 16/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5638 - acc: 0.5902 - val_loss: 1.5749 - val_acc: 0.5359
Epoch 17/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5664 - acc: 0.5857 - val_loss: 1.5765 - val_acc: 0.5353
Epoch 18/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5496 - acc: 0.5929 - val_loss: 1.5762 - val_acc: 0.5359
Epoch 19/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5704 - acc: 0.5871 - val_loss: 1.5743 - val_acc: 0.5317
Epoch 20/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5602 - acc: 0.5912 - val_loss: 1.5757 - val_acc: 0.5329
Epoch 21/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5560 - acc: 0.5916 - val_loss: 1.5763 - val_acc: 0.5389
Epoch 22/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5667 - acc: 0.5927 - val_loss: 1.5742 - val_acc: 0.5341
Epoch 23/80
183/183 [==============================] - 40s 219ms/step - loss: 1.5598 - acc: 0.5909 - val_loss: 1.5750 - val_acc: 0.5323
Epoch 24/80
183/183 [==============================] - 39s 216ms/step - loss: 1.5546 - acc: 0.5902 - val_loss: 1.5753 - val_acc: 0.5317
Epoch 25/80
183/183 [==============================] - 40s 220ms/step - loss: 1.5599 - acc: 0.5893 - val_loss: 1.5774 - val_acc: 0.5341
Epoch 26/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5607 - acc: 0.5892 - val_loss: 1.5758 - val_acc: 0.5395
Epoch 27/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5537 - acc: 0.5895 - val_loss: 1.5790 - val_acc: 0.5353
Epoch 28/80
183/183 [==============================] - 39s 213ms/step - loss: 1.5557 - acc: 0.5922 - val_loss: 1.5760 - val_acc: 0.5365
Epoch 29/80
183/183 [==============================] - 40s 219ms/step - loss: 1.5594 - acc: 0.5914 - val_loss: 1.5745 - val_acc: 0.5359
Epoch 30/80
183/183 [==============================] - 40s 221ms/step - loss: 1.5574 - acc: 0.5864 - val_loss: 1.5744 - val_acc: 0.5365
Epoch 31/80
183/183 [==============================] - 40s 218ms/step - loss: 1.5599 - acc: 0.5903 - val_loss: 1.5778 - val_acc: 0.5353
Epoch 32/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5500 - acc: 0.5898 - val_loss: 1.5735 - val_acc: 0.5329
Epoch 33/80
183/183 [==============================] - 41s 221ms/step - loss: 1.5581 - acc: 0.5959 - val_loss: 1.5749 - val_acc: 0.5377
Epoch 34/80
183/183 [==============================] - 41s 223ms/step - loss: 1.5639 - acc: 0.5896 - val_loss: 1.5746 - val_acc: 0.5371
Epoch 35/80
183/183 [==============================] - 41s 223ms/step - loss: 1.5589 - acc: 0.5886 - val_loss: 1.5747 - val_acc: 0.5347
Epoch 36/80
183/183 [==============================] - 40s 216ms/step - loss: 1.5612 - acc: 0.5844 - val_loss: 1.5755 - val_acc: 0.5359
Epoch 37/80
183/183 [==============================] - 39s 216ms/step - loss: 1.5584 - acc: 0.5892 - val_loss: 1.5758 - val_acc: 0.5335
Epoch 38/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5454 - acc: 0.5930 - val_loss: 1.5775 - val_acc: 0.5353
Epoch 39/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5492 - acc: 0.5891 - val_loss: 1.5755 - val_acc: 0.5353
Epoch 40/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5619 - acc: 0.5879 - val_loss: 1.5749 - val_acc: 0.5353
Epoch 41/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5616 - acc: 0.5899 - val_loss: 1.5776 - val_acc: 0.5317
Epoch 42/80
183/183 [==============================] - 40s 217ms/step - loss: 1.5429 - acc: 0.5959 - val_loss: 1.5783 - val_acc: 0.5365
Epoch 43/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5563 - acc: 0.5930 - val_loss: 1.5766 - val_acc: 0.5353
Epoch 44/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5532 - acc: 0.5911 - val_loss: 1.5766 - val_acc: 0.5311
Epoch 45/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5456 - acc: 0.5938 - val_loss: 1.5767 - val_acc: 0.5347
Epoch 46/80
183/183 [==============================] - 39s 213ms/step - loss: 1.5500 - acc: 0.5933 - val_loss: 1.5766 - val_acc: 0.5353
Epoch 47/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5607 - acc: 0.5850 - val_loss: 1.5755 - val_acc: 0.5365
Epoch 48/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5505 - acc: 0.5933 - val_loss: 1.5741 - val_acc: 0.5335
Epoch 49/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5621 - acc: 0.5854 - val_loss: 1.5773 - val_acc: 0.5335
Epoch 50/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5588 - acc: 0.5871 - val_loss: 1.5745 - val_acc: 0.5371
Epoch 51/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5607 - acc: 0.5884 - val_loss: 1.5773 - val_acc: 0.5371
Epoch 52/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5642 - acc: 0.5879 - val_loss: 1.5759 - val_acc: 0.5371
Epoch 53/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5572 - acc: 0.5866 - val_loss: 1.5770 - val_acc: 0.5341
Epoch 54/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5521 - acc: 0.5950 - val_loss: 1.5756 - val_acc: 0.5359
Epoch 55/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5561 - acc: 0.5847 - val_loss: 1.5769 - val_acc: 0.5353
Epoch 56/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5601 - acc: 0.5867 - val_loss: 1.5764 - val_acc: 0.5305
Epoch 57/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5590 - acc: 0.5853 - val_loss: 1.5768 - val_acc: 0.5377
Epoch 58/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5526 - acc: 0.5914 - val_loss: 1.5765 - val_acc: 0.5395
Epoch 59/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5549 - acc: 0.5878 - val_loss: 1.5774 - val_acc: 0.5365
Epoch 60/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5650 - acc: 0.5898 - val_loss: 1.5739 - val_acc: 0.5365
Epoch 61/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5663 - acc: 0.5835 - val_loss: 1.5745 - val_acc: 0.5341
Epoch 62/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5482 - acc: 0.5952 - val_loss: 1.5778 - val_acc: 0.5311
Epoch 63/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5520 - acc: 0.5973 - val_loss: 1.5778 - val_acc: 0.5317
Epoch 64/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5473 - acc: 0.5921 - val_loss: 1.5744 - val_acc: 0.5377
Epoch 65/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5621 - acc: 0.5936 - val_loss: 1.5778 - val_acc: 0.5359
Epoch 66/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5509 - acc: 0.5895 - val_loss: 1.5730 - val_acc: 0.5395
Epoch 67/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5461 - acc: 0.5904 - val_loss: 1.5771 - val_acc: 0.5347
Epoch 68/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5547 - acc: 0.5894 - val_loss: 1.5762 - val_acc: 0.5371
Epoch 69/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5613 - acc: 0.5853 - val_loss: 1.5788 - val_acc: 0.5371
Epoch 70/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5533 - acc: 0.5923 - val_loss: 1.5805 - val_acc: 0.5353
Epoch 71/80
183/183 [==============================] - 39s 216ms/step - loss: 1.5589 - acc: 0.5859 - val_loss: 1.5749 - val_acc: 0.5353
Epoch 72/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5623 - acc: 0.5914 - val_loss: 1.5748 - val_acc: 0.5359
Epoch 73/80
183/183 [==============================] - 39s 216ms/step - loss: 1.5546 - acc: 0.5887 - val_loss: 1.5747 - val_acc: 0.5389
Epoch 74/80
183/183 [==============================] - 40s 220ms/step - loss: 1.5516 - acc: 0.5872 - val_loss: 1.5777 - val_acc: 0.5329
Epoch 75/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5519 - acc: 0.5900 - val_loss: 1.5744 - val_acc: 0.5377
Epoch 76/80
183/183 [==============================] - 39s 213ms/step - loss: 1.5618 - acc: 0.5842 - val_loss: 1.5776 - val_acc: 0.5401
Epoch 77/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5574 - acc: 0.5886 - val_loss: 1.5758 - val_acc: 0.5359
Epoch 78/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5618 - acc: 0.5888 - val_loss: 1.5776 - val_acc: 0.5395
Epoch 79/80
183/183 [==============================] - 39s 214ms/step - loss: 1.5518 - acc: 0.5879 - val_loss: 1.5759 - val_acc: 0.5353
Epoch 80/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5634 - acc: 0.5909 - val_loss: 1.5756 - val_acc: 0.5353
2021-08-10 09:43:14,950 [INFO] main: Total Val Loss: 1.5756317377090454
2021-08-10 09:43:14,950 [INFO] main: Total Val accuracy: 0.5353293418884277
2021-08-10 09:43:14,950 [INFO] main: Training finished successfully.
2021-08-10 17:43:16,446 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
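To put a number on the plateau, a short Python sketch can pull the per-epoch metrics out of a Keras-style log like the one above. The regular expression and the `parse_metrics` helper are illustrative assumptions about the log format shown here, not part of TLT. The two sample lines are copied from the log above.

```python
import re

# Matches the metric part of a Keras progress line, e.g.
# "... - loss: 1.5634 - acc: 0.5909 - val_loss: 1.5756 - val_acc: 0.5353"
METRIC_RE = re.compile(
    r"loss: (?P<loss>[\d.]+) - acc: (?P<acc>[\d.]+) - "
    r"val_loss: (?P<val_loss>[\d.]+) - val_acc: (?P<val_acc>[\d.]+)"
)


def parse_metrics(log_text):
    """Return one dict of floats per progress line found in log_text."""
    rows = []
    for line in log_text.splitlines():
        match = METRIC_RE.search(line)
        if match:
            rows.append({k: float(v) for k, v in match.groupdict().items()})
    return rows


# Two lines copied from the log above (epoch 2 and epoch 80).
SAMPLE = """\
Epoch 2/80
183/183 [==============================] - 40s 216ms/step - loss: 1.8445 - acc: 0.5090 - val_loss: 1.6074 - val_acc: 0.5347
Epoch 80/80
183/183 [==============================] - 39s 215ms/step - loss: 1.5634 - acc: 0.5909 - val_loss: 1.5756 - val_acc: 0.5353
"""

if __name__ == "__main__":
    rows = parse_metrics(SAMPLE)
    first, last = rows[0], rows[-1]
    print("val_acc change:", round(last["val_acc"] - first["val_acc"], 4))
    print("val_loss change:", round(last["val_loss"] - first["val_loss"], 4))
```

Running the same function over the full log gives one row per epoch, which makes it easy to plot the curves or to check from which epoch the validation metrics stop moving.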