Please provide the following information when requesting support.
• Hardware (8 x V100)
• Network Type (action_recognition)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
tao info --verbose
dockers:
    nvidia/tao/tao-toolkit:
        4.0.0-tf2.9.1:
            docker_registry: nvcr.io
            tasks:
                1. classification_tf2
                2. efficientdet_tf2
        4.0.0-tf1.15.5:
            docker_registry: nvcr.io
            tasks:
                1. augment
                2. bpnet
                3. classification_tf1
                4. detectnet_v2
                5. dssd
                6. emotionnet
                7. efficientdet_tf1
                8. faster_rcnn
                9. fpenet
                10. gazenet
                11. gesturenet
                12. heartratenet
                13. lprnet
                14. mask_rcnn
                15. multitask_classification
                16. retinanet
                17. ssd
                18. unet
                19. yolo_v3
                20. yolo_v4
                21. yolo_v4_tiny
                22. converter
...
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit']
format_version: 2.0
toolkit_version: 4.0.1
published_date: 03/06/2023
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
When I start training, some of the log lines are truncated in the Jupyter output cell. For example, I can't see val_loss in the cell output. A log excerpt follows, and after it a small reproduction sketch of what I think is happening.
Epoch 0: 98%|██████████▊| 787/800 [06:41<00:06, 1.96it/s, loss=0.664, v_num=0]Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 0: 98%|██████████▊| 788/800 [06:52<00:06, 1.91it/s, loss=0.674, v_num=0]
Validation: 0it [00:00, ?it/s]
Validation DataLoader 0: 0%| | 0/12 [00:03<?, ?it/s]
Epoch 0: 99%|██████████▊| 789/800 [06:55<00:05, 1.90it/s, loss=0.674, v_num=0]
Epoch 0: 99%|██████████▊| 790/800 [06:56<00:05, 1.90it/s, loss=0.674, v_num=0]
Epoch 0: 99%|██████████▉| 791/800 [06:56<00:04, 1.90it/s, loss=0.674, v_num=0]
Epoch 0: 99%|██████████▉| 792/800 [06:56<00:04, 1.90it/s, loss=0.674, v_num=0]
Epoch 0: 99%|██████████▉| 793/800 [06:56<00:03, 1.90it/s, loss=0.674, v_num=0]
Epoch 0: 99%|██████████▉| 794/800 [06:57<00:03, 1.90it/s, loss=0.674, v_num=0]
Epoch 0: 99%|██████████▉| 795/800 [06:57<00:02, 1.90it/s, loss=0.674, v_num=0]
Epoch 0: 100%|██████████▉| 796/800 [06:57<00:02, 1.91it/s, loss=0.674, v_num=0]
Epoch 0: 100%|██████████▉| 797/800 [06:58<00:01, 1.91it/s, loss=0.674, v_num=0]
Epoch 0: 100%|██████████▉| 798/800 [06:58<00:01, 1.91it/s, loss=0.674, v_num=0]
Epoch 0: 100%|██████████▉| 799/800 [06:58<00:00, 1.91it/s, loss=0.674, v_num=0]
Epoch 0: 100%|█| 800/800 [07:00<00:00, 1.90it/s, loss=0.674, v_num=0, val_loss=
Epoch 1: 98%|▉| 787/800 [13:23<00:13, 1.02s/it, loss=0.0738, v_num=0, val_lossAdjusting learning rate of group 0 to 1.0000e-02.
Epoch 1: 98%|▉| 788/800 [13:24<00:12, 1.02s/it, loss=0.0712, v_num=0, val_loss
Validation: 0it [00:00, ?it/s]
Validation DataLoader 0: 0%| | 0/12 [00:03<?, ?it/s]
Epoch 1: 99%|▉| 789/800 [13:28<00:11, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 99%|▉| 790/800 [13:28<00:10, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 99%|▉| 791/800 [13:28<00:09, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 99%|▉| 792/800 [13:29<00:08, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 99%|▉| 793/800 [13:29<00:07, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 99%|▉| 794/800 [13:29<00:06, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 99%|▉| 795/800 [13:29<00:05, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 100%|▉| 796/800 [13:30<00:04, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 100%|▉| 797/800 [13:30<00:03, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 100%|▉| 798/800 [13:31<00:02, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 100%|▉| 799/800 [13:31<00:01, 1.02s/it, loss=0.0712, v_num=0, val_loss
Epoch 1: 100%|█| 800/800 [13:31<00:00, 1.01s/it, loss=0.0712, v_num=0, val_loss
Epoch 2: 98%|▉| 787/800 [19:53<00:19, 1.52s/it, loss=0.000615, v_num=0, val_loAdjusting learning rate of group 0 to 1.0000e-02.
Epoch 2: 98%|▉| 788/800 [19:53<00:18, 1.52s/it, loss=0.000603, v_num=0, val_lo
Validation: 0it [00:00, ?it/s]
Validation DataLoader 0: 0%| | 0/12 [00:03<?, ?it/s]
Epoch 2: 99%|▉| 789/800 [19:57<00:16, 1.52s/it, loss=0.000603, v_num=0, val_lo
Epoch 2: 99%|▉| 790/800 [19:57<00:15, 1.52s/it, loss=0.000603, v_num=0, val_lo
Epoch 2: 99%|▉| 791/800 [19:58<00:13, 1.51s/it, loss=0.000603, v_num=0, val_lo
Epoch 2: 99%|▉| 792/800 [19:58<00:12, 1.51s/it, loss=0.000603, v_num=0, val_lo
Epoch 2: 99%|▉| 793/800 [19:58<00:10, 1.51s/it, loss=0.000603, v_num=0, val_lo
Epoch 2: 99%|▉| 794/800 [19:58<00:09, 1.51s/it, loss=0.000603, v_num=0, val_lo
Epoch 2: 99%|▉| 795/800 [19:59<00:07, 1.51s/it, loss=0.000603, v_num=0, val_lo
Epoch 2: 100%|▉| 796/800 [19:59<00:06, 1.51s/it, loss=0.000603, v_num=0, val_lo
Epoch 2: 100%|▉| 797/800 [20:00<00:04, 1.51s/it, loss=0.000603, v_num=0, val_lo
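My guess (an assumption on my part, not something I have confirmed in the TAO code) is that the val_loss value is not missing but clipped: the PyTorch Lightning/tqdm progress bar trims each refresh to the terminal width it detects inside the container, and the trailing postfix with val_loss is what gets cut off. The snippet below is a minimal sketch that reproduces the same clipping with plain tqdm outside of TAO; the width of 80 columns and the metric values are placeholders chosen to mimic what Lightning appends.

```python
# Minimal reproduction sketch (my assumption about the cause, not TAO's actual code):
# when tqdm renders into a narrow width, it trims the whole line, so a long postfix
# such as "val_loss=..." is cut off exactly like in the log excerpt above.
import shutil
import time

from tqdm import tqdm

# shutil.get_terminal_size() falls back to the COLUMNS environment variable and then
# to 80 columns when stdout is not a real terminal -- my guess for why the TAO log
# is clipped at a fixed width inside the Jupyter cell.
print("detected terminal width:", shutil.get_terminal_size().columns)

# Force a narrow width and attach a Lightning-style postfix to show the clipping.
bar = tqdm(range(800), desc="Epoch 0", ncols=80)
for step in bar:
    # Placeholder metrics, mimicking loss / v_num / val_loss from the TAO log.
    bar.set_postfix(loss=0.674, v_num=0, val_loss=0.123)
    time.sleep(0.001)
```

If that is indeed the cause, is there a recommended way to see val_loss during training, e.g. giving the launched container a wider COLUMNS, or reading the metrics from the log/status files written to the results directory (if the action_recognition trainer writes them there)? I have not verified either of these, so please correct me if the cause is something else.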