PersonReID not generating the output .tlt file even after training is finished

Dear @Morganh or Team,

I am using TAO Toolkit 5 and trained a model using the re-identification net sample.

I used the Market-1501 dataset and also included additional custom data sources.

My training completed after 2 weeks, but at the end I am unable to find the .tlt file. Can you please suggest what the gap is and where it is in the training?

Below is the training configuration:

results_dir: "/results"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/model/resnet50_pretrained.pth"
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
dataset:
  train_dataset_dir: "/data/sample_train"
  test_dataset_dir: "/data/sample_test"
  query_dataset_dir: "/data/sample_query"
  num_classes: 100
  batch_size: 64
  val_batch_size: 128
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  optim:
    name: Adam
    steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 120
  checkpoint_interval: 10
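
For reference, here is how I read the checkpointing settings above (my own interpretation of checkpoint_interval, not confirmed against the TAO source; the exact epoch numbering may differ):

# My reading of the checkpoint settings in the config above: with
# checkpoint_interval: 10 and num_epochs: 120, a checkpoint should be
# written roughly every 10 epochs. The exact epoch indexing is an assumption.
num_epochs = 120
checkpoint_interval = 10

expected = [e for e in range(1, num_epochs + 1) if e % checkpoint_interval == 0]
print(f"Expecting ~{len(expected)} checkpoints, at epochs: {expected}")
# -> Expecting ~12 checkpoints, at epochs 10, 20, ..., 120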

Below are the training logs:

Train model
2024-04-03 20:11:37,331 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-04-03 20:11:37,383 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt
2024-04-03 20:11:37,436 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 267: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/smarg/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-04-03 20:11:37,436 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
sys:1: UserWarning: 
'experiment_market1501.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:107: UserWarning: 
'experiment_market1501.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Train results will be saved at: /results/market1501/train
Loading pretrained ImageNet model......
╒══════════╤═════════╤════════════╤═════════════╕
│ Subset   │   # IDs │   # Images │   # Cameras │
╞══════════╪═════════╪════════════╪═════════════╡
│ Train    │   13526 │    1820258 │          10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Query    │     793 │       2347 │          10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Gallery  │     100 │       1779 │           6 │
╘══════════╧═════════╧════════════╧═════════════╛
<frozen core.loggers.api_logging>:245: UserWarning: Log file already exists at /results/market1501/train/status.json
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:441: LightningDeprecationWarning: Setting `Trainer(gpus=[0])` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=[0])` instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:604: UserWarning: Checkpoint directory /results/market1501/train exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/usr/local/lib/python3.8/dist-packages/torch/optim/lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

  | Name           | Type     | Params
--------------------------------------------
0 | model          | Baseline | 27.5 M
1 | train_accuracy | Accuracy | 0     
2 | val_accuracy   | Accuracy | 0     
--------------------------------------------
27.5 M    Trainable params
256       Non-trainable params
27.5 M    Total params
109.983   Total estimated model params size (MB)
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Training: 0it [00:00, ?it/s]Starting Training Loop.
Epoch 0:   0%|                                        | 0/27176 [00:00<?, ?it/s]

Epoch 0: 100%|█████▉| 27154/27176 [2:15:05<00:06,  3.35it/s, loss=1.34, v_num=1]Train and Val metrics generated.
Epoch 0: 100%|▉| 27154/27176 [2:15:05<00:06,  3.35it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 1:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=5.430, 

Epoch 1: 100%|▉| 27174/27176 [1:41:20<00:00,  4.47it/s, loss=1.34, v_num=1, traiTrain and Val metrics generated.
Epoch 1: 100%|▉| 27174/27176 [1:41:21<00:00,  4.47it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 2:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260, 

Epoch 2: 100%|▉| 27167/27176 [1:42:21<00:02,  4.42it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 2: 100%|▉| 27167/27176 [1:42:22<00:02,  4.42it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 3:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.870, 

Epoch 3: 100%|▉| 27141/27176 [1:43:35<00:08,  4.37it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 3: 100%|▉| 27141/27176 [1:43:36<00:08,  4.37it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 4:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.730, 

Epoch 4: 100%|▉| 27126/27176 [1:44:43<00:11,  4.32it/s, loss=1.35, v_num=1, traiTrain and Val metrics generated.
Epoch 4: 100%|▉| 27126/27176 [1:44:44<00:11,  4.32it/s, loss=1.35, v_num=1, traiTraining loop in progress
Epoch 5:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.670, 

Epoch 5: 100%|▉| 27135/27176 [1:43:30<00:09,  4.37it/s, loss=1.31, v_num=1, traiTrain and Val metrics generated.
Epoch 5: 100%|▉| 27135/27176 [1:43:30<00:09,  4.37it/s, loss=1.31, v_num=1, traiTraining loop in progress
Epoch 6:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.31, v_num=1, train_loss=2.630, 

Epoch 6: 100%|▉| 27161/27176 [1:43:14<00:03,  4.38it/s, loss=1.33, v_num=1, traiTrain and Val metrics generated.
Epoch 6: 100%|▉| 27161/27176 [1:43:14<00:03,  4.38it/s, loss=1.33, v_num=1, traiTraining loop in progress
Epoch 7:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.610, 

Epoch 7: 100%|█| 27176/27176 [1:44:59<00:00,  4.31it/s, loss=1.32, v_num=1, traiTrain and Val metrics generated.
Epoch 7: 100%|█| 27176/27176 [1:45:00<00:00,  4.31it/s, loss=1.32, v_num=1, traiTraining loop in progress
Epoch 8:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.600, 

Epoch 8: 100%|▉| 27135/27176 [1:48:30<00:09,  4.17it/s, loss=1.34, v_num=1, traiTrain and Val metrics generated.
Epoch 8: 100%|▉| 27135/27176 [1:48:30<00:09,  4.17it/s, loss=1.34, v_num=1, traiTraining loop in progress
Epoch 9:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.590, 

Epoch 9: 100%|▉| 27152/27176 [1:50:43<00:05,  4.09it/s, loss=1.33, v_num=1, traiTrain and Val metrics generated.
Epoch 9: 100%|▉| 27152/27176 [1:50:44<00:05,  4.09it/s, loss=1.33, v_num=1, traiTraining loop in progress
Epoch 10:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.590,

Epoch 10: 100%|▉| 27158/27176 [1:51:50<00:04,  4.05it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 10: 100%|▉| 27158/27176 [1:51:50<00:04,  4.05it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 11:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.600,

Epoch 11: 100%|▉| 27131/27176 [1:46:14<00:10,  4.26it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 11: 100%|▉| 27131/27176 [1:46:15<00:10,  4.26it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 12:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.590,

Epoch 12: 100%|█| 27176/27176 [1:48:30<00:00,  4.17it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 12: 100%|█| 27176/27176 [1:48:31<00:00,  4.17it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 13:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.580,

Epoch 13: 100%|█| 27176/27176 [1:41:17<00:00,  4.47it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 13: 100%|█| 27176/27176 [1:41:18<00:00,  4.47it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 14:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.570,

Epoch 14: 100%|▉| 27148/27176 [1:39:18<00:06,  4.56it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 14: 100%|▉| 27148/27176 [1:39:18<00:06,  4.56it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 15:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.570,

Epoch 15: 100%|▉| 27162/27176 [1:38:36<00:03,  4.59it/s, loss=1.37, v_num=1, traTrain and Val metrics generated.
Epoch 15: 100%|▉| 27162/27176 [1:38:36<00:03,  4.59it/s, loss=1.37, v_num=1, traTraining loop in progress
Epoch 16:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.37, v_num=1, train_loss=2.570,

Epoch 16: 100%|▉| 27121/27176 [1:38:32<00:11,  4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 16: 100%|▉| 27121/27176 [1:38:32<00:11,  4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 17:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.570,

Epoch 17: 100%|▉| 27129/27176 [1:38:34<00:10,  4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 17: 100%|▉| 27129/27176 [1:38:34<00:10,  4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 18:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 18: 100%|▉| 27157/27176 [1:38:38<00:04,  4.59it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 18: 100%|▉| 27157/27176 [1:38:38<00:04,  4.59it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 19:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.570,

Epoch 19: 100%|▉| 27150/27176 [1:38:38<00:05,  4.59it/s, loss=1.37, v_num=1, traTrain and Val metrics generated.
Epoch 19: 100%|▉| 27150/27176 [1:38:38<00:05,  4.59it/s, loss=1.37, v_num=1, traTraining loop in progress
Epoch 20:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.37, v_num=1, train_loss=2.570,

Epoch 20: 100%|▉| 27147/27176 [1:38:46<00:06,  4.58it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 20: 100%|▉| 27147/27176 [1:38:47<00:06,  4.58it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 21:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.560,

Epoch 21: 100%|▉| 27139/27176 [1:38:55<00:08,  4.57it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 21: 100%|▉| 27139/27176 [1:38:56<00:08,  4.57it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 22:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.570,

Epoch 22: 100%|▉| 27128/27176 [1:48:27<00:11,  4.17it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 22: 100%|▉| 27128/27176 [1:48:28<00:11,  4.17it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 23:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 23: 100%|▉| 27161/27176 [1:45:07<00:03,  4.31it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 23: 100%|▉| 27161/27176 [1:45:07<00:03,  4.31it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 24:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 24: 100%|▉| 27162/27176 [1:48:23<00:03,  4.18it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 24: 100%|▉| 27162/27176 [1:48:24<00:03,  4.18it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 25:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,

Epoch 25: 100%|▉| 27148/27176 [1:51:51<00:06,  4.04it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 25: 100%|▉| 27148/27176 [1:51:52<00:06,  4.04it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 26:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.560,

Epoch 26: 100%|▉| 27167/27176 [1:55:14<00:02,  3.93it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 26: 100%|▉| 27167/27176 [1:55:14<00:02,  3.93it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 27:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,

Epoch 27: 100%|▉| 27137/27176 [1:39:42<00:08,  4.54it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 27: 100%|▉| 27137/27176 [1:39:43<00:08,  4.54it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 28:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 28: 100%|▉| 27166/27176 [1:39:36<00:02,  4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 28: 100%|▉| 27166/27176 [1:39:36<00:02,  4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 29:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.550,

Epoch 29: 100%|▉| 27123/27176 [1:39:59<00:11,  4.52it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 29: 100%|▉| 27123/27176 [1:40:00<00:11,  4.52it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 30:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 30: 100%|▉| 27165/27176 [1:39:39<00:02,  4.54it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 30: 100%|▉| 27165/27176 [1:39:40<00:02,  4.54it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 31:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.560,

Epoch 31: 100%|▉| 27153/27176 [1:39:17<00:05,  4.56it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 31: 100%|▉| 27153/27176 [1:39:18<00:05,  4.56it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 32:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.560,

Epoch 32: 100%|▉| 27153/27176 [1:39:11<00:05,  4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 32: 100%|▉| 27153/27176 [1:39:12<00:05,  4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 33:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=2.550,

Epoch 33: 100%|█| 27176/27176 [1:39:22<00:00,  4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 33: 100%|█| 27176/27176 [1:39:23<00:00,  4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 34:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.560,

Epoch 34: 100%|▉| 27156/27176 [1:39:18<00:04,  4.56it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 34: 100%|▉| 27156/27176 [1:39:19<00:04,  4.56it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 35:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=2.550,

Epoch 35: 100%|▉| 27167/27176 [1:39:29<00:01,  4.55it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 35: 100%|▉| 27167/27176 [1:39:29<00:01,  4.55it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 36:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.550,

Epoch 36: 100%|▉| 27170/27176 [1:40:41<00:01,  4.50it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 36: 100%|▉| 27170/27176 [1:40:41<00:01,  4.50it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 37:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=2.550,

Epoch 37: 100%|▉| 27160/27176 [1:39:05<00:03,  4.57it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 37: 100%|▉| 27160/27176 [1:39:06<00:03,  4.57it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 38:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=2.550,

Epoch 38: 100%|▉| 27131/27176 [1:48:40<00:10,  4.16it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 38: 100%|▉| 27131/27176 [1:48:41<00:10,  4.16it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 39:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=2.550,

Epoch 39: 100%|▉| 27155/27176 [1:54:27<00:05,  3.95it/s, loss=1.38, v_num=1, traTrain and Val metrics generated.
Epoch 39: 100%|▉| 27155/27176 [1:54:28<00:05,  3.95it/s, loss=1.38, v_num=1, traTraining loop in progress
Epoch 40:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.38, v_num=1, train_loss=5.030,

Epoch 40: 100%|▉| 27146/27176 [1:52:45<00:07,  4.01it/s, loss=1.36, v_num=1, traTrain and Val metrics generated.
Epoch 40: 100%|▉| 27146/27176 [1:52:46<00:07,  4.01it/s, loss=1.36, v_num=1, traTraining loop in progress
Epoch 41:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.36, v_num=1, train_loss=3.870,

Epoch 41: 100%|▉| 27155/27176 [1:39:30<00:04,  4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 41: 100%|▉| 27155/27176 [1:39:30<00:04,  4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 42:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.700,

Epoch 42: 100%|█| 27176/27176 [1:38:36<00:00,  4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 42: 100%|█| 27176/27176 [1:38:36<00:00,  4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 43:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.580,

Epoch 43: 100%|█| 27176/27176 [1:38:30<00:00,  4.60it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 43: 100%|█| 27176/27176 [1:38:30<00:00,  4.60it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 44:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.510,

Epoch 44: 100%|▉| 27163/27176 [1:38:39<00:02,  4.59it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 44: 100%|▉| 27163/27176 [1:38:39<00:02,  4.59it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 45:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.450,

Epoch 45: 100%|▉| 27169/27176 [1:38:42<00:01,  4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 45: 100%|▉| 27169/27176 [1:38:42<00:01,  4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 46:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.410,

Epoch 46: 100%|▉| 27159/27176 [1:38:38<00:03,  4.59it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 46: 100%|▉| 27159/27176 [1:38:38<00:03,  4.59it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 47:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.380,

Epoch 47: 100%|▉| 27163/27176 [1:38:35<00:02,  4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 47: 100%|▉| 27163/27176 [1:38:35<00:02,  4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 48:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.360,

Epoch 48: 100%|▉| 27140/27176 [1:38:35<00:07,  4.59it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 48: 100%|▉| 27140/27176 [1:38:36<00:07,  4.59it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 49:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.350,

Epoch 49: 100%|▉| 27145/27176 [1:38:43<00:06,  4.58it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 49: 100%|▉| 27145/27176 [1:38:43<00:06,  4.58it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 50:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.330,

Epoch 50: 100%|▉| 27169/27176 [1:39:07<00:01,  4.57it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 50: 100%|▉| 27169/27176 [1:39:08<00:01,  4.57it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 51:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.320,

Epoch 51: 100%|█| 27176/27176 [1:39:11<00:00,  4.57it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 51: 100%|█| 27176/27176 [1:39:11<00:00,  4.57it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 52:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.310,

Epoch 52: 100%|▉| 27162/27176 [1:39:16<00:03,  4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 52: 100%|▉| 27162/27176 [1:39:17<00:03,  4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 53:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.320,

Epoch 53: 100%|▉| 27175/27176 [1:39:33<00:00,  4.55it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 53: 100%|▉| 27175/27176 [1:39:34<00:00,  4.55it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 54:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.300,

Epoch 54: 100%|█| 27176/27176 [1:39:28<00:00,  4.55it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 54: 100%|█| 27176/27176 [1:39:28<00:00,  4.55it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 55:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.300,

Epoch 55: 100%|▉| 27148/27176 [1:39:20<00:06,  4.55it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 55: 100%|▉| 27148/27176 [1:39:20<00:06,  4.55it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 56:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.300,

Epoch 56: 100%|▉| 27166/27176 [1:39:18<00:02,  4.56it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 56: 100%|▉| 27166/27176 [1:39:19<00:02,  4.56it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 57:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.290,

Epoch 57: 100%|▉| 27164/27176 [1:39:19<00:02,  4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 57: 100%|▉| 27164/27176 [1:39:19<00:02,  4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 58:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.290,

Epoch 58: 100%|▉| 27137/27176 [1:39:11<00:08,  4.56it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 58: 100%|▉| 27137/27176 [1:39:11<00:08,  4.56it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 59:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.290,

Epoch 59: 100%|▉| 27145/27176 [1:39:07<00:06,  4.56it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 59: 100%|▉| 27145/27176 [1:39:08<00:06,  4.56it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 60:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.280,

Epoch 60: 100%|▉| 27173/27176 [1:39:11<00:00,  4.57it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 60: 100%|▉| 27173/27176 [1:39:12<00:00,  4.57it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 61:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.280,

Epoch 61: 100%|▉| 27138/27176 [1:38:57<00:08,  4.57it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 61: 100%|▉| 27138/27176 [1:38:57<00:08,  4.57it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 62:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.280,

Epoch 62: 100%|▉| 27137/27176 [1:38:49<00:08,  4.58it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 62: 100%|▉| 27137/27176 [1:38:50<00:08,  4.58it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 63:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.280,

Epoch 63: 100%|▉| 27161/27176 [1:38:55<00:03,  4.58it/s, loss=1.31, v_num=1, traTrain and Val metrics generated.
Epoch 63: 100%|▉| 27161/27176 [1:38:55<00:03,  4.58it/s, loss=1.31, v_num=1, traTraining loop in progress
Epoch 64:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.31, v_num=1, train_loss=3.280,

Epoch 64: 100%|▉| 27152/27176 [1:41:18<00:05,  4.47it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 64: 100%|▉| 27152/27176 [1:41:19<00:05,  4.47it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 65:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.280,

Epoch 65: 100%|▉| 27148/27176 [1:45:04<00:06,  4.31it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 65: 100%|▉| 27148/27176 [1:45:05<00:06,  4.31it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 66:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.270,

Epoch 66: 100%|▉| 27124/27176 [1:59:27<00:13,  3.78it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 66: 100%|▉| 27124/27176 [1:59:27<00:13,  3.78it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 67:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.270,

Epoch 67: 100%|▉| 27139/27176 [1:39:47<00:08,  4.53it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 67: 100%|▉| 27139/27176 [1:39:47<00:08,  4.53it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 68:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260,

Epoch 68: 100%|▉| 27158/27176 [1:39:53<00:03,  4.53it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 68: 100%|▉| 27158/27176 [1:39:53<00:03,  4.53it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 69:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.260,

Epoch 69: 100%|▉| 27159/27176 [2:19:26<00:05,  3.25it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 69: 100%|▉| 27159/27176 [2:19:26<00:05,  3.25it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 70:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.890,

Epoch 70: 100%|▉| 27139/27176 [1:37:45<00:07,  4.63it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 70: 100%|▉| 27139/27176 [1:37:45<00:07,  4.63it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 71:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.490,

Epoch 71: 100%|▉| 27153/27176 [1:36:59<00:04,  4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 71: 100%|▉| 27153/27176 [1:37:00<00:04,  4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 72:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.240,

Epoch 72: 100%|▉| 27160/27176 [1:37:01<00:03,  4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 72: 100%|▉| 27160/27176 [1:37:02<00:03,  4.66it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 73:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=4.070,

Epoch 73: 100%|▉| 27147/27176 [1:36:54<00:06,  4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 73: 100%|▉| 27147/27176 [1:36:55<00:06,  4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 74:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.950,

Epoch 74: 100%|█| 27176/27176 [1:36:52<00:00,  4.68it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 74: 100%|█| 27176/27176 [1:36:53<00:00,  4.67it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 75:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.870,

Epoch 75: 100%|▉| 27115/27176 [1:36:43<00:13,  4.67it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 75: 100%|▉| 27115/27176 [1:36:43<00:13,  4.67it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 76:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.810,

Epoch 76: 100%|▉| 27152/27176 [1:36:49<00:05,  4.67it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 76: 100%|▉| 27152/27176 [1:36:49<00:05,  4.67it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 77:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.770,

Epoch 77: 100%|▉| 27142/27176 [1:36:48<00:07,  4.67it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 77: 100%|▉| 27142/27176 [1:36:49<00:07,  4.67it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 78:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.740,

Epoch 78: 100%|▉| 27136/27176 [1:36:34<00:08,  4.68it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 78: 100%|▉| 27136/27176 [1:36:34<00:08,  4.68it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 79:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.710,

Epoch 79: 100%|▉| 27152/27176 [1:38:40<00:05,  4.59it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 79: 100%|▉| 27152/27176 [1:38:40<00:05,  4.59it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 80:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.700,

Epoch 80: 100%|▉| 27137/27176 [1:39:50<00:08,  4.53it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 80: 100%|▉| 27137/27176 [1:39:50<00:08,  4.53it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 81:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.680,

Epoch 81: 100%|▉| 27167/27176 [1:40:26<00:01,  4.51it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 81: 100%|▉| 27167/27176 [1:40:26<00:01,  4.51it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 82:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.670,

Epoch 82: 100%|█| 27176/27176 [1:39:37<00:00,  4.55it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 82: 100%|█| 27176/27176 [1:39:37<00:00,  4.55it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 83:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.660,

Epoch 83: 100%|▉| 27175/27176 [1:38:16<00:00,  4.61it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 83: 100%|▉| 27175/27176 [1:38:16<00:00,  4.61it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 84:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.650,

Epoch 84: 100%|█| 27176/27176 [1:37:47<00:00,  4.63it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 84: 100%|█| 27176/27176 [1:37:48<00:00,  4.63it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 85:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.640,

Epoch 85: 100%|▉| 27131/27176 [1:37:48<00:09,  4.62it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 85: 100%|▉| 27131/27176 [1:37:48<00:09,  4.62it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 86:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.640,

Epoch 86: 100%|▉| 27165/27176 [1:38:11<00:02,  4.61it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 86: 100%|▉| 27165/27176 [1:38:12<00:02,  4.61it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 87:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.630,

Epoch 87: 100%|▉| 27150/27176 [1:37:37<00:05,  4.64it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 87: 100%|▉| 27150/27176 [1:37:37<00:05,  4.64it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 88:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.620,

Epoch 88: 100%|▉| 27138/27176 [1:37:31<00:08,  4.64it/s, loss=1.35, v_num=1, traTrain and Val metrics generated.
Epoch 88: 100%|▉| 27138/27176 [1:37:31<00:08,  4.64it/s, loss=1.35, v_num=1, traTraining loop in progress
Epoch 89:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.620,

Epoch 89: 100%|▉| 27164/27176 [1:37:39<00:02,  4.64it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 89: 100%|▉| 27164/27176 [1:37:40<00:02,  4.64it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 90:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.610,

Epoch 90: 100%|▉| 27152/27176 [1:37:17<00:05,  4.65it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 90: 100%|▉| 27152/27176 [1:37:17<00:05,  4.65it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 91:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.610,

Epoch 91: 100%|▉| 27172/27176 [1:37:39<00:00,  4.64it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 91: 100%|▉| 27172/27176 [1:37:40<00:00,  4.64it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 92:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.610,

Epoch 92: 100%|▉| 27174/27176 [1:37:23<00:00,  4.65it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 92: 100%|▉| 27174/27176 [1:37:23<00:00,  4.65it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 93:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.600,

Epoch 93: 100%|█| 27176/27176 [1:40:02<00:00,  4.53it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 93: 100%|█| 27176/27176 [1:40:03<00:00,  4.53it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 94:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.600,

Epoch 94: 100%|▉| 27138/27176 [1:48:08<00:09,  4.18it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 94: 100%|▉| 27138/27176 [1:48:08<00:09,  4.18it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 95:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.600,

Epoch 95: 100%|█| 27176/27176 [1:51:43<00:00,  4.05it/s, loss=1.33, v_num=1, traTrain and Val metrics generated.
Epoch 95: 100%|█| 27176/27176 [1:51:43<00:00,  4.05it/s, loss=1.33, v_num=1, traTraining loop in progress
Epoch 96:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.590,

Epoch 96: 100%|▉| 27156/27176 [1:41:28<00:04,  4.46it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 96: 100%|▉| 27156/27176 [1:41:29<00:04,  4.46it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 97:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.590,

Epoch 97: 100%|█| 27176/27176 [1:42:49<00:00,  4.40it/s, loss=1.34, v_num=1, traTrain and Val metrics generated.
Epoch 97: 100%|█| 27176/27176 [1:42:50<00:00,  4.40it/s, loss=1.34, v_num=1, traTraining loop in progress
Epoch 98:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.580,

Epoch 98: 100%|▉| 27161/27176 [1:38:29<00:03,  4.60it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 98: 100%|▉| 27161/27176 [1:38:30<00:03,  4.60it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 99:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.580,

Epoch 99: 100%|▉| 27175/27176 [1:36:44<00:00,  4.68it/s, loss=1.32, v_num=1, traTrain and Val metrics generated.
Epoch 99: 100%|▉| 27175/27176 [1:36:44<00:00,  4.68it/s, loss=1.32, v_num=1, traTraining loop in progress
Epoch 100:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.580

Epoch 100: 100%|▉| 27145/27176 [1:36:35<00:06,  4.68it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 100: 100%|▉| 27145/27176 [1:36:35<00:06,  4.68it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 101:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.580

Epoch 101: 100%|▉| 27142/27176 [1:36:39<00:07,  4.68it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 101: 100%|▉| 27142/27176 [1:36:39<00:07,  4.68it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 102:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.570

Epoch 102: 100%|▉| 27150/27176 [1:36:39<00:05,  4.68it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 102: 100%|▉| 27150/27176 [1:36:39<00:05,  4.68it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 103:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.570

Epoch 103: 100%|▉| 27169/27176 [1:36:45<00:01,  4.68it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 103: 100%|▉| 27169/27176 [1:36:46<00:01,  4.68it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 104:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.570

Epoch 104: 100%|▉| 27170/27176 [1:36:49<00:01,  4.68it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 104: 100%|▉| 27170/27176 [1:36:49<00:01,  4.68it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 105:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.560

Epoch 105: 100%|▉| 27152/27176 [1:36:45<00:05,  4.68it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 105: 100%|▉| 27152/27176 [1:36:46<00:05,  4.68it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 106:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.570

Epoch 106: 100%|▉| 27152/27176 [1:36:44<00:05,  4.68it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 106: 100%|▉| 27152/27176 [1:36:44<00:05,  4.68it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 107:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.560

Epoch 107: 100%|▉| 27125/27176 [1:38:34<00:11,  4.59it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 107: 100%|▉| 27125/27176 [1:38:35<00:11,  4.59it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 108:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.560

Epoch 108: 100%|▉| 27172/27176 [1:37:51<00:00,  4.63it/s, loss=1.32, v_num=1, trTrain and Val metrics generated.
Epoch 108: 100%|▉| 27172/27176 [1:37:51<00:00,  4.63it/s, loss=1.32, v_num=1, trTraining loop in progress
Epoch 109:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.550

Epoch 109: 100%|▉| 27162/27176 [1:37:38<00:03,  4.64it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 109: 100%|▉| 27162/27176 [1:37:39<00:03,  4.64it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 110:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.550

Epoch 110: 100%|▉| 27175/27176 [1:40:48<00:00,  4.49it/s, loss=1.32, v_num=1, trTrain and Val metrics generated.
Epoch 110: 100%|▉| 27175/27176 [1:40:48<00:00,  4.49it/s, loss=1.32, v_num=1, trTraining loop in progress
Epoch 111:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.540

Epoch 111: 100%|▉| 27148/27176 [1:44:13<00:06,  4.34it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 111: 100%|▉| 27148/27176 [1:44:14<00:06,  4.34it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 112:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.550

Epoch 112: 100%|▉| 27151/27176 [1:43:13<00:05,  4.38it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 112: 100%|▉| 27151/27176 [1:43:14<00:05,  4.38it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 113:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.540

Epoch 113: 100%|█| 27176/27176 [1:36:15<00:00,  4.71it/s, loss=1.35, v_num=1, trTrain and Val metrics generated.
Epoch 113: 100%|█| 27176/27176 [1:36:15<00:00,  4.71it/s, loss=1.35, v_num=1, trTraining loop in progress
Epoch 114:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.35, v_num=1, train_loss=3.540

Epoch 114: 100%|▉| 27167/27176 [1:36:24<00:01,  4.70it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 114: 100%|▉| 27167/27176 [1:36:24<00:01,  4.70it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 115:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.540

Epoch 115: 100%|▉| 27173/27176 [1:36:24<00:00,  4.70it/s, loss=1.34, v_num=1, trTrain and Val metrics generated.
Epoch 115: 100%|▉| 27173/27176 [1:36:24<00:00,  4.70it/s, loss=1.34, v_num=1, trTraining loop in progress
Epoch 116:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.34, v_num=1, train_loss=3.540

Epoch 116: 100%|▉| 27146/27176 [1:36:21<00:06,  4.70it/s, loss=1.33, v_num=1, trTrain and Val metrics generated.
Epoch 116: 100%|▉| 27146/27176 [1:36:21<00:06,  4.69it/s, loss=1.33, v_num=1, trTraining loop in progress
Epoch 117:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.33, v_num=1, train_loss=3.540

Epoch 117: 100%|█| 27176/27176 [1:36:37<00:00,  4.69it/s, loss=1.31, v_num=1, trTrain and Val metrics generated.
Epoch 117: 100%|█| 27176/27176 [1:36:38<00:00,  4.69it/s, loss=1.31, v_num=1, trTraining loop in progress
Epoch 118:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.31, v_num=1, train_loss=3.530

Epoch 118: 100%|▉| 27123/27176 [1:36:27<00:11,  4.69it/s, loss=1.32, v_num=1, trTrain and Val metrics generated.
Epoch 118: 100%|▉| 27123/27176 [1:36:28<00:11,  4.69it/s, loss=1.32, v_num=1, trTraining loop in progress
Epoch 119:   0%| | 0/27176 [00:00<?, ?it/s, loss=1.32, v_num=1, train_loss=3.530

Epoch 119: 100%|▉| 27138/27176 [1:36:29<00:08,  4.69it/s, loss=1.32, v_num=1, trTrain and Val metrics generated.
Epoch 119: 100%|▉| 27138/27176 [1:36:29<00:08,  4.69it/s, loss=1.32, v_num=1, trTraining loop in progress
`Trainer.fit` stopped: `max_epochs=120` reached.
Epoch 119: 100%|▉| 27138/27176 [1:36:29<00:08,  4.69it/s, loss=1.32, v_num=1, tr
Training loop complete.
Training finished successfully
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: Unknown Error
Execution status: PASS
2024-04-12 06:34:08,324 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.

I am unable to see the .tlt file in the directory.

Also, when I restarted the training after the above issue, I get the following error after 1 epoch: {"date": "4/12/2024", "time": "8:51:0", "status": "FAILURE", "verbosity": "INFO", "message": "Error: all query identities do not appear in gallery."}
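
For reference, below is a minimal sketch I would use to check the query/gallery overlap, assuming Market-1501-style file names where the person ID is the prefix before the first underscore (the parsing and the .jpg extension are assumptions; adjust them if the custom data uses a different naming scheme):

# Check whether every query identity also appears in the gallery (test) set.
# Assumes Market-1501-style names such as "0001_c1s1_000151_00.jpg", where the
# person ID is the part before the first underscore; the directories are the
# ones from the dataset section of the config above.
from pathlib import Path

def person_ids(folder):
    return {f.name.split("_")[0] for f in Path(folder).glob("*.jpg")}

query_ids = person_ids("/data/sample_query")
gallery_ids = person_ids("/data/sample_test")

missing = sorted(query_ids - gallery_ids)
print(f"{len(query_ids)} query IDs, {len(gallery_ids)} gallery IDs, "
      f"{len(missing)} query IDs missing from the gallery")
if missing:
    print("Missing IDs (first 20):", missing[:20])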

Please help me locate where the problem is.

Thanks.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Please check the training command line. Then double-check the results folder.
Are there any checkpoint files?
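
For example, something like the sketch below can be used to list whatever checkpoint files exist under the train results directory (the path is taken from the log line "Train results will be saved at: /results/market1501/train"; the file extensions checked are assumptions). If you run via the tao launcher, also check the host directory mapped to /results in your ~/.tao_mounts.json.

# List any checkpoint-like files under the train results directory.
# The path comes from the training log above; the extensions checked
# (.tlt, .pth, .ckpt) are assumptions about what the trainer may have written.
from pathlib import Path

train_dir = Path("/results/market1501/train")
for pattern in ("*.tlt", "*.pth", "*.ckpt"):
    for f in sorted(train_dir.rglob(pattern)):
        print(f, f"{f.stat().st_size / 1e6:.1f} MB")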
