Re-identification training stopped automatically after one epoch

Dear @Morganh

I am trying to train a person re-identification model on my custom dataset (around 2.4M images) on a 2080 Ti GPU machine, but I am running into the issue below: training stops automatically after one epoch. The log ends partway through the first validation pass, with no error message.

Below are the configuration and training logs (notes on two of the warnings follow the log).

Logs:

Train model
2024-08-20 18:18:36,260 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-08-20 18:18:36,364 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt
2024-08-20 18:18:36,506 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 267: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/smarg/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-08-20 18:18:36,506 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
sys:1: UserWarning: 
'experiment_market1501_resnet.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:107: UserWarning: 
'experiment_market1501_resnet.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Train results will be saved at: /results/market1501/train
Loading pretrained ImageNet model......
╒══════════╤═════════╤════════════╤═════════════╕
│ Subset   │   # IDs │   # Images │   # Cameras │
╞══════════╪═════════╪════════════╪═════════════╡
│ Train    │   70493 │    2232198 │          10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Query    │    8416 │      19865 │           2 │
├──────────┼─────────┼────────────┼─────────────┤
│ Gallery  │   23146 │      76627 │           2 │
╘══════════╧═════════╧════════════╧═════════════╛
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:441: LightningDeprecationWarning: Setting `Trainer(gpus=[0])` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=[0])` instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Missing logger folder: /results/market1501/train/lightning_logs
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:604: UserWarning: Checkpoint directory /results/market1501/train exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/usr/local/lib/python3.8/dist-packages/torch/optim/lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

  | Name           | Type     | Params
--------------------------------------------
0 | model          | Baseline | 42.1 M
1 | train_accuracy | Accuracy | 0     
2 | val_accuracy   | Accuracy | 0     
--------------------------------------------
42.1 M    Trainable params
256       Non-trainable params
42.1 M    Total params
168.317   Total estimated model params size (MB)
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Training: 0it [00:00, ?it/s]Starting Training Loop.
Epoch 0:   0%|                                        | 0/33901 [00:00<?, ?it/s]

Epoch 0:  98%|█████▊| 33147/33901 [2:02:53<02:47,  4.50it/s, loss=1.51, v_num=0]
Validation: 0it [00:00, ?it/s]
Validation:   0%|                                       | 0/754 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/754 [00:00<?, ?it/s]
Epoch 0:  98%|█████▊| 33148/33901 [2:02:54<02:47,  4.50it/s, loss=1.51, v_num=0]
... (repeated progress updates from 33149/33901 through 33825/33901 trimmed; ~4.5 it/s throughout, loss steady at 1.51) ...
Epoch 0: 100%|█████▉| 33826/33901 [2:05:51<00:16,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33827/33901 [2:05:51<00:16,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33828/33901 [2:05:51<00:16,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33829/33901 [2:05:51<00:16,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33830/33901 [2:05:52<00:15,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33831/33901 [2:05:52<00:15,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33832/33901 [2:05:52<00:15,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33833/33901 [2:05:53<00:15,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33834/33901 [2:05:53<00:14,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33835/33901 [2:05:53<00:14,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33836/33901 [2:05:53<00:14,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33837/33901 [2:05:54<00:14,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33838/33901 [2:05:54<00:14,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33839/33901 [2:05:54<00:13,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33840/33901 [2:05:54<00:13,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33841/33901 [2:05:55<00:13,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33842/33901 [2:05:55<00:13,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33843/33901 [2:05:55<00:12,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33844/33901 [2:05:55<00:12,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33845/33901 [2:05:56<00:12,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33846/33901 [2:05:56<00:12,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33847/33901 [2:05:56<00:12,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33848/33901 [2:05:56<00:11,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33849/33901 [2:05:57<00:11,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33850/33901 [2:05:57<00:11,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33851/33901 [2:05:57<00:11,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33852/33901 [2:05:57<00:10,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33853/33901 [2:05:58<00:10,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33854/33901 [2:05:58<00:10,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33855/33901 [2:05:58<00:10,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33856/33901 [2:05:58<00:10,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33857/33901 [2:05:59<00:09,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33858/33901 [2:05:59<00:09,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33859/33901 [2:05:59<00:09,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33860/33901 [2:05:59<00:09,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33861/33901 [2:06:00<00:08,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33862/33901 [2:06:00<00:08,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33863/33901 [2:06:00<00:08,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33864/33901 [2:06:00<00:08,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33865/33901 [2:06:01<00:08,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33866/33901 [2:06:01<00:07,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33867/33901 [2:06:01<00:07,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33868/33901 [2:06:01<00:07,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33869/33901 [2:06:02<00:07,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33870/33901 [2:06:02<00:06,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33871/33901 [2:06:02<00:06,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33872/33901 [2:06:02<00:06,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33873/33901 [2:06:03<00:06,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33874/33901 [2:06:03<00:06,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33875/33901 [2:06:03<00:05,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33876/33901 [2:06:03<00:05,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33877/33901 [2:06:04<00:05,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33878/33901 [2:06:04<00:05,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33879/33901 [2:06:04<00:04,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33880/33901 [2:06:04<00:04,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33881/33901 [2:06:05<00:04,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33882/33901 [2:06:05<00:04,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33883/33901 [2:06:05<00:04,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33884/33901 [2:06:06<00:03,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33885/33901 [2:06:06<00:03,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33886/33901 [2:06:06<00:03,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33887/33901 [2:06:06<00:03,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33888/33901 [2:06:07<00:02,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33889/33901 [2:06:07<00:02,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33890/33901 [2:06:07<00:02,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33891/33901 [2:06:07<00:02,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33892/33901 [2:06:08<00:02,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33893/33901 [2:06:08<00:01,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33894/33901 [2:06:08<00:01,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33895/33901 [2:06:08<00:01,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33896/33901 [2:06:09<00:01,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33897/33901 [2:06:09<00:00,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33898/33901 [2:06:09<00:00,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33899/33901 [2:06:09<00:00,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|█████▉| 33900/33901 [2:06:10<00:00,  4.48it/s, loss=1.51, v_num=0]
Epoch 0: 100%|██████| 33901/33901 [2:06:10<00:00,  4.48it/s, loss=1.51, v_num=0]

The test features are normalized.
The distance matrix is processed by re-ranking.
Execution status: FAIL
2024-08-20 20:33:19,025 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.

Below is the configuration file:

results_dir: "/results/market1501"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/model/resnet50_pretrained.pth"
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
dataset:
  train_dataset_dir: "/data/sample_train"
  test_dataset_dir: "/data/sample_test"
  query_dataset_dir: "/data/sample_query"
  num_classes: 74295
  batch_size: 64
  val_batch_size: 128
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  optim:
    name: Adam
    #lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 120
  checkpoint_interval: 1

Thanks.

Could you please use docker run to get inside the docker container, then run the training again and check what happens?

Okay, so inside the container I have to run the same command.

print("Train model")
!tao model re_identification train \
                  -e $SPECS_DIR/experiment_market1501_resnet.yaml \
                  -r $RESULTS_DIR/market1501 \
                  -k $KEY

I also want to add one thing. When I was training with feat_dim: 2048, the training ran up to epoch 2, but each epoch only reached about 98% before the next epoch started. Then the system crashed; I mean the notebook kernel crashed after epoch 2.

Thanks.

This is not the way to run inside the docker.
You can run something like docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash.
Then, inside the docker, run something like re_identification train xxx.
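
For reference, a minimal sketch of the full sequence (the mount path is from this setup; the spec and results paths are placeholders):

# Start the TAO PyTorch container with GPU access and the data directory mounted:
docker run --runtime=nvidia --gpus all -it --rm \
  -v /home/smarg/Documents/PritamDocsData/TAO:/root/data \
  nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash

# Inside the container, call the task entrypoint directly (no "tao model" launcher prefix):
re_identification train -e /root/data/<path-to-spec>.yaml -r /root/data/<results-dir> -k nvidia_tao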

I am afraid it is due to OOM.

Okay, I will run the container and then execute the training steps.

But I had also tried a lower batch size and still got the same issue.

Thanks.

Also, I want to add one thing. When I was training with a small dataset, i.e., the Market-1501 dataset, I was not getting the issue, but when I added all the third-party datasets and my custom dataset, the issue appeared.

OK. So, please run inside the docker to narrow down.
Maybe it is due to OOM.
You can use a smaller part of the validation dataset.
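
One way to do that, as a sketch (this assumes the flat image-folder layout used by the dataset dirs in the spec; the sample sizes are arbitrary examples):

# Build smaller query/gallery sets by copying a random sample of images (run on the host),
# then point query_dataset_dir / test_dataset_dir in the spec at the new folders:
mkdir -p /data/sample_query_small /data/sample_test_small
ls /data/sample_query | shuf | head -n 2000 | xargs -I{} cp /data/sample_query/{} /data/sample_query_small/
ls /data/sample_test | shuf | head -n 8000 | xargs -I{} cp /data/sample_test/{} /data/sample_test_small/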

Dear @Morganh

I had run the training inside the container.

Below is the docker run command.

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --runtime=nvidia -it -v /home/smarg/Documents/PritamDocsData/TAO:/root/data --rm nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash

Below are the logs.

root@03d37c685bce:~/data/TAO_V5.2/notebooks/tao_launcher_starter_kit/re_identification_net/specs# re_identification train -e /root/data/TAO_V5.2/notebooks/tao_launcher_starter_kit/re_identification_net/specs/experiment_market1501_resnet_custom.yaml -r /root/data/Model-Training/PersonReIdentification_V1.1/data/result -k nvidia_tao
sys:1: UserWarning: 
'experiment_market1501_resnet_custom.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:-1: UserWarning: 
'experiment_market1501_resnet_custom.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Train results will be saved at: /root/data/Model-Training/PersonReIdentification_V1.1/data/result/train
Loading pretrained ImageNet model......
╒══════════╤═════════╤════════════╤═════════════╕
│ Subset   │   # IDs │   # Images │   # Cameras │
╞══════════╪═════════╪════════════╪═════════════╡
│ Train    │   70493 │    2232198 │          10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Query    │    8416 │      19865 │           2 │
├──────────┼─────────┼────────────┼─────────────┤
│ Gallery  │   23146 │      76627 │           2 │
╘══════════╧═════════╧════════════╧═════════════╛
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:441: LightningDeprecationWarning: Setting `Trainer(gpus=[0])` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=[0])` instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Missing logger folder: /root/data/Model-Training/PersonReIdentification_V1.1/data/result/train/lightning_logs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:604: UserWarning: Checkpoint directory /root/data/Model-Training/PersonReIdentification_V1.1/data/result/train exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

  | Name           | Type     | Params
--------------------------------------------
0 | model          | Baseline | 42.1 M
1 | train_accuracy | Accuracy | 0     
2 | val_accuracy   | Accuracy | 0     
--------------------------------------------
42.1 M    Trainable params
256       Non-trainable params
42.1 M    Total params
168.317   Total estimated model params size (MB)
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Training: 0it [00:00, ?it/s]Starting Training Loop.
Epoch 0: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:04:19<00:00,  4.44it/s, loss=1.49, v_num=0]Train and Val metrics generated.
Epoch 0: 100%|████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:04:20<00:00,  4.44it/s, loss=1.49, v_num=0, train_loss=7.450, base_lr=3.81e-5, train_acc_1=0.294]Training loop in progress
Epoch 1:  92%|██████████████████████████████████████████████████████████████████████▏     | 30585/33139 [1:54:22<09:33,  4.46it/s, loss=1.65, v_num=0, train_loss=7.450, base_lr=3.81e-5, train_acc_1=0.294]
Epoch 1: 100%|████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:03:43<00:00,  4.46it/s, loss=1.5, v_num=0, train_loss=7.450, base_lr=3.81e-5, train_acc_1=0.294]Train and Val metrics generated.
Epoch 1: 100%|████████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:03:43<00:00,  4.46it/s, loss=1.5, v_num=0, train_loss=5.450, base_lr=7.28e-5, train_acc_1=0.501]Training loop in progress
Epoch 2: 100%|███████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:01:19<00:00,  4.55it/s, loss=1.49, v_num=0, train_loss=5.450, base_lr=7.28e-5, train_acc_1=0.501]Train and Val metrics generated.
Epoch 2: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:01:19<00:00,  4.55it/s, loss=1.49, v_num=0, train_loss=4.940, base_lr=0.000107, train_acc_1=0.566]Training loop in progress
Epoch 3: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:01:00<00:00,  4.56it/s, loss=1.51, v_num=0, train_loss=4.940, base_lr=0.000107, train_acc_1=0.566]Train and Val metrics generated.
Epoch 3: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:01:01<00:00,  4.56it/s, loss=1.51, v_num=0, train_loss=4.740, base_lr=0.000142, train_acc_1=0.590]Training loop in progress
Epoch 4: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:00:52<00:00,  4.57it/s, loss=1.54, v_num=0, train_loss=4.740, base_lr=0.000142, train_acc_1=0.590]Train and Val metrics generated.
Epoch 4: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:00:52<00:00,  4.57it/s, loss=1.54, v_num=0, train_loss=4.650, base_lr=0.000177, train_acc_1=0.600]Training loop in progress
Epoch 5: 100%|█████████████████████████████████████████████████████████████████████████████████▉| 33135/33139 [2:00:39<00:00,  4.58it/s, loss=1.51, v_num=0, train_loss=4.650, base_lr=0.000177, train_acc_1=0.600]Train and Val metrics generated.
Epoch 5: 100%|█████████████████████████████████████████████████████████████████████████████████▉| 33135/33139 [2:00:40<00:00,  4.58it/s, loss=1.51, v_num=0, train_loss=4.600, base_lr=0.000211, train_acc_1=0.604]Training loop in progress
Epoch 6: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:00:28<00:00,  4.58it/s, loss=1.52, v_num=0, train_loss=4.600, base_lr=0.000211, train_acc_1=0.604]Train and Val metrics generated.
Epoch 6: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:00:28<00:00,  4.58it/s, loss=1.52, v_num=0, train_loss=4.570, base_lr=0.000246, train_acc_1=0.607]Training loop in progress
Epoch 7: 100%|█████████████████████████████████████████████████████████████████████████████████▉| 33136/33139 [2:00:18<00:00,  4.59it/s, loss=1.57, v_num=0, train_loss=4.570, base_lr=0.000246, train_acc_1=0.607]Train and Val metrics generated.
Epoch 7: 100%|█████████████████████████████████████████████████████████████████████████████████▉| 33136/33139 [2:00:19<00:00,  4.59it/s, loss=1.57, v_num=0, train_loss=4.550, base_lr=0.000281, train_acc_1=0.608]Training loop in progress
Epoch 8: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:00:00<00:00,  4.60it/s, loss=1.51, v_num=0, train_loss=4.550, base_lr=0.000281, train_acc_1=0.608]Train and Val metrics generated.
Epoch 8: 100%|██████████████████████████████████████████████████████████████████████████████████| 33139/33139 [2:00:00<00:00,  4.60it/s, loss=1.51, v_num=0, train_loss=4.540, base_lr=0.000315, train_acc_1=0.608]Training loop in progress
Epoch 9:   6%|████▌                                                                      | 1825/33139 [06:53<1:58:14,  4.41it/s, loss=10.9, v_num=0, train_loss=4.540, base_lr=0.000315, train_acc_1=0.608]
Epoch 9:  31%|███████████████████████                                                   | 10199/33139 [42:00<1:34:29,  4.05it/s, loss=6.05, v_num=0, train_loss=4.540, base_lr=0.000315, train_acc_1=0.608]
Epoch 9: 100%|██████████████████████████████████████████████████████████████████████████| 33139/33139 [2:08:38<00:00,  4.29it/s, loss=1.55, v_num=0, train_loss=4.540, base_lr=0.000315, train_acc_1=0.608]
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 9: : 33893it [2:11:38,  4.29it/s, loss=1.55, v_num=0, train_loss=4.540, base_lr=0.000315, train_acc_1=0.608]
The test features are normalized.
The distance matrix is computed using euclidean distance. It is then processed by re-ranking.
100%|█████████████████████████████████████████████████████████████████████████████████████| 754/754 [02:59<00:00,  4.19it/s]


Killed
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: module 'urllib3.exceptions' has no attribute 'SubjectAltNameWarning'
Execution status: FAIL


Below is the configuration file.

results_dir: "/home/smarg/Documents/PritamDocsData/TAO/Model-Training/PersonReIdentification_V1.1/data/result"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/root/data/Model-Training/PersonReIdentification_V1.1/data/reidentificationnet/model/resnet50_pretrained.pth"
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
dataset:
  train_dataset_dir: "/root/data/Model-Training/PersonReIdentification_V1.1/data/reidentificationnet/sample_train"
  test_dataset_dir: "/root/data/Model-Training/PersonReIdentification_V1.1/data/reidentificationnet/sample_test"
  query_dataset_dir: "/root/data/Model-Training/PersonReIdentification_V1.1/data/reidentificationnet/sample_query"
  num_classes: 100
  batch_size: 64
  val_batch_size: 128
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  optim:
    name: Adam
    lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 120
  checkpoint_interval: 10

Please suggest what can be done.

Please set a lower batch_size and retry.

Also, you can try a lower input size and retry.

Okay. I also noticed one thing: when the CPU operations like distance computation and re-ranking are performed, all CPU cores go high, and after some time the training crashes.
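
That observation fits re-ranking being the memory bottleneck. As a rough estimate (my numbers, assuming float32 features and a k-reciprocal re-ranking implementation that materializes the full query-plus-gallery distance matrix, as the commonly used implementation does):

19,865 query + 76,627 gallery = 96,492 samples
96,492 x 96,492 x 4 bytes ≈ 37 GB for one full float32 distance matrix

Re-ranking typically holds more than one array of that size at the same time, so this stage can exhaust host RAM regardless of batch_size.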

I also tried the same thing with deep-person-reid, and my observation was that even with a lower batch size and osnet-IBN as the base model, I was still getting the same issue.

I am trying the solution you suggested, but please also suggest other options based on the observations above, because the training takes time, and if a change does not work, a lot of time is lost.

Thanks.

Please set a larger CPU memory limit for the docker container by using -m.
For example,
docker run --runtime=nvidia -it --rm --ipc=host --gpus all -m 100G --oom-kill-disable --ulimit memlock=-1 nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0

(launcher_tao5) smarg@smarg:~/Documents/PritamDocsData/TAO$ docker run --runtime=nvidia -it --rm --ipc=host --gpus all -m 100G --oom-kill-disable --ulimit memlock=-1 -it -v /home/smarg/Documents/PritamDocsData/TAO:/root/data --rm nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash
WARNING: Your kernel does not support OomKillDisable. OomKillDisable discarded.

===========================
=== TAO Toolkit PyTorch ===
===========================

NVIDIA Release 5.2.0-PyT2.1.0 (build 69180607)
TAO Toolkit Version 5.2.0

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:

Will it work if the kernel does not support OomKillDisable ("OomKillDisable discarded")?

You can try with -m only. Even when the kernel discards OomKillDisable, the -m memory limit itself still applies.

Dear @Morganh

I tried with the below command.

docker run --runtime=nvidia -it --rm --ipc=host --gpus all -m 100G --oom-kill-disable --ulimit memlock=-1 -it -v /home/smarg/Documents/PritamDocsData/TAO:/root/data --rm nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash

I also changed the image dimensions and used a lower batch size.

Below is the config.

results_dir: "/root/data/Model-Training/PersonReIdentification_V1.1/data/result"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/root/data/Model-Training/PersonReIdentification_V1.1/data/reidentificationnet/model/resnet50_pretrained.pth"
  input_channels: 3
  input_width: 64
  input_height: 128
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
dataset:
  train_dataset_dir: "/root/data/Model-Training/PersonReIdentification_V1.1/data/reidentificationnet/sample_train"
  test_dataset_dir: "/root/data/Model-Training/PersonReIdentification_V1.1/data/reidentificationnet/sample_test"
  query_dataset_dir: "/root/data/Model-Training/PersonReIdentification_V1.1/data/reidentificationnet/sample_query"
  num_classes: 100
  batch_size: 32
  val_batch_size: 32
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  optim:
    name: Adam
    lr_steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 120
  checkpoint_interval: 1

But I am still getting the issue below.

root@8a803127e280:~/data/TAO_V5.2/notebooks/tao_launcher_starter_kit/re_identification_net/specs# re_identification train -e /root/data/TAO_V5.2/notebooks/tao_launcher_starter_kit/re_identification_net/specs/experiment_market1501_resnet_custom.yaml -r /root/data/Model-Training/PersonReIdentification_V1.1/data/result -k nvidia_tao
sys:1: UserWarning: 
'experiment_market1501_resnet_custom.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:-1: UserWarning: 
'experiment_market1501_resnet_custom.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Train results will be saved at: /root/data/Model-Training/PersonReIdentification_V1.1/data/result/train
Loading pretrained ImageNet model......
╒══════════╤═════════╤════════════╤═════════════╕
│ Subset   │   # IDs │   # Images │   # Cameras │
╞══════════╪═════════╪════════════╪═════════════╡
│ Train    │   70493 │    2232198 │          10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Query    │    8416 │      19865 │           2 │
├──────────┼─────────┼────────────┼─────────────┤
│ Gallery  │   23146 │      76627 │           2 │
╘══════════╧═════════╧════════════╧═════════════╛
<frozen core.loggers.api_logging>:245: UserWarning: Log file already exists at /root/data/Model-Training/PersonReIdentification_V1.1/data/result/train/status.json
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:441: LightningDeprecationWarning: Setting `Trainer(gpus=[0])` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=[0])` instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:604: UserWarning: Checkpoint directory /root/data/Model-Training/PersonReIdentification_V1.1/data/result/train exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

  | Name           | Type     | Params
--------------------------------------------
0 | model          | Baseline | 42.1 M
1 | train_accuracy | Accuracy | 0     
2 | val_accuracy   | Accuracy | 0     
--------------------------------------------
42.1 M    Trainable params
256       Non-trainable params
42.1 M    Total params
168.317   Total estimated model params size (MB)
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Training: 0it [00:00, ?it/s]Starting Training Loop.
Epoch 0:  11%|██████████████▏                                                                                                                    | 7568/69891 [07:46<1:04:05, 16.21it/s, loss=11.2, v_num=1]
Epoch 0:  96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍     | 66872/69891 [1:06:20<02:59, 16.80it/s, loss=1.46, v_num=1]Train and Val metrics generated.
Epoch 0:  96%|████████████████████████████████████████████████████████████████████████▋   | 66872/69891 [1:06:22<02:59, 16.79it/s, loss=1.46, v_num=1, train_loss=7.220, base_lr=3.81e-5, train_acc_1=0.300]Training loop in progress
Epoch 1:  39%|██████████████████████████████▎                                               | 27171/69891 [27:07<42:38, 16.70it/s, loss=6.62, v_num=1, train_loss=7.220, base_lr=3.81e-5, train_acc_1=0.300]
[... similar per-step progress-bar updates (steps 28520-28622, loss 5.87-6.08) trimmed ...]
Epoch 1:  41%|███████████████████████████████▉                                              | 28623/69891 [28:34<41:11, 16.70it/s, loss=6.09, v_num=1, train_loss=7.220, base_lr=3.81e-5, train_acc_1=0.300]
Epoch 1:  96%|█████████████████████████████████████████████████████████████████████████▋   | 66852/69891 [1:06:23<03:01, 16.78it/s, loss=1.5, v_num=1, train_loss=7.220, base_lr=3.81e-5, train_acc_1=0.300]Train and Val metrics generated.
Epoch 1:  96%|█████████████████████████████████████████████████████████████████████████▋   | 66852/69891 [1:06:24<03:01, 16.78it/s, loss=1.5, v_num=1, train_loss=5.780, base_lr=7.28e-5, train_acc_1=0.443]Training loop in progress
Epoch 2: 100%|████████████████████████████████████████████████████████████████████████████| 69891/69891 [1:10:53<00:00, 16.43it/s, loss=1.49, v_num=1, train_loss=5.780, base_lr=7.28e-5, train_acc_1=0.443]
The test features are normalized.
100%|███████████████████████████████████████████████████████████████████████████████████████| 3016/3016 [02:36<00:00, 19.23it/s]
The distance matrix is computed using euclidean distance. It is then processed by re-ranking.
Killed
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: module 'urllib3.exceptions' has no attribute 'SubjectAltNameWarning'
Execution status: FAIL
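
A note on reading this log: a bare "Killed" with no Python traceback usually means the kernel OOM killer terminated the process rather than an error inside the code. It can typically be confirmed on the host right after the failure:

# Check the kernel log for OOM-killer activity (run on the host):
dmesg -T | grep -i -E "out of memory|killed process" | tail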

Please suggest if there is anything else I can try.

Can you set batch_size to 1 and retry?

Okay I will try.

I tried with batch size 1, but for the last 15 minutes the training has not started; the process is stuck at the point below.

root@cdab283d8f9b:~/data/TAO_V5.2/notebooks/tao_launcher_starter_kit/re_identification_net/specs# re_identification train -e /root/data/TAO_V5.2/notebooks/tao_launcher_starter_kit/re_identification_net/specs/experiment_market1501_resnet_custom.yaml -r /root/data/Model-Training/PersonReIdentification_V1.1/data/result -k nvidia_tao
sys:1: UserWarning: 
'experiment_market1501_resnet_custom.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen core.hydra.hydra_runner>:-1: UserWarning: 
'experiment_market1501_resnet_custom.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Train results will be saved at: /root/data/Model-Training/PersonReIdentification_V1.1/data/result/train
Loading pretrained ImageNet model......
╒══════════╤═════════╤════════════╤═════════════╕
│ Subset   │   # IDs │   # Images │   # Cameras │
╞══════════╪═════════╪════════════╪═════════════╡
│ Train    │   70493 │    2232198 │          10 │
├──────────┼─────────┼────────────┼─────────────┤
│ Query    │    8416 │      19865 │           2 │
├──────────┼─────────┼────────────┼─────────────┤
│ Gallery  │   23146 │      76627 │           2 │
╘══════════╧═════════╧════════════╧═════════════╛



Please set batch_size back to 32 and then use a smaller training dataset to train.

But I was getting lower accuracy with the limited dataset.

Is there any other way to resolve this issue while keeping the dataset the same?

Thanks.

Please reproduce the issue and monitor the CPU memory and GPU memory at the same time. If it is related to OOM, please try to increase the swap memory, or retry on another machine.
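
For example, with standard Linux tooling (the swap size below is only illustrative):

# In one terminal, watch host RAM and swap while the training runs:
watch -n 5 free -h
# In a second terminal, watch GPU memory:
watch -n 5 nvidia-smi

# Adding a 64 GB swap file on the host:
sudo fallocate -l 64G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile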