Hi, when I follow the sample notebooks with the sample dataset, I get errors.
For QuartzNet:
Validation sanity check: 0it [00:00, ?it/s][NeMo W 2021-06-22 14:02:12 patch_utils:49] torch.stft() signature has been updated for PyTorch 1.7+
Please update PyTorch to remain compatible with later versions of NeMo.
Validation sanity check: 50%|██████████ | 1/2 [00:02<00:02, 2.46s/it][NeMo W 2021-06-22 14:02:15 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:49: UserWarning: The validation_epoch_end should not return anything as of 9.1. To log, use self.log(...) or self.write(...) directly in the LightningModule
warnings.warn(*args, **kwargs)
Epoch 2: 0%| | 0/539 [00:00<?, ?it/s][W reducer.cpp:1042] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
Epoch 2: 7%|█▍ | 36/539 [00:08<02:03, 4.07it/s, loss=57.7]Saving latest checkpoint...
Saving latest checkpoint...
Epoch 2, global step 243: val_loss reached 15393664008192.00000 (best 15393664008192.00000), saving model to "/results/quartznet/train/checkpoints/trained-model--val_loss=15393664008192.00-epoch=2.ckpt" as top 3
Epoch 2: 7%|█▍ | 36/539 [00:12<02:57, 2.83it/s, loss=57.7]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 127, in run_job
    ret.return_value = task_function(task_cfg)
  File "/tlt-nemo/asr/speech_to_text/scripts/train.py", line 122, in main
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 472, in fit
    results = self.accelerator_backend.train()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 152, in train
    results = self.ddp_train(process_idx=self.task_idx, model=model)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 307, in ddp_train
    results = self.train_or_test()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 69, in train_or_test
    results = self.trainer.train()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 523, in train
    self.train_loop.run_training_epoch()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 573, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 731, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 506, in optimizer_step
    model_ref.optimizer_step(
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1253, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 280, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 138, in __optimizer_step
    optimizer.step(closure=closure, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/optimizer.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/optim/novograd.py", line 83, in step
    loss = closure()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 721, in train_step_and_backward_closure
    result = self.training_step_and_backward(
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 819, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 340, in training_step
    training_step_output = self.trainer.accelerator_backend.training_step(args)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 158, in training_step
    return self._step(args)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 172, in _step
    output = self.trainer.model(*args)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/overrides/data_parallel.py", line 179, in forward
    output = self.module.training_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.8/site-packages/nemo/utils/model_utils.py", line 337, in wrap_training_step
    output_dict = wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/models/ctc_models.py", line 400, in training_step
    log_probs, encoded_len, predictions = self.forward(input_signal=signal, input_signal_length=signal_len)
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/classes/common.py", line 535, in __call__
    outputs = wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/models/ctc_models.py", line 385, in forward
    processed_signal = self.spec_augmentation(input_spec=processed_signal)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/classes/common.py", line 535, in __call__
    outputs = wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/modules/audio_preprocessing.py", line 471, in forward
    augmented_spec = self.spec_cutout(input_spec)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/parts/spectr_augment.py", line 114, in forward
    rect_y = self._rng.randint(0, sh[2] - self.rect_time)
  File "/opt/conda/lib/python3.8/random.py", line 248, in randint
    return self.randrange(a, b+1)
  File "/opt/conda/lib/python3.8/random.py", line 226, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, -23, -23)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tlt-nemo/asr/speech_to_text/scripts/train.py", line 134, in <module>
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/config/hydra_runner.py", line 98, in wrapper
    _run_hydra(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 237, in run_and_report
    assert mdl is not None
AssertionError
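If I am reading the QuartzNet traceback right, the ValueError comes from NeMo's SpecCutout augmentation: sh[2] is the time dimension of the padded spectrogram batch, and the configured cutout width rect_time is larger than that, so randint() is asked for a number from an empty range. Here is a tiny sketch that reproduces just that last call; the 96 frames and rect_time=120 are made-up numbers I picked only so that the difference is -24 and the message matches the one above:

```python
import random

# Hypothetical values: the longest clip in the batch yields only 96 spectrogram
# frames, while the SpecCutout rectangle width (rect_time) is 120 frames.
n_time_frames = 96   # plays the role of sh[2] in spectr_augment.py
rect_time = 120      # plays the role of self.rect_time

rng = random.Random()
# Same call as spectr_augment.py line 114: the upper bound is negative, so
# randint() raises "ValueError: empty range for randrange() (0, -23, -23)".
rect_y = rng.randint(0, n_time_frames - rect_time)
```

If that is what is going on, the sample dataset presumably produces batches whose spectrograms are shorter than the cutout rectangle, but I am not sure which setting in the training spec controls rect_time, so please correct me if I am misreading it.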
For Jasper:
Validation sanity check: 0it [00:00, ?it/s][NeMo W 2021-06-22 14:05:11 patch_utils:49] torch.stft() signature has been updated for PyTorch 1.7+
Please update PyTorch to remain compatible with later versions of NeMo.
Validation sanity check: 100%|████████████████████| 2/2 [00:02<00:00, 1.70s/it][NeMo W 2021-06-22 14:05:14 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:49: UserWarning: The validation_epoch_end should not return anything as of 9.1. To log, use self.log(...) or self.write(...) directly in the LightningModule
warnings.warn(*args, **kwargs)
Epoch 0: 0%| | 0/1078 [00:00<?, ?it/s][W reducer.cpp:1042] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
Epoch 0: 0%| | 0/1078 [00:00<?, ?it/s]
2021-06-22 22:06:33,397 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
And training doesn't continue.