Any tips for fine-tuning a Citrinet model?

Hi all,
I managed to get my Citrinet model down to a val_wer of 0.17.
However, when I tried to fine-tune the model further, the val_wer fluctuated between 0.17 and 0.19.
Are there any tips on how to reduce the WER further? I am aiming to get it below 10 percent word error rate (0.10).
Here are the configs I used.

Set up the train, validation, and test configs

from omegaconf import open_dict

with open_dict(cfg):
    # Train dataset
    cfg.train_ds.manifest_filepath = TRAIN_MANIFEST
    cfg.train_ds.batch_size = 32
    cfg.train_ds.num_workers = 24
    cfg.train_ds.pin_memory = True
    cfg.train_ds.use_start_end_token = True
    cfg.train_ds.trim_silence = False

    # Validation dataset
    cfg.validation_ds.manifest_filepath = VAL_MANIFEST
    cfg.validation_ds.batch_size = 16
    cfg.validation_ds.num_workers = 24
    cfg.validation_ds.pin_memory = True
    cfg.validation_ds.use_start_end_token = True
    cfg.validation_ds.trim_silence = False

    # Test dataset
    cfg.test_ds.manifest_filepath = VAL_MANIFEST
    cfg.test_ds.batch_size = 16
    cfg.test_ds.num_workers = 24
    cfg.test_ds.pin_memory = True
    cfg.test_ds.use_start_end_token = True
    cfg.test_ds.trim_silence = False

with open_dict(model.cfg.optim):
    model.cfg.optim.lr = 0.001
    model.cfg.optim.weight_decay = 0.0005
    model.cfg.optim.sched.warmup_steps = None  # remove the default number of warmup steps
    model.cfg.optim.sched.warmup_ratio = 0.10  # 10% warmup
    model.cfg.optim.sched.min_lr = 1e-9

with open_dict(model.cfg.spec_augment):
    model.cfg.spec_augment.freq_masks = 2
    model.cfg.spec_augment.freq_width = 27
    model.cfg.spec_augment.time_masks = 10
    model.cfg.spec_augment.time_width = 0.05
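
For context, a minimal sketch of how these edited sections are then applied before training, assuming model is the already loaded Citrinet (EncDecCTCModelBPE) instance and cfg is a copy of model.cfg; the trainer settings below are placeholders rather than recommendations:

import pytorch_lightning as pl

# Apply the edited dataset, optimizer, and augmentation sections to the loaded model
model.setup_training_data(cfg.train_ds)
model.setup_validation_data(cfg.validation_ds)
model.setup_optimization(model.cfg.optim)
model.spec_augmentation = model.from_config_dict(model.cfg.spec_augment)

# Placeholder trainer settings, not tuned recommendations
trainer = pl.Trainer(gpus=1, max_epochs=50)
model.set_trainer(trainer)
trainer.fit(model)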

Did you use a pretrained model?

Yes, I used the pre-trained model 'stt_en_citrinet_1024'.
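
For reference, that checkpoint is typically pulled by name from NGC; a minimal loading sketch, assuming a standard NeMo install:

import nemo.collections.asr as nemo_asr

# Downloads and restores the pretrained Citrinet-1024 checkpoint from NGC by name
model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="stt_en_citrinet_1024")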

Now it is showing me “val_wer not in top 3”

Which dataset did you train on? Is there any baseline?

When you see "val_wer not in top 3", it means the current epoch's val_wer is not among the 3 best values seen so far, so that checkpoint is not kept as one of the top 3.

For fine-tuning, you can try adjusting more hyperparameters. For example, try a lower training batch size.
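
As a rough illustration of that suggestion (the values are placeholders, not tuned recommendations; the lower peak learning rate is an extra assumption that is often paired with a smaller batch size when fine-tuning):

from omegaconf import open_dict

with open_dict(cfg):
    cfg.train_ds.batch_size = 16  # smaller than the 32 used above

with open_dict(model.cfg.optim):
    model.cfg.optim.lr = 1e-4  # assumption: a lower peak LR than the 0.001 above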

I am using my own dataset, which consists of medical transcripts.
I don't think I have a baseline; I am currently just fine-tuning the NeMo model from the latest checkpoint, which has a WER of 17 percent.

Can you use the pretrained .tlt model below?
$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/speechtotext_english_citrinet/versions/trainable_v1.7/files/speechtotext_english_citrinet_1024.tlt

Hi, any idea how I can load this model for fine-tuning? I am currently using this finetuning.py script:
fine-tuning (1).ipynb (25.8 KB)
However, I am getting the error "No such file or directory: '/tmp/tmpn804ehka/model_weights.ckpt'".
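
That error can show up when restore_from() is pointed at something other than a plain .nemo archive. The .tlt file from NGC is encrypted (hence the -k tlt_encode key used in the TAO commands later in this thread) and is meant to be consumed by the speech_to_text_citrinet CLI rather than loaded directly in the notebook. For comparison, a minimal sketch of loading an unencrypted .nemo checkpoint, with a placeholder path:

import nemo.collections.asr as nemo_asr

# restore_from() expects an unencrypted .nemo archive; a .tlt file needs the TAO CLI instead
model = nemo_asr.models.EncDecCTCModelBPE.restore_from(restore_path="path/to/checkpoint.nemo")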

I also tried using the TAO Toolkit for evaluation, but I am getting a "No such file or directory" error for the audio paths in the manifest files.
I am using the default evaluate.yaml that was installed using download_specs, and I have also tried remapping the audio filepaths according to the .tao_mounts.json file, but I still get the same error.
Attached below is the error log:
root@7c7d3f2099dc:/workspace/tao-experiments/xd# speech_to_text_citrinet evaluate -e /workspace/tao-experiments/xd/evaluate.yaml -m /workspace/tao-experiments/specs/speechtotext_english_citrinet_1024.tlt -r /workspace/tao-experiments/results/ -k tlt_encode
[NeMo W 2022-04-24 16:59:03 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/torchaudio-0.7.0a0+42d447d-py3.8-linux-x86_64.egg/torchaudio/backend/utils.py:53: UserWarning: “sox” backend is being deprecated. The default backend will be changed to “sox_io” backend in 0.8.0 and “sox” backend will be removed in 0.9.0. Please migrate to “sox_io” backend. Please refer to [Announcement] Improving I/O for correct and consistent experience · Issue #903 · pytorch/audio · GitHub for the detail.
warnings.warn(

[NeMo W 2022-04-24 16:59:03 experimental:27] Module <class ‘nemo.collections.asr.data.audio_to_text_dali._AudioTextDALIDataset’> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-04-24 16:59:06 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/torchaudio-0.7.0a0+42d447d-py3.8-linux-x86_64.egg/torchaudio/backend/utils.py:53: UserWarning: “sox” backend is being deprecated. The default backend will be changed to “sox_io” backend in 0.8.0 and “sox” backend will be removed in 0.9.0. Please migrate to “sox_io” backend. Please refer to [Announcement] Improving I/O for correct and consistent experience · Issue #903 · pytorch/audio · GitHub for the detail.
warnings.warn(

[NeMo W 2022-04-24 16:59:06 experimental:27] Module <class ‘nemo.collections.asr.data.audio_to_text_dali._AudioTextDALIDataset’> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-04-24 16:59:06 nemo_logging:349] /home/jenkins/agent/workspace/tlt-pytorch-main-nightly/asr/speech_to_text_citrinet/scripts/evaluate.py:103: UserWarning:
‘evaluate.yaml’ is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See Automatic schema-matching | Hydra for migration instructions.

[NeMo I 2022-04-24 16:59:06 tlt_logging:20] Experiment configuration:
restore_from: /workspace/tao-experiments/specs/speechtotext_english_citrinet_1024.tlt
exp_manager:
  explicit_log_dir: /workspace/tao-experiments/results/
  exp_dir: null
  name: null
  version: null
  use_datetime_version: true
  resume_if_exists: false
  resume_past_end: false
  resume_ignore_no_checkpoint: false
  create_tensorboard_logger: false
  summary_writer_kwargs: null
  create_wandb_logger: false
  wandb_logger_kwargs: null
  create_checkpoint_callback: false
  checkpoint_callback_params:
    filepath: null
    dirpath: null
    filename: null
    monitor: val_loss
    verbose: true
    save_last: true
    save_top_k: 3
    save_weights_only: false
    mode: min
    period: null
    every_n_val_epochs: 1
    prefix: null
    postfix: .nemo
    save_best_model: false
    always_save_nemo: false
  files_to_copy: null
trainer:
  logger: false
  checkpoint_callback: false
  callbacks: null
  default_root_dir: null
  gradient_clip_val: 0.0
  process_position: 0
  num_nodes: 1
  num_processes: 1
  gpus: 1
  auto_select_gpus: false
  tpu_cores: null
  log_gpu_memory: null
  progress_bar_refresh_rate: 1
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1000
  min_epochs: 1
  max_steps: null
  min_steps: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  val_check_interval: 1.0
  flush_logs_every_n_steps: 100
  log_every_n_steps: 50
  accelerator: ddp
  sync_batchnorm: false
  precision: 32
  weights_summary: full
  weights_save_path: null
  num_sanity_val_steps: 2
  truncated_bptt_steps: null
  resume_from_checkpoint: null
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_epoch: false
  auto_lr_find: false
  replace_sampler_ddp: true
  terminate_on_nan: false
  auto_scale_batch_size: false
  prepare_data_per_node: true
  amp_backend: native
  amp_level: O2
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  limit_predict_batches: 1.0
  stochastic_weight_avg: false
  gradient_clip_algorithm: norm
  max_time: null
  reload_dataloaders_every_n_epochs: 0
  ipus: null
  devices: null
test_ds:
  manifest_filepath: /workspace/tao-experiments/specs/final/val_manifest.json
  batch_size: 32
  sample_rate: 16000
  labels:
  - ' '
  - a
  - b
  - c
  - d
  - e
  - f
  - g
  - h
  - i
  - j
  - k
  - l
  - m
  - 'n'
  - o
  - p
  - q
  - r
  - s
  - t
  - u
  - v
  - w
  - x
  - 'y'
  - z
  - ''''
  num_workers: 0
  trim_silence: true
  shuffle: false
  max_duration: null
  is_tarred: false
  tarred_audio_filepaths: null
encryption_key: '**********'

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
[NeMo W 2022-04-24 16:59:06 exp_manager:414] Exp_manager is logging to /workspace/tao-experiments/results/, but it already exists.
[NeMo I 2022-04-24 16:59:06 exp_manager:220] Experiments will be logged at /workspace/tao-experiments/results
[NeMo I 2022-04-24 16:59:09 mixins:147] Tokenizer SentencePieceTokenizer initialized with 1024 tokens
[NeMo W 2022-04-24 16:59:09 modelPT:130] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: /data/dataset/train/tarred_audio_manifest.json
sample_rate: 16000
batch_size: 16
trim_silence: false
max_duration: 20.0
shuffle: true
is_tarred: true
tarred_audio_filepaths: /data/dataset/train/audio__OP_0…4095_CL_.tar
use_start_end_token: false
num_workers: 16
pin_memory: true

[NeMo W 2022-04-24 16:59:09 modelPT:137] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath:
- /data/librispeech/LibriSpeech/librivox-test-other.json
- /data/librispeech/LibriSpeech/librivox-dev-other.json
sample_rate: 16000
batch_size: 8
shuffle: false
use_start_end_token: false
num_workers: 8
pin_memory: true

[NeMo W 2022-04-24 16:59:09 modelPT:143] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
sample_rate: 16000
batch_size: 32
shuffle: false
use_start_end_token: false

[NeMo I 2022-04-24 16:59:09 features:252] PADDING: 16
[NeMo I 2022-04-24 16:59:09 features:269] STFT using torch
[NeMo I 2022-04-24 16:59:19 collections:173] Dataset loaded with 673 files totalling 0.56 hours
[NeMo I 2022-04-24 16:59:19 collections:174] 0 files were filtered totalling 0.00 hours
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
Added key: store_based_barrier_key:1 to store for rank: 0
Rank 0: Completed store-based barrier for 1 nodes.

distributed_backend=nccl
All DDP processes registered. Starting ddp with 1 processes

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
[NeMo W 2022-04-24 16:59:19 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, test dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 80 which is the number of cpus on this machine) in the DataLoader init to improve performance.
rank_zero_warn(

Testing: 0it [00:00, ?it/s][NeMo E 2022-04-24 16:59:19 segment:148] Loading /home/ubuntu/workdir/final/clips/6-5_0000.wav via SoundFile raised RuntimeError: Error opening '/home/ubuntu/workdir/final/clips/6-5_0000.wav': System error.. NeMo will fallback to loading via pydub.
Error executing job with overrides: [‘exp_manager.explicit_log_dir=/workspace/tao-experiments/results/’, ‘trainer.gpus=1’, ‘restore_from=/workspace/tao-experiments/specs/speechtotext_english_citrinet_1024.tlt’, ‘encryption_key=tlt_encode’]
Traceback (most recent call last):
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py”, line 211, in run_and_report
return func()
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py”, line 368, in
lambda: hydra.run(
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py”, line 110, in run
_ = ret.return_value
File “/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py”, line 233, in return_value
raise self._return_value
File “/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py”, line 160, in run_job
ret.return_value = task_function(task_cfg)
File “/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/asr/speech_to_text_citrinet/scripts/evaluate.py”, line 96, in main
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 706, in test
results = self._run(model)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 918, in _run
self._dispatch()
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 982, in _dispatch
self.accelerator.start_evaluating(self)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py”, line 95, in start_evaluating
self.training_type_plugin.start_evaluating(trainer)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py”, line 165, in start_evaluating
self._results = trainer.run_stage()
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 993, in run_stage
return self._run_evaluate()
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 1079, in _run_evaluate
eval_loop_results = self._evaluation_loop.run()
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py”, line 111, in run
self.advance(*args, **kwargs)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py”, line 110, in advance
dl_outputs = self.epoch_loop.run(
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py”, line 111, in run
self.advance(*args, **kwargs)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py”, line 93, in advance
batch_idx, batch = next(dataloader_iter)
File “/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py”, line 521, in next
data = self._next_data()
File “/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py”, line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File “/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py”, line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File “/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py”, line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File “/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/data/audio_to_text.py”, line 216, in getitem
features = self.featurizer.process(
File “/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/parts/preprocessing/features.py”, line 109, in process
audio = AudioSegment.from_file(
File “/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/parts/preprocessing/segment.py”, line 165, in from_file
samples = Audio.from_file(audio_file)
File “/opt/conda/lib/python3.8/site-packages/pydub/audio_segment.py”, line 651, in from_file
file, close_file = _fd_or_path_or_tempfile(file, ‘rb’, tempfile=False)
File “/opt/conda/lib/python3.8/site-packages/pydub/utils.py”, line 60, in _fd_or_path_or_tempfile
fd = open(fd, mode=mode)
FileNotFoundError: [Errno 2] No such file or directory: ‘/home/ubuntu/workdir/final/clips/6-5_0000.wav’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/asr/speech_to_text_citrinet/scripts/evaluate.py”, line 103, in
File “/opt/conda/lib/python3.8/site-packages/nemo/core/config/hydra_runner.py”, line 101, in wrapper
_run_hydra(
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py”, line 367, in _run_hydra
run_and_report(
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py”, line 251, in run_and_report
assert mdl is not None
AssertionError
Exception ignored in: <function tqdm.del at 0x7f7073809700>
Traceback (most recent call last):
File “/opt/conda/lib/python3.8/site-packages/tqdm/std.py”, line 1150, in del
File “/opt/conda/lib/python3.8/site-packages/tqdm/std.py”, line 1363, in close
File “/opt/conda/lib/python3.8/site-packages/tqdm/std.py”, line 1542, in display
File “/opt/conda/lib/python3.8/site-packages/tqdm/std.py”, line 1153, in repr
File “/opt/conda/lib/python3.8/site-packages/tqdm/std.py”, line 1503, in format_dict
TypeError: cannot unpack non-iterable NoneType object

You can refer to Tao speech_to_text evaluate+infer show very weak results - #26 by Morganh

# speech_to_text_citrinet finetune -e xxx/finetune.yaml -k tlt_encode -m xxx.tlt -r result \
    finetuning_ds.manifest_filepath=xxx/train_manifest.json \
    validation_ds.manifest_filepath=xxx/test_manifest.json \
    trainer.max_epochs=100

I tried the command listed above, but I am still getting "File not found" when loading the validation dataset.
Any idea why this is occurring? I doubt it's the pathing, as my training set loads just fine.
The command I used was:
speech_to_text_citrinet finetune -e /workspace/tao-experiments/xd/finetune.yaml -k tlt_encode -r /workspace/tao-experiments/results/ finetuning_ds.manifest_filepath=/workspace/tao-experiments/specs/final/train_manifest.json validation_ds.manifest_filepath=/workspace/tao-experiments/specs/final/val_manifest_fortao.json -m /workspace/tao-experiments/specs/speechtotext_english_citrinet_1024.tlt
Attached below is my error log:
Validation sanity check: 0it [00:00, ?it/s][NeMo W 2022-04-25 05:49:39 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 80 which is the number of cpus on this machine) in the DataLoader init to improve performance.
rank_zero_warn(

Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s][NeMo E 2022-04-25 05:49:39 segment:148] Loading /home/ubuntu/workdir/final/clips/6-5_0000.wav via SoundFile raised RuntimeError: Error opening '/home/ubuntu/workdir/final/clips/6-5_0000.wav': System error.. NeMo will fallback to loading via pydub.
Error executing job with overrides: [‘exp_manager.explicit_log_dir=/workspace/tao-experiments/results/’, ‘trainer.gpus=1’, ‘restore_from=/workspace/tao-experiments/specs/speechtotext_english_citrinet_1024.tlt’, ‘encryption_key=tlt_encode’, ‘finetuning_ds.manifest_filepath=/workspace/tao-experiments/specs/final/train_manifest.json’, ‘validation_ds.manifest_filepath=/workspace/tao-experiments/specs/final/val_manifest_fortao.json’]
Traceback (most recent call last):
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py”, line 211, in run_and_report
return func()
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py”, line 368, in
lambda: hydra.run(
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py”, line 110, in run
_ = ret.return_value
File “/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py”, line 233, in return_value
raise self._return_value
File “/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py”, line 160, in run_job
ret.return_value = task_function(task_cfg)
File “/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/asr/speech_to_text_citrinet/scripts/finetune.py”, line 141, in main
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 553, in fit
self._run(model)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 918, in _run
self._dispatch()
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 986, in _dispatch
self.accelerator.start_training(self)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py”, line 92, in start_training
self.training_type_plugin.start_training(trainer)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py”, line 161, in start_training
self._results = trainer.run_stage()
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 996, in run_stage
return self._run_train()
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 1031, in _run_train
self._run_sanity_check(self.lightning_module)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py”, line 1115, in _run_sanity_check
self._evaluation_loop.run()
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py”, line 111, in run
self.advance(*args, **kwargs)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py”, line 110, in advance
dl_outputs = self.epoch_loop.run(
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py”, line 111, in run
self.advance(*args, **kwargs)
File “/opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py”, line 93, in advance
batch_idx, batch = next(dataloader_iter)
File “/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py”, line 521, in next
data = self._next_data()
File “/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py”, line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File “/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py”, line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File “/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py”, line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File “/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/data/audio_to_text.py”, line 216, in getitem
features = self.featurizer.process(
File “/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/parts/preprocessing/features.py”, line 109, in process
audio = AudioSegment.from_file(
File “/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/parts/preprocessing/segment.py”, line 165, in from_file
samples = Audio.from_file(audio_file)
File “/opt/conda/lib/python3.8/site-packages/pydub/audio_segment.py”, line 651, in from_file
file, close_file = _fd_or_path_or_tempfile(file, ‘rb’, tempfile=False)
File “/opt/conda/lib/python3.8/site-packages/pydub/utils.py”, line 60, in _fd_or_path_or_tempfile
fd = open(fd, mode=mode)
FileNotFoundError: [Errno 2] No such file or directory: ‘/home/ubuntu/workdir/final/clips/6-5_0000.wav’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/asr/speech_to_text_citrinet/scripts/finetune.py”, line 153, in
File “/opt/conda/lib/python3.8/site-packages/nemo/core/config/hydra_runner.py”, line 101, in wrapper
_run_hydra(
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py”, line 367, in _run_hydra
run_and_report(
File “/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py”, line 251, in run_and_report
assert mdl is not None
AssertionError

This kind of error is usually due to an incorrect mount mapping.

Please check ~/.tao_mounts.json.

Please note that all paths on the command line should be the paths inside the Docker container; the mapping is defined in ~/.tao_mounts.json.
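
If the manifests themselves still contain host-side paths (such as the /home/ubuntu/workdir/final/clips/... path in the traceback), one option is to rewrite the audio_filepath entries so they match the path mounted inside the container. A minimal sketch, where the container-side prefix and file names are placeholder assumptions:

import json

HOST_PREFIX = "/home/ubuntu/workdir/final/clips"            # prefix as seen on the host
CONTAINER_PREFIX = "/workspace/tao-experiments/data/clips"  # assumed mount destination inside the container

# Rewrite each manifest entry's audio path to the container-side location
with open("val_manifest.json") as fin, open("val_manifest_fortao.json", "w") as fout:
    for line in fin:
        entry = json.loads(line)
        entry["audio_filepath"] = entry["audio_filepath"].replace(HOST_PREFIX, CONTAINER_PREFIX)
        fout.write(json.dumps(entry) + "\n")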

I managed to fix the pathing. Is there any way to specify which GPU to use? The -g option only specifies how many GPUs are used for fine-tuning, not which GPU.

Please add one of the following at the beginning of the command line:
CUDA_VISIBLE_DEVICES=0,1,2,3
or
CUDA_VISIBLE_DEVICES=0
or
CUDA_VISIBLE_DEVICES=1
