I am trying to make Weights & Biases work in TAO (as it does in NeMo). I am running the asr-python-advanced-finetune-am-citrinet-tao-finetuning.ipynb
notebook and modifying the cells to pass the underlying parameters to the train command, with no luck…
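For reference, in NeMo the same settings live under `exp_manager` in the training config. This is a sketch of what I would expect to be able to express, using NeMo's `ExpManagerConfig` field names with the values from my command (whether the TAO spec schema exposes this section is exactly what I'm unsure about):

```yaml
exp_manager:
  create_wandb_logger: true   # enable the W&B logger
  wandb_logger_kwargs:        # passed through to WandbLogger
    name: run
    project: tao
```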
!tao speech_to_text_citrinet train \
-e $SPECS_DIR/speech_to_text_citrinet/train_citrinet_bpe.yaml \
-g 1 \
-k $KEY \
-r $RESULTS_DIR/citrinet/train \
training_ds.manifest_filepath=$DATA_DIR/an4_converted/train_manifest.json \
validation_ds.manifest_filepath=$DATA_DIR/an4_converted/test_manifest.json \
trainer.max_epochs=1 \
training_ds.num_workers=4 \
validation_ds.num_workers=4 \
model.tokenizer.dir=$DATA_DIR/an4/tokenizer_spe_unigram_v32 \
exp_manager.create_wandb_logger=True \
exp_manager.wandb_logger_kwargs.name=run \
exp_manager.wandb_logger_kwargs.project=tao
and I get the following error:
2022-07-19 17:41:00,138 [INFO] root: Registry: ['nvcr.io']
2022-07-19 17:41:00,236 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.22.05-py3
2022-07-19 17:41:00,518 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tcapelle/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2022-07-19 17:41:11 nemo_logging:349] /home/jenkins/agent/workspace/tlt-pytorch-main-nightly/conv_ai/asr/speech_to_text_ctc/scripts/train.py:159: UserWarning:
'train_citrinet_bpe.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 368, in <lambda>
lambda: hydra.run(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 87, in run
cfg = self.compose_config(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 564, in compose_config
cfg = self.config_loader.load_configuration(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 146, in load_configuration
return self._load_configuration_impl(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 262, in _load_configuration_impl
ConfigLoaderImpl._apply_overrides_to_config(config_overrides, cfg)
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 378, in _apply_overrides_to_config
OmegaConf.update(cfg, key, value, merge=True)
File "/opt/conda/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 724, in update
assert isinstance(
AssertionError: Unexpected type for root: NoneType
2022-07-19 17:41:13,827 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
I am already passing the W&B API key via an environment variable in the tao_mounts.json file.
Any tips on how to debug this issue?