Error fine-tuning the new catalog RIVA Citrinet ASR English model - "Archive doesn't have the required runtime, format, version or object class type"

I’m trying to fine-tune a Citrinet model that was very recently added to the NVIDIA catalog: RIVA Citrinet ASR English | NVIDIA NGC

I have had success fine-tuning other Citrinet models with exactly the same process, but when I use the new model (“speechtotext_english_citrinet.tlt”) instead of an older one (such as “speechtotext_english_citrinet_1024.tlt” from the catalog), it fails with the error below.

Here’s the command I am using:

tao speech_to_text_citrinet finetune \
     -e $SPECS_DIR/speech_to_text_citrinet/finetune_custom.yaml \
     -g 1 \
     -k $KEY \
     -m $RESULTS_DIR/citrinet/download3/speechtotext_english_citrinet.tlt \
     -r $RESULTS_DIR/citrinet/finetune_test \
     finetuning_ds.manifest_filepath=$DATA_DIR/XXXXXXX.manifest \
     finetuning_ds.batch_size=8 \
     validation_ds.manifest_filepath=$DATA_DIR/YYYYYYYY.manifest \
     validation_ds.batch_size=8 \
     trainer.max_epochs=20 \
     finetuning_ds.num_workers=4 \
     validation_ds.num_workers=4 \
     trainer.gpus=1

If I change the model path to point at the older version of the model from the catalog, it works fine.
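That is, this command completes without error (identical except for the -m line; the download directory is simply where I put the older checkpoint, and the remaining finetuning_ds/validation_ds/trainer overrides from above are unchanged, omitted here for brevity):

tao speech_to_text_citrinet finetune \
     -e $SPECS_DIR/speech_to_text_citrinet/finetune_custom.yaml \
     -g 1 \
     -k $KEY \
     -m $RESULTS_DIR/citrinet/download/speechtotext_english_citrinet_1024.tlt \
     -r $RESULTS_DIR/citrinet/finetune_test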

And here is the output from the failing run (starting after the experiment configuration section):

GPU available: True, used: True
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
[NeMo W 2022-01-14 03:55:13 exp_manager:303] There was no checkpoint folder at checkpoint_dir :/results/citrinet/finetune_test/checkpoints. Training from scratch.
[NeMo I 2022-01-14 03:55:13 exp_manager:194] Experiments will be logged at /results/citrinet/finetune_test
Condition for key 'format_version' (2  <built-in function eq> 1) is not fulfilled
[NeMo W 2022-01-14 03:55:16 modelPT:193] Using /tmp/tmpa92cu1mv/tokenizer.model instead of /home/scratch.p3/vpraveen/experiments/asr/speech_to_text_citrinet/speechtotext_english_citrinet_nemo_3.0/tokenizer_spe_unigram_v1024/tokenizer.model.
[NeMo W 2022-01-14 03:55:16 modelPT:193] Using /tmp/tmpa92cu1mv/vocab.txt instead of /home/scratch.p3/vpraveen/experiments/asr/speech_to_text_citrinet/speechtotext_english_citrinet_nemo_3.0/tokenizer_spe_unigram_v1024/vocab.txt.
[NeMo I 2022-01-14 03:55:16 mixins:98] Tokenizer SentencePieceTokenizer initialized with 1024 tokens
[NeMo W 2022-01-14 03:55:17 modelPT:145] Please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath: null
    sample_rate: 16000
    batch_size: 32
    trim_silence: false
    max_duration: 20.0
    shuffle: true
    is_tarred: false
    tarred_audio_filepaths: null
    use_start_end_token: false
    
[NeMo W 2022-01-14 03:55:17 modelPT:152] Please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    manifest_filepath: null
    sample_rate: 16000
    batch_size: 32
    shuffle: false
    use_start_end_token: false
    
[NeMo W 2022-01-14 03:55:17 modelPT:159] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config : 
    manifest_filepath: null
    sample_rate: 16000
    batch_size: 32
    shuffle: false
    use_start_end_token: false
    
[NeMo I 2022-01-14 03:55:17 features:236] PADDING: 16
[NeMo I 2022-01-14 03:55:17 features:252] STFT using torch
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/classes/modelPT.py", line 481, in restore_from
    return cls._eff_restore_from(restore_path, override_config_path, map_location, strict)
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/classes/modelPT.py", line 432, in _eff_restore_from
    return NeMoCookbook().restore_from(
  File "<frozen src.eff.cookbooks.nemo_cookbook>", line 365, in restore_from
TypeError: Archive doesn't have the required runtime, format, version or object class type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 127, in run_job
    ret.return_value = task_function(task_cfg)
  File "/tlt-nemo/asr/speech_to_text_citrinet/scripts/finetune.py", line 120, in main
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/classes/modelPT.py", line 484, in restore_from
    return cls._default_restore_from(restore_path, override_config_path, map_location, strict)
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/classes/modelPT.py", line 400, in _default_restore_from
    instance.load_state_dict(torch.load(model_weights, map_location=map_location), strict=strict)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 579, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpa92cu1mv/model_weights.ckpt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tlt-nemo/asr/speech_to_text_citrinet/scripts/finetune.py", line 152, in <module>
  File "/opt/conda/lib/python3.8/site-packages/nemo/core/config/hydra_runner.py", line 98, in wrapper
    _run_hydra(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 237, in run_and_report
    assert mdl is not None
AssertionError
2022-01-14 03:55:25,169 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The line that jumps out at me is the format_version check near the top of the log: the archive apparently reports version 2 while the check expects 1, after which the EFF restore fails and the fallback restore path looks for a model_weights.ckpt that was never extracted. So my guess is the new .tlt was exported in a newer archive format than my container understands. Do I need to be on a different version of TAO or something?
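In case it helps, this is how I’m checking what I have installed (assuming the pip-based launcher; I believe tao info --verbose lists the container image each task maps to):

pip3 show nvidia-tao     # version of the TAO launcher itself
tao info --verbose       # which docker image each task runs in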

Hardware: AWS g4dn.12xlarge instance with 4 T4 GPUs (but only using one for the command above)
Operating System: Ubuntu 20.04 LTS via NVIDIA GPU Cloud image
Riva Version - 1.7
TLT Version (if relevant) - 3.21.08
How to reproduce the issue? (see above)
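For completeness, I fetched the new model with the NGC CLI, roughly as below (the exact model path and version tag should be taken from the model card; the ones here are placeholders):

ngc registry model download-version "<org>/<team>/speechtotext_english_citrinet:<version>" \
    --dest $RESULTS_DIR/citrinet/download3

and then ran the finetune command above against the downloaded .tlt file.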