Cannot export .nemo model to .ejrvs through TLT Export

In order to deploy a model trained with NeMo on Jarvis, I followed Jarvis Docs | TLT Export for NeMo/TLT to export a .nemo file to an .ejrvs file. I used the question_answering example (NeMo v1.0.0) to train the model and store it in qa.nemo, and then used `tlt question_answering export` to export it, but got the error below:

[NeMo W 2021-06-10 08:58:18 exp_manager:27] Exp_manager is logging to `/results/nlp/qa/``, but it already exists.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/nemo/core/classes/modelPT.py", line 481, in restore_from
    return cls._eff_restore_from(restore_path, override_config_path, map_location, strict)
  File "/opt/conda/lib/python3.6/site-packages/nemo/core/classes/modelPT.py", line 437, in _eff_restore_from
    strict=strict,
  File "<frozen eff.cookbooks.nemo_cookbook>", line 363, in restore_from
  File "<frozen eff.core.cookbook>", line 154, in validate_archive
  File "/opt/conda/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "<frozen eff.core.archive>", line 464, in restore_from
TypeError: The indicated file '/data/nlp/qa.nemo' is not an EFF archive


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tlt-nemo/nlp/question_answering/scripts/export.py", line 81, in <module>
  File "/opt/conda/lib/python3.6/site-packages/nemo/core/config/hydra_runner.py", line 103, in wrapper
    strict=None,
  File "/opt/conda/lib/python3.6/site-packages/hydra/_internal/utils.py", line 347, in _run_hydra
    lambda: hydra.run(
  File "/opt/conda/lib/python3.6/site-packages/hydra/_internal/utils.py", line 237, in run_and_report
    assert mdl is not None
AssertionError

I think I followed the docs exactly, and I do not know what is going wrong. Could anyone help?
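As a side note for anyone debugging the same `TypeError: ... is not an EFF archive`: a .nemo file saved by NeMo is a plain tar archive, while the failing restore path expects an encrypted EFF archive. A quick way to see which kind of file you have is to check whether it is a tar archive. This is a minimal sketch; the `describe_checkpoint` helper and the demo file it inspects are hypothetical, not part of NeMo or TLT.

```python
# Sketch: distinguish a plain-tar .nemo checkpoint from a non-tar (e.g. EFF)
# file. `describe_checkpoint` is a hypothetical helper for illustration.
import os
import tarfile
import tempfile


def describe_checkpoint(path: str) -> str:
    """Report whether a checkpoint file looks like a plain tar archive."""
    if tarfile.is_tarfile(path):
        return "plain tar archive (regular .nemo save)"
    return "not a tar archive (possibly an EFF archive or corrupted)"


# Demonstrate on a dummy .nemo-style tar created on the fly.
tmpdir = tempfile.mkdtemp()
demo = os.path.join(tmpdir, "qa.nemo")
cfg = os.path.join(tmpdir, "model_config.yaml")
with open(cfg, "w") as f:
    f.write("model: qa\n")
with tarfile.open(demo, "w") as tf:
    tf.add(cfg, arcname="model_config.yaml")

print(describe_checkpoint(demo))  # → plain tar archive (regular .nemo save)
```

Running this against the real `/data/nlp/qa.nemo` would tell you whether the exporter is rejecting a valid tar (a format/version mismatch) or a genuinely unreadable file.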

Hi @yicheng.fang

Please refer to the note section in the link below:
https://docs.nvidia.com/deeplearning/jarvis/archives/110-b/user-guide/docs/model-overview.html#model-development-with-tlt

If you trained your model with the recent NeMo release (1.0.0.b4), you can directly use tlt … export from the TLT launcher to export the NeMo models to the Jarvis required format. For older NeMo releases (1.0.0.b1-b3), this export path might work for some but not all models; use it at your own risk. For even older NeMo releases (before 1.0.0.b1), it will not work due to missing artifacts that Jarvis ServiceMaker requires.

Could you please check whether your model was trained with NeMo release 1.0.0.b4?

Thanks

@SunilJB
I checked the version information before running the export: the version I use is v1.0.0, which I believe is newer than r1.0.0b4.
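For completeness, here is how I confirmed the installed version. This is a hedged sketch: it assumes NeMo is installed from the `nemo-toolkit` distribution (its PyPI name) and falls back gracefully when the package is absent.

```python
# Sketch: query the installed NeMo version via package metadata, without
# importing the (heavy) nemo package itself. Assumes the distribution is
# named "nemo-toolkit"; prints a fallback message if it is not installed.
from importlib.metadata import PackageNotFoundError, version

try:
    print("nemo-toolkit version:", version("nemo-toolkit"))
except PackageNotFoundError:
    print("nemo-toolkit is not installed in this environment")
```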

Hi @yicheng.fang
Could you please share the model which you used in this case, so we can help better?

Thanks

@SunilJB

Sure. The model I used in this case is from NeMo (v1.0.0 branch) examples: NeMo/examples/nlp/question_answering at v1.0.0 · NVIDIA/NeMo · GitHub

And the model itself is defined at:

I trained the model for 1 epoch and got a .nemo file, which is a tar archive containing a checkpoint of the model and a .yaml file with the configs.
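Since the .nemo file is just a tar archive, its contents can be listed directly. Below is a hypothetical sketch that builds a dummy archive with the same shape (a checkpoint plus a config .yaml) and lists it; the file names and the /tmp path are illustrative, not what NeMo actually writes.

```shell
# Sketch: create a dummy .nemo-style tar and list its members.
# File names and paths are illustrative only.
mkdir -p /tmp/nemo_demo
echo "model: qa" > /tmp/nemo_demo/model_config.yaml
touch /tmp/nemo_demo/model_weights.ckpt
tar -cf /tmp/nemo_demo/qa.nemo -C /tmp/nemo_demo model_config.yaml model_weights.ckpt

# Listing the archive shows the checkpoint and the config:
tar -tf /tmp/nemo_demo/qa.nemo
```

Running `tar -tf` against the real qa.nemo would show exactly which artifacts it contains, which could help check whether the artifacts Jarvis ServiceMaker expects are present.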