TAO Citrinet dataset_convert

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) - T4
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) - v3.21.11-py3
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
Following the tutorial at https://developer.nvidia.com/blog/speech-recognition-customizing-models-to-your-domain-using-transfer-learning/

Instead of AN4, I am using MCV (Mozilla Common Voice). I am trying to convert the MCV data format using the command below:

tao speech_to_text_citrinet dataset_convert -e /home/ubuntu/specs/speech_to_text_citrinet/dataset_convert_en.yaml -r /home/ubuntu/results/citrinet/dataset_convert source_data_dir=/data/cv-corpus-10.0-2022-07-04/en target_data_dir=/data/cv-corpus-10.0-2022-07-04/en_converted

For the source dir above, I tried both the corpus root and the en folder.
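For context, the tao launcher runs dataset_convert inside a Docker container, so source_data_dir and target_data_dir must resolve to container-side paths. The host-to-container mapping lives in ~/.tao_mounts.json; a minimal sketch is below, where the host-side "source" paths are assumptions and should point at wherever the data, specs, and results actually live on the host:

# Sketch of the launcher's drive mapping; the host-side "source" paths are assumptions.
cat > ~/.tao_mounts.json <<'EOF'
{
    "Mounts": [
        { "source": "/home/ubuntu/data",    "destination": "/data" },
        { "source": "/home/ubuntu/specs",   "destination": "/home/ubuntu/specs" },
        { "source": "/home/ubuntu/results", "destination": "/home/ubuntu/results" }
    ]
}
EOF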

When running the command, I get the error below:

File "/opt/conda/bin/speech_to_text_citrinet", line 8, in <module>
    sys.exit(main())
  File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/asr/speech_to_text_citrinet/entrypoint/speech_to_text.py", line 94, in main
  File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/tlt_utils/entrypoint.py", line 33, in get_subtasks
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/jenkins/agent/workspace/tlt-pytorch-main-nightly/asr/speech_to_text_citrinet/scripts/dataset_convert.py", line 33, in <module>
  File "/opt/conda/lib/python3.8/site-packages/librosa/__init__.py", line 211, in <module>
    from . import core
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/__init__.py", line 5, in <module>
    from .convert import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/convert.py", line 7, in <module>
    from . import notation
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/notation.py", line 8, in <module>
    from ..util.exceptions import ParameterError
  File "/opt/conda/lib/python3.8/site-packages/librosa/util/__init__.py", line 83, in <module>
    from .utils import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.8/site-packages/librosa/util/utils.py", line 1848, in <module>
    def __shear_dense(X, factor=+1, axis=-1):
  File "/opt/conda/lib/python3.8/site-packages/numba/core/decorators.py", line 214, in wrapper
    disp.enable_caching()
  File "/opt/conda/lib/python3.8/site-packages/numba/core/dispatcher.py", line 781, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/opt/conda/lib/python3.8/site-packages/numba/core/caching.py", line 616, in __init__
    self._impl = self._impl_class(py_func)
  File "/opt/conda/lib/python3.8/site-packages/numba/core/caching.py", line 351, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function '__shear_dense': no locator available for file '/opt/conda/lib/python3.8/site-packages/librosa/util/utils.py'
2022-08-13 16:45:33,216 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Any pointers here are much appreciated.
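For reference, the RuntimeError comes from numba, which librosa uses to JIT-compile helpers such as __shear_dense with cache=True. numba tries to write its on-disk cache next to librosa's installed sources; when site-packages is not writable by the container's user and no user-wide cache directory is configured, it finds no cache locator and raises exactly this error. A common workaround is to point NUMBA_CACHE_DIR at a writable path inside the container. One way to do that with the tao launcher (a sketch: the "Envs" section is documented for recent launcher versions, and the cache path here is an assumption) is to add this section to the ~/.tao_mounts.json shown above:

    "Envs": [
        { "variable": "NUMBA_CACHE_DIR", "value": "/tmp/numba_cache" }
    ]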

How about running with the AN4 dataset? Is that successful?
Also, please download the notebook for reference: TAO Toolkit Quick Start Guide — TAO Toolkit 3.22.05 documentation

Yes, I tried with AN4 as well. No luck; same error.

That does not make sense.
I suggest you download the Jupyter notebook and run it.
Also, you can use the latest 22.05 TAO.
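If you do move to 22.05, the launcher itself ships as a pip package, so the upgrade is a one-liner (assuming the launcher was installed with pip):

# Upgrade the TAO launcher on the host (assumes a pip-based install).
pip3 install --upgrade nvidia-tao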

OK, I will try that. Thanks.
