{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's install TAO. It is a simple pip install!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip3 install nvidia-pyindex\n",
"! pip3 install nvidia-tao"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After installing TAO, the next step is to set up the mounts for TAO. The TAO launcher uses Docker containers under the hood, and **for our data and results directories to be visible to Docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options, such as environment variables and the amount of shared memory available to the TAO launcher.\n",
"\n",
"`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. It maps the directories in which we save the data, specs, results, and cache. You should configure these paths for your specific setup so that the directories are correctly visible to the Docker container."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"env: HOST_MODELS_DIR=/home/ubuntu/tlt-experiments/models\n",
"env: HOST_DATA_DIR=/home/ubuntu/tlt-experiments/data\n",
"env: HOST_SPECS_DIR=/home/ubuntu/tlt-experiments/specs\n",
"env: HOST_RESULTS_DIR=/home/ubuntu/tlt-experiments/results\n"
]
}
],
"source": [
"# please define these paths on your local host machine\n",
"%env HOST_MODELS_DIR=/home/ubuntu/tlt-experiments/models\n",
"%env HOST_DATA_DIR=/home/ubuntu/tlt-experiments/data\n",
"%env HOST_SPECS_DIR=/home/ubuntu/tlt-experiments/specs\n",
"%env HOST_RESULTS_DIR=/home/ubuntu/tlt-experiments/results"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"! mkdir -p $HOST_DATA_DIR\n",
"! mkdir -p $HOST_SPECS_DIR\n",
"! mkdir -p $HOST_RESULTS_DIR\n",
"! mkdir -p $HOST_MODELS_DIR"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Map the local directories into the TAO Docker container.\n",
"import json\n",
"import os\n",
"mounts_file = os.path.expanduser(\"~/.tao_mounts.json\")\n",
"tlt_configs = {\n",
" \"Mounts\":[\n",
" {\n",
" \"source\": os.environ[\"HOST_DATA_DIR\"],\n",
" \"destination\": \"/data\"\n",
" },\n",
" {\n",
" \"source\": os.environ[\"HOST_SPECS_DIR\"],\n",
" \"destination\": \"/specs\"\n",
" },\n",
" {\n",
" \"source\": os.environ[\"HOST_RESULTS_DIR\"],\n",
" \"destination\": \"/results\"\n",
" },\n",
" {\n",
" \"source\": os.environ[\"HOST_MODELS_DIR\"],\n",
" \"destination\": \"/models\"\n",
" },\n",
" {\n",
" \"source\": os.path.expanduser(\"~/.cache\"),\n",
" \"destination\": \"/root/.cache\"\n",
" }\n",
" ],\n",
" \"DockerOptions\": {\n",
" \"shm_size\": \"16G\",\n",
" \"ulimits\": {\n",
" \"memlock\": -1,\n",
" \"stack\": 67108864\n",
" }\n",
" }\n",
"}\n",
"# Writing the mounts file.\n",
"with open(mounts_file, \"w\") as mfile:\n",
" json.dump(tlt_configs, mfile, indent=4)"
]
},
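{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, Docker runs the TAO commands as root (see the warning in the logs further below). To retain your local host permissions, you can add a `\"user\": \"UID:GID\"` entry to the `DockerOptions` section of `~/.tao_mounts.json`. The sketch below is optional and not required for this walkthrough; it updates the file in place using the current user's IDs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: run TAO containers as the current host user instead of root.\n",
"import json\n",
"import os\n",
"mounts_file = os.path.expanduser(\"~/.tao_mounts.json\")\n",
"config = {}\n",
"if os.path.exists(mounts_file):\n",
"    with open(mounts_file) as mfile:\n",
"        config = json.load(mfile)\n",
"# os.getuid()/os.getgid() return this user's UID and GID (POSIX only).\n",
"config.setdefault(\"DockerOptions\", {})[\"user\"] = f\"{os.getuid()}:{os.getgid()}\"\n",
"with open(mounts_file, \"w\") as mfile:\n",
"    json.dump(config, mfile, indent=4)"
]
},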
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"Mounts\": [\r\n",
" {\r\n",
" \"source\": \"/home/ubuntu/tlt-experiments/data\",\r\n",
" \"destination\": \"/data\"\r\n",
" },\r\n",
" {\r\n",
" \"source\": \"/home/ubuntu/tlt-experiments/specs\",\r\n",
" \"destination\": \"/specs\"\r\n",
" },\r\n",
" {\r\n",
" \"source\": \"/home/ubuntu/tlt-experiments/results\",\r\n",
" \"destination\": \"/results\"\r\n",
" },\r\n",
" {\r\n",
" \"source\": \"/home/ubuntu/tlt-experiments/models\",\r\n",
" \"destination\": \"/models\"\r\n",
" },\r\n",
" {\r\n",
" \"source\": \"/home/ubuntu/.cache\",\r\n",
" \"destination\": \"/root/.cache\"\r\n",
" }\r\n",
" ],\r\n",
" \"DockerOptions\": {\r\n",
" \"shm_size\": \"16G\",\r\n",
" \"ulimits\": {\r\n",
" \"memlock\": -1,\r\n",
" \"stack\": 67108864\r\n",
" }\r\n",
" }\r\n",
"}"
]
}
],
"source": [
"!cat ~/.tao_mounts.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can check the Docker image versions and the tasks they perform with `tao --help`, or with the following command:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Configuration of the TAO Toolkit Instance\r\n",
"\r\n",
"dockers: \t\t\r\n",
"\tnvidia/tao/tao-toolkit-tf: \t\t\t\r\n",
"\t\tv3.21.11-tf1.15.5-py3: \t\t\t\t\r\n",
"\t\t\tdocker_registry: nvcr.io\r\n",
"\t\t\ttasks: \r\n",
"\t\t\t\t1. augment\r\n",
"\t\t\t\t2. bpnet\r\n",
"\t\t\t\t3. classification\r\n",
"\t\t\t\t4. dssd\r\n",
"\t\t\t\t5. emotionnet\r\n",
"\t\t\t\t6. efficientdet\r\n",
"\t\t\t\t7. fpenet\r\n",
"\t\t\t\t8. gazenet\r\n",
"\t\t\t\t9. gesturenet\r\n",
"\t\t\t\t10. heartratenet\r\n",
"\t\t\t\t11. lprnet\r\n",
"\t\t\t\t12. mask_rcnn\r\n",
"\t\t\t\t13. multitask_classification\r\n",
"\t\t\t\t14. retinanet\r\n",
"\t\t\t\t15. ssd\r\n",
"\t\t\t\t16. unet\r\n",
"\t\t\t\t17. yolo_v3\r\n",
"\t\t\t\t18. yolo_v4\r\n",
"\t\t\t\t19. yolo_v4_tiny\r\n",
"\t\t\t\t20. converter\r\n",
"\t\tv3.21.11-tf1.15.4-py3: \t\t\t\t\r\n",
"\t\t\tdocker_registry: nvcr.io\r\n",
"\t\t\ttasks: \r\n",
"\t\t\t\t1. detectnet_v2\r\n",
"\t\t\t\t2. faster_rcnn\r\n",
"\tnvidia/tao/tao-toolkit-pyt: \t\t\t\r\n",
"\t\tv3.21.11-py3: \t\t\t\t\r\n",
"\t\t\tdocker_registry: nvcr.io\r\n",
"\t\t\ttasks: \r\n",
"\t\t\t\t1. speech_to_text\r\n",
"\t\t\t\t2. speech_to_text_citrinet\r\n",
"\t\t\t\t3. text_classification\r\n",
"\t\t\t\t4. question_answering\r\n",
"\t\t\t\t5. token_classification\r\n",
"\t\t\t\t6. intent_slot_classification\r\n",
"\t\t\t\t7. punctuation_and_capitalization\r\n",
"\t\t\t\t8. spectro_gen\r\n",
"\t\t\t\t9. vocoder\r\n",
"\t\t\t\t10. action_recognition\r\n",
"\tnvidia/tao/tao-toolkit-lm: \t\t\t\r\n",
"\t\tv3.21.08-py3: \t\t\t\t\r\n",
"\t\t\tdocker_registry: nvcr.io\r\n",
"\t\t\ttasks: \r\n",
"\t\t\t\t1. n_gram\r\n",
"format_version: 2.0\r\n",
"toolkit_version: 3.21.11\r\n",
"published_date: 11/08/2021\r\n"
]
}
],
"source": [
"!tao info --verbose"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set Relevant Paths"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"# NOTE: The following paths are set from the perspective of the TAO Docker.\n",
"\n",
"# The data is saved here\n",
"DATA_DIR = \"/data\"\n",
"SPECS_DIR = \"/specs\"\n",
"RESULTS_DIR = \"/results\"\n",
"MODELS_DIR = \"/models\"\n",
"\n",
"# Set your encryption key, and use the same key for all commands\n",
"KEY = 'tlt_encode'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that everything is set up, let's take a moment to explain the `tao` launcher interface for ease of use. The command structure can be broken down as follows: `tao <task name> <subcommand> <args per subcommand>`\n",
"\n",
"Let's see this in further detail."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Downloading Specs\n",
"TAO's Conversational AI Toolkit works off of spec files, which make it easy to edit hyperparameters on the fly. We can now download the spec files. You may choose to modify or rewrite these specs, or even individually override them through the launcher. You can download the default spec files using the `download_specs` command.\n",
"\n",
"The `-o` argument indicates the folder where the default specification files will be downloaded, and `-r` tells the script where to save the logs. **Make sure `-o` points to an empty folder!**"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022-03-02 21:57:12,147 [INFO] root: Registry: ['nvcr.io']\n",
"2022-03-02 21:57:12,235 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3\n",
"2022-03-02 21:57:12,246 [INFO] tlt.components.docker_handler.docker_handler: The required docker doesn't exist locally/the manifest has changed. Pulling a new docker.\n",
"2022-03-02 21:57:12,246 [INFO] tlt.components.docker_handler.docker_handler: Pulling the required container. This may take several minutes if you're doing this for the first time. Please wait here.\n",
"...\n",
"Pulling from repository: nvcr.io/nvidia/tao/tao-toolkit-pyt\n",
"2022-03-02 22:00:39,037 [INFO] tlt.components.docker_handler.docker_handler: Container pull complete.\n",
"2022-03-02 22:00:39,038 [WARNING] tlt.components.docker_handler.docker_handler: \n",
"Docker will run the commands as root. If you would like to retain your\n",
"local host permissions, please add the \"user\":\"UID:GID\" in the\n",
"DockerOptions portion of the \"/home/ubuntu/.tao_mounts.json\" file. You can obtain your\n",
"users UID and GID by using the \"id -u\" and \"id -g\" commands on the\n",
"terminal.\n",
"[NeMo W 2022-03-02 22:00:55 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/torchaudio-0.7.0a0+42d447d-py3.8-linux-x86_64.egg/torchaudio/backend/utils.py:53: UserWarning: \"sox\" backend is being deprecated. The default backend will be changed to \"sox_io\" backend in 0.8.0 and \"sox\" backend will be removed in 0.9.0. Please migrate to \"sox_io\" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.\n",
" warnings.warn(\n",
" \n",
"I0302 22:00:55.606818 139869175908160 font_manager.py:1443] generated new fontManager\n",
"[NeMo W 2022-03-02 22:00:55 experimental:27] Module is experimental, not ready for production and is not fully supported. Use at your own risk.\n",
"[NeMo I 2022-03-02 22:00:58 tlt_logging:20] Experiment configuration:\n",
" exp_manager:\n",
" task_name: download_specs\n",
" explicit_log_dir: /results/speech_to_text_citrinet\n",
" source_data_dir: /opt/conda/lib/python3.8/site-packages/asr/speech_to_text_citrinet/experiment_specs\n",
" target_data_dir: /specs/speech_to_text_citrinet\n",
" workflow: asr\n",
" \n",
"[NeMo I 2022-03-02 22:00:58 download_specs:73] Default specification files for asr downloaded to '/specs/speech_to_text_citrinet'\n",
"[NeMo I 2022-03-02 22:00:58 download_specs:74] Experiment logs saved to '/results/speech_to_text_citrinet'\n",
"\u001b[0m2022-03-02 22:01:00,052 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.\n"
]
}
],
"source": [
"# Delete the specs directory first if it already exists, to avoid errors\n",
"! tao speech_to_text_citrinet download_specs \\\n",
" -r $RESULTS_DIR/speech_to_text_citrinet \\\n",
" -o $SPECS_DIR/speech_to_text_citrinet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ASR Inference"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may need to edit the `infer.yaml` spec file to select the audio files you want to run inference on."
]
},
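{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch (assuming the downloaded `infer.yaml` has a top-level `file_paths` list, as shown in the experiment configuration logged below, and that PyYAML is installed on the host), you could update the spec from the host side. Note that the spec lives under `$HOST_SPECS_DIR` on the host, while the paths written into it must be container-side paths such as `/data/...`, since TAO reads the spec inside Docker:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical host-side edit of the inference spec; the paths inside the\n",
"# spec are container-side, because TAO resolves them inside Docker.\n",
"import os\n",
"import yaml  # PyYAML\n",
"spec_path = os.path.join(os.environ[\"HOST_SPECS_DIR\"],\n",
"                         \"speech_to_text_citrinet\", \"infer.yaml\")\n",
"with open(spec_path) as f:\n",
"    spec = yaml.safe_load(f)\n",
"spec[\"file_paths\"] = [\"/data/hello_world.wav\"]\n",
"with open(spec_path, \"w\") as f:\n",
"    yaml.safe_dump(spec, f)"
]
},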
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022-03-02 22:10:47,892 [INFO] root: Registry: ['nvcr.io']\n",
"2022-03-02 22:10:47,986 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3\n",
"2022-03-02 22:10:48,004 [WARNING] tlt.components.docker_handler.docker_handler: \n",
"Docker will run the commands as root. If you would like to retain your\n",
"local host permissions, please add the \"user\":\"UID:GID\" in the\n",
"DockerOptions portion of the \"/home/ubuntu/.tao_mounts.json\" file. You can obtain your\n",
"users UID and GID by using the \"id -u\" and \"id -g\" commands on the\n",
"terminal.\n",
"[NeMo W 2022-03-02 22:10:52 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/torchaudio-0.7.0a0+42d447d-py3.8-linux-x86_64.egg/torchaudio/backend/utils.py:53: UserWarning: \"sox\" backend is being deprecated. The default backend will be changed to \"sox_io\" backend in 0.8.0 and \"sox\" backend will be removed in 0.9.0. Please migrate to \"sox_io\" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.\n",
" warnings.warn(\n",
" \n",
"[NeMo W 2022-03-02 22:10:53 experimental:27] Module is experimental, not ready for production and is not fully supported. Use at your own risk.\n",
"[NeMo W 2022-03-02 22:10:57 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/torchaudio-0.7.0a0+42d447d-py3.8-linux-x86_64.egg/torchaudio/backend/utils.py:53: UserWarning: \"sox\" backend is being deprecated. The default backend will be changed to \"sox_io\" backend in 0.8.0 and \"sox\" backend will be removed in 0.9.0. Please migrate to \"sox_io\" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.\n",
" warnings.warn(\n",
" \n",
"[NeMo W 2022-03-02 22:10:57 experimental:27] Module is experimental, not ready for production and is not fully supported. Use at your own risk.\n",
"[NeMo W 2022-03-02 22:10:57 nemo_logging:349] /home/jenkins/agent/workspace/tlt-pytorch-main-nightly/asr/speech_to_text_citrinet/scripts/infer.py:79: UserWarning: \n",
" 'infer.yaml' is validated against ConfigStore schema with the same name.\n",
" This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.\n",
" See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.\n",
" \n",
"[NeMo I 2022-03-02 22:10:57 tlt_logging:20] Experiment configuration:\n",
" restore_from: /models/speechtotext_en_us_citrinet_vtrainable_v3.0/speechtotext_en_us_citrinet.tlt\n",
" exp_manager:\n",
" task_name: infer\n",
" explicit_log_dir: /results/citrinet/infer_new\n",
" file_paths:\n",
" - /data/hello_world.wav\n",
" encryption_key: '******'\n",
" \n",
"[NeMo W 2022-03-02 22:10:57 exp_manager:26] Exp_manager is logging to `/results/citrinet/infer_new``, but it already exists.\n",
"[NeMo I 2022-03-02 22:11:01 mixins:147] Tokenizer SentencePieceTokenizer initialized with 1024 tokens\n",
"[NeMo W 2022-03-02 22:11:02 modelPT:130] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n",
" Train config : \n",
" manifest_filepath: null\n",
" sample_rate: 16000\n",
" batch_size: 32\n",
" trim_silence: false\n",
" max_duration: 20.0\n",
" shuffle: true\n",
" is_tarred: false\n",
" tarred_audio_filepaths: null\n",
" use_start_end_token: false\n",
" \n",
"[NeMo W 2022-03-02 22:11:02 modelPT:137] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). \n",
" Validation config : \n",
" manifest_filepath: null\n",
" sample_rate: 16000\n",
" batch_size: 32\n",
" shuffle: false\n",
" use_start_end_token: false\n",
" \n",
"[NeMo W 2022-03-02 22:11:02 modelPT:143] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n",
" Test config : \n",
" manifest_filepath: null\n",
" sample_rate: 16000\n",
" batch_size: 32\n",
" shuffle: false\n",
" use_start_end_token: false\n",
" \n",
"[NeMo I 2022-03-02 22:11:02 features:252] PADDING: 16\n",
"[NeMo I 2022-03-02 22:11:02 features:269] STFT using torch\n",
"Transcribing: 0%| | 0/1 [00:00, ?it/s][NeMo W 2022-03-02 22:11:17 patch_utils:49] torch.stft() signature has been updated for PyTorch 1.7+\n",
" Please update PyTorch to remain compatible with later versions of NeMo.\n",
"[NeMo W 2022-03-02 22:11:19 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.\n",
" To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:448.)\n",
" return torch.floor_divide(self, other)\n",
" \n",
"Transcribing: 100%|███████████████████████████████| 1/1 [00:02<00:00, 2.34s/it]\n",
"[NeMo I 2022-03-02 22:11:19 infer:69] The prediction results:\n",
"[NeMo I 2022-03-02 22:11:19 infer:71] File: /data/hello_world.wav\n",
"[NeMo I 2022-03-02 22:11:19 infer:72] Predicted transcript: sc which them seven sc return\n",
"[NeMo I 2022-03-02 22:11:19 infer:75] Experiment logs saved to '/results/citrinet/infer_new'\n",
"\u001b[0m\u001b[0m2022-03-02 22:11:21,108 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.\n"
]
}
],
"source": [
"!tao speech_to_text_citrinet infer \\\n",
" -e $SPECS_DIR/speech_to_text_citrinet/infer.yaml \\\n",
" -g 1 \\\n",
" -k $KEY \\\n",
" -m $MODELS_DIR/speechtotext_en_us_citrinet_vtrainable_v3.0/speechtotext_en_us_citrinet.tlt \\\n",
" -r $RESULTS_DIR/citrinet/infer_new \\\n",
" file_paths=[$DATA_DIR/hello_world.wav]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "launcher",
"language": "python",
"name": "launcher"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}