I have created a new environment using conda and installed the prerequisites and TLT in it, following the NVIDIA TAO documentation guide. Since I am working in a conda env, I am not running anything in a Python virtual environment.
After installing the tlt package, I am trying to run the classification Jupyter notebook given in the samples.
On the !tlt classification train -e $SPECS_DIR/classification_spec.cfg -r $USER_EXPERIMENT_DIR/output -k $KEY command, the docker container suddenly stops and outputs the following
(I made some changes and printed the formatted_command and volumes):
2021-06-17 20:24:36,503 [INFO] root: Registry: ['nvcr.io']
formatted_command: bash -c 'docker exec -it 17d69f83a4d3a69cc27e7c21506d3dcb364c7e98b4860d3ff17eb1c240a61aa5 classification train -e /workspace/tlt-experiments/classification/specs/classification_spec.cfg -r /workspace/tlt-experiments/classification/output -k nvidia_tlt'
volumes: {'/home/omno/Desktop/umair/naturalImages/tlt': {'bind': '/workspace/tlt-experiments', 'mode': 'rw'}, '/home/omno/Desktop/umair/naturalImages/tlt/specs': {'bind': '/workspace/tlt-experiments/classification/specs', 'mode': 'rw'}}
formatted_command: bash -c 'docker exec -it 17d69f83a4d3a69cc27e7c21506d3dcb364c7e98b4860d3ff17eb1c240a61aa5 classification train -e /workspace/tlt-experiments/classification/specs/classification_spec.cfg -r /workspace/tlt-experiments/classification/output -k nvidia_tlt'
Executing the command.
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-dq5h5g_n because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
['model_config', 'train_config']
2021-06-17 20:24:43,634 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
Not needed for now; I can find the full log in your Jupyter notebook.
Can you add a cell in the notebook and run the command below?
! tlt classification run cat $SPECS_DIR/classification_spec.cfg
Sorry for the late reply. I lost the system I was working on previously. On a new system, I carried out the same steps as mentioned in the TLT quick start guide.
I have made changes to the cfg file by adding the paths to my custom train and test datasets, but unfortunately TLT cannot access them. I am getting the following error.
FileNotFoundError: [Errno 2] No such file or directory: '/home/omno/Desktop/umair/tlt-samples/classification/data/train'
2021-06-24 16:59:06,390 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
When you run the command below, it works well according to the log.
! tlt classification run cat $SPECS_DIR/classification_spec.cfg
But you get stuck when you run training because of: FileNotFoundError: [Errno 2] No such file or directory: '/home/omno/Desktop/umair/tlt-samples/classification/data/train'
Please run the following command to check whether your training images folder is available.
! tlt classification run ls /home/omno/Desktop/umair/tlt-samples/classification/data/train |wc -l
Could you explain why I am getting this error: FileNotFoundError: [Errno 2] No such file or directory: '/home/omno/Desktop/umair/tlt-samples/classification/data/train'?
Since the TLT launcher uses docker containers under the hood, these drives/mount points need to be mapped into the docker container. The launcher instance can be configured in the ~/.tlt_mounts.json file.
In the command line, the path should be the “destination” path inside the docker container.
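As a sketch, a ~/.tlt_mounts.json that maps your host dataset folder into the container could look like the snippet below. The host "source" path here is taken from the error message above and is only illustrative; adjust both paths to your own setup:

```json
{
    "Mounts": [
        {
            "source": "/home/omno/Desktop/umair/tlt-samples/classification",
            "destination": "/workspace/tlt-experiments/classification"
        }
    ]
}
```

With a mapping like this in place, the paths in classification_spec.cfg should use the destination side, e.g. /workspace/tlt-experiments/classification/data/train, not the host path.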
There is also a simple approach for reference: you can set the source and destination paths to the same value, so the host paths remain valid inside the container.
For example,