Continuing the discussion from Chmod: cannot access '/opt/ngccli/ngc': No such file or directory:
• Hardware : A100/V100
• Network Type: NA
• TLT Version: v3.22.05-py3
• How to reproduce the issue ? Running the following command:
sudo docker run --runtime=nvidia -it -e NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 --shm-size=40g nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.22.05-py3 results in:
chmod: cannot access '/opt/ngccli/ngc': No such file or directory
I am aware this issue has been reported earlier and solutions have been suggested by @Morganh as has been quoted.
However, I am new to Docker and have a few basic questions:
How do I use the information in the quoted update? Does it mean I should abandon the container approach and follow the https://pypi.org/project/nvidia-tao/ installation instructions, where this issue has been resolved?
Or, if it means that I need to update the nvidia-tao version within the container, how do I enter it?
Would the first workaround suggested ("Just add this: …") still work?
Yes, it still works.
I suggest using this solution.
Then log in to the TAO container with one of the commands below.
$ tao ssd run /bin/bash
or $ tao detectnet_v2 run /bin/bash
I used the --entrypoint "" approach (
sudo docker run --runtime=nvidia -it --entrypoint "" -e NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 --shm-size=40g --name tao3 nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.22.05-py3 /bin/bash), landed inside the container, updated the nvidia-tao version, and ran the
tao ssd run /bin/bash command. However, I am getting the error shown in the screenshot.
I also tried the docker login command as suggested, but it says
command not found, as seen in the screenshot.
As suggested by you in Is there some spacial things about bpnet? A question about "tlt bpnet dataset_convert " for bpnet - #5 by Morganh,
docker run hello-world is not working either.
There are two ways of running inside the TAO container.
- Use "docker run".
This is what you already ran:
$ sudo docker run --runtime=nvidia -it --entrypoint "" -e NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 --shm-size=40g --name tao3 nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.22.05-py3 /bin/bash
This command already puts you inside the TAO 22.05 container, so there is no need to run "tao ssd" again from inside it.
Just run the task command directly, similar to the line below (the trailing "..." stands for the remaining arguments).
# ssd train ...
- Use the tao launcher. Run this on the host, not inside the container.
$ tao ssd run /bin/bash
Neither the “docker” nor the “ssd” command is recognised. Both result in
command not found. Please refer to the screenshots below.
I found the reason.
The tao-toolkit-pyt image does not include the ssd network. Please switch to the nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3 docker image if you want to run the ssd network.
See the info below.
$ tao info --verbose
Configuration of the TAO Toolkit Instance
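To see which task commands a given image actually provides, a small check like the following can be run from the shell inside the container. This is only a sketch: check_cmds is a hypothetical helper name, not part of TAO, and the names passed to it are the ones mentioned in this thread. Which of them are found depends on the image you started.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given command names are on PATH in this shell.
# check_cmds is a hypothetical helper name, not part of TAO.
check_cmds() {
  local name
  for name in "$@"; do
    if command -v "$name" >/dev/null 2>&1; then
      echo "$name: found"
    else
      echo "$name: command not found"
    fi
  done
}

# Names taken from this thread; in this thread, docker was also
# reported as "command not found" inside the container.
check_cmds ssd speech_to_text_conformer docker
```

If a task name is reported as not found, that suggests the image does not ship that network, so a different image is needed.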
I am sorry if I wasn’t clear before. I am actually trying to train the speech_to_text_conformer network; I used the
ssd command only as an example. I didn’t know there are separate images for different networks. So I need to continue using
I ran the following command again:
sudo docker run --runtime=nvidia -it --entrypoint "" -e NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 --shm-size=40g --name tao6 nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.22.05-py3 /bin/bash
Now I am able to download the spec files and will hopefully start training soon. Thank you for your patience and help so far.
A clarification: the data we use will remain on the local machine, is that correct?
Yes, for speech_to_text_conformer.
No, you need to add -v /yourlocalfolder:/dockerfolder to the docker run command so that the container can see your local folder.
Please look up the usage of the docker “-v” option.
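As a rough sketch of the "-v" usage mentioned above, the helper below prints a docker run command that bind-mounts a local folder into the container. The function name tao_run_cmd and both paths are hypothetical placeholders; the other flags are copied from the command used earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: the host and container paths used below are hypothetical.

IMAGE="nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.22.05-py3"

# Print a docker run command that bind-mounts a host folder into the container.
# $1 = folder on the local machine, $2 = path where it appears inside the container
tao_run_cmd() {
  local host_dir="$1" container_dir="$2"
  echo "sudo docker run --runtime=nvidia -it --entrypoint \"\"" \
       "-e NVIDIA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 --shm-size=40g" \
       "-v ${host_dir}:${container_dir}" \
       "${IMAGE} /bin/bash"
}

# Dry run: show the command so the paths can be checked before running it.
tao_run_cmd "$HOME/tao_data" /workspace/data
```

With a bind mount, files under the host folder are read and written in place, so anything the training writes to the mounted path inside the container ends up in the local folder.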
I understand the usage of “-v”. My doubt stems from my understanding that the TAO Toolkit needs internet access to carry out training the first time; is that correct? My reference for this is the document you wrote on offline training with TAO here. Can you please explain why exactly internet access is required, apart from downloading the TAO image? Is there a risk of exposing our data to cloud servers at any point?
Yes, in order to “docker pull” the TAO container.
Nothing else is required.
When you run TAO on your local machine, your data stays on your local machine.
Do you mean you are going to run TAO training on a cloud server?
The TAO training will happen on a local machine.
Thank you so much for the clarifications. I have a much better understanding now.