We are using SkyPilot, a framework for running AI and batch workloads on any infrastructure, to run training on GCP. When running tao (specifically tao model dino train -e dino.yaml, toolkit_version: 5.5.0), we get the error “the input device is not a TTY”.
I assume this happens because the launcher tells docker to allocate a TTY (the -t argument).
Since the docker configuration available through .tao_mounts.json is quite limited, the only fix I found is to invoke docker myself.
Is there a way that I can run tao directly without the TTY?
It is fine to run the tao docker directly instead of tao-launcher.
Please search for this keyword on the TAO forum, since there are similar earlier topics.
I understand it’s in the forum; that’s how I figured out I had to use docker myself.
I think this option should be configurable in .tao_mounts.json.
You can also use the TAO docker directly. That means you can run the docker run command below instead of using tao-launcher.
To run the tao-5.5-pyt docker:
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt /bin/bash
Then run the command without tao model at the beginning. For example:
# dino train -e dino.yaml
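For a non-interactive job (for example one launched by SkyPilot), the same container can in principle be started without -it, with the training command passed directly instead of /bin/bash. The following is a minimal sketch of that idea, not a tested recipe: the helper name build_docker_cmd and the mount paths are hypothetical, and it assumes the image accepts dino train ... as its container command in the same way it does from the interactive shell above.

```python
import shlex

# Image tag taken from the docker run command in this thread.
TAO_IMAGE = "nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt"


def build_docker_cmd(task_args, mounts=None, tty=False):
    """Build a `docker run` argument list for a one-shot TAO command.

    task_args: command to run inside the container,
               e.g. ["dino", "train", "-e", "dino.yaml"].
    mounts:    optional {host_path: container_path} mapping (hypothetical paths).
    tty:       add -it only when an interactive terminal is attached.
    """
    cmd = ["docker", "run", "--runtime=nvidia", "--rm"]
    if tty:
        cmd.append("-it")
    for host_path, container_path in (mounts or {}).items():
        cmd += ["-v", f"{host_path}:{container_path}"]
    cmd.append(TAO_IMAGE)
    cmd += list(task_args)
    return cmd


if __name__ == "__main__":
    cmd = build_docker_cmd(
        ["dino", "train", "-e", "/workspace/specs/dino.yaml"],
        mounts={"/home/user/specs": "/workspace/specs"},  # hypothetical paths
    )
    # Print the command rather than running it; to execute it, pass `cmd`
    # to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Leaving out -t is what avoids the TTY check; the spec file and data still have to be mounted into the container, which is what .tao_mounts.json normally handles for the launcher.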
Yes, thank you. This still doesn’t solve the problem of running it non-interactively, without a TTY, through the tao launcher. Are there any plans to add that option to .tao_mounts.json?
The source code is in tao_launcher/nvidia_tao_cli/components/docker_handler/docker_handler.py on the main branch of the NVIDIA/tao_launcher repository on GitHub. You can try modifying the tao-launcher.
Thank you for the link. I saw that a tty option is available in the code, even though it is not documented on the tao launcher page.
Even with that option it didn’t work; it just stops the container.
I also saw that there are two functions:
if os.getenv("CI_PROJECT_DIR", None) is not None:
    docker_handler.run_container_on_ci(command)
else:
    docker_handler.run_container(command)
By forcing CI, it works! Do you know the difference between the two functions?
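Going by the snippet above, the CI branch is selected whenever CI_PROJECT_DIR is set, so one way to force it without editing the launcher is to set that variable for the tao process. A minimal sketch, with assumptions: the helper names ci_env and run_tao_noninteractive are hypothetical, the value "/tmp" is a placeholder, and I have not checked whether run_container_on_ci uses the variable's value for anything beyond this check.

```python
import os
import subprocess


def ci_env(base_env=None, project_dir="/tmp"):
    """Return a copy of the environment with CI_PROJECT_DIR set.

    The launcher snippet only checks that the variable is not None;
    the value used here is a placeholder (assumption).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CI_PROJECT_DIR"] = project_dir
    return env


def run_tao_noninteractive(args):
    """Run the tao launcher with the CI code path forced via the environment."""
    return subprocess.run(["tao", *args], env=ci_env(), check=True)


if __name__ == "__main__":
    # Only show the variable here; a real job would call, for example,
    # run_tao_noninteractive(["model", "dino", "train", "-e", "dino.yaml"]).
    print(ci_env({})["CI_PROJECT_DIR"])
```

This keeps the change local to the one process that starts tao, so other tools on the machine do not see the CI variable.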