Following the release of TLT 3.0, I tried to pull the latest container as shown here: Transfer Learning Toolkit for Video Streaming Analytics | NVIDIA NGC (tlt-streamanalytics).
Since I don't have any NVIDIA-powered hardware, I am running it on GCP. This is the command I used to run TLT 2.0 on GCP:
docker run --runtime=nvidia -it -v /home/<username>/tlt-experiments:/workspace/tlt-experiments nvcr.io/nvidia/tlt-streamanalytics:<version> /bin/bash
It kept throwing an error saying that CUDA 11.1 or later is required. However, the GCP machine had CUDA 11.0, and when I tried updating it to 11.2 following the CUDA Installation Guide for Linux, it asked me to update the cuda-driver to 460 or later.
The GCP instance I am running had cuda-driver-450.xx, and when I tried updating it, it asked for the nvidia-driver to be updated as well. This is what I used to deploy the image: Google Cloud console.
Long story short, the update was not possible, and I could not find an image on the GCP Marketplace that would give me the right driver for this container (Transfer Learning Toolkit for Video Streaming Analytics | NVIDIA NGC, tlt-streamanalytics).
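For reference, this is roughly how the installed driver could be compared against that requirement before rebuilding the VM. It is only a sketch: it assumes nvidia-smi is on the PATH, and it takes 460 as the minimum major version purely because that is what the CUDA 11.2 installer asked for here, not from an official compatibility table. The helper names are made up for illustration.

```python
import subprocess
from typing import Optional

# Assumption: 460 is the minimum major driver version the CUDA 11.2 installer
# asked for in this post; it is not taken from an official table.
MIN_DRIVER_MAJOR = 460


def parse_driver_major(version: str) -> int:
    """Return the major component of a driver version string like '450.80.02'."""
    return int(version.strip().split(".")[0])


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the first GPU's driver version; None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi missing (no driver installed) or it failed to talk to the GPU.
        return None
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None


def driver_is_new_enough(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """Compare only the major version against the assumed minimum."""
    return parse_driver_major(version) >= minimum


if __name__ == "__main__":
    v = installed_driver_version()
    if v is None:
        print("nvidia-smi not found or failed")
    else:
        print(v, "OK" if driver_is_new_enough(v) else "too old")
```

On the instance described above, a 450.xx driver would be reported as too old.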
If I don't have NVIDIA-powered hardware, do you have any recommendations or tutorials on how to run it in the cloud? It doesn't have to be GCP; AWS would also work.
I also tried the TLT Launcher steps "recommended" in the documentation (TLT Launcher — Transfer Learning Toolkit 3.0 documentation), but for some reason I could not get that to work either. I couldn't install virtualenv or virtualenvwrapper as a regular user. When I finally managed to install them as a sudo user, tlt --help works, but tlt detectnet_v2 --help throws errors like this:
(launcher) root@nvidia-gpu-cloud-image-3-vm:/home/a428tm# tlt detectnet_v2 --help
Traceback (most recent call last):
  File "/root/.virtualenvs/launcher/bin/tlt", line 8, in <module>
    sys.exit(main())
  File "/root/.virtualenvs/launcher/lib/python3.6/site-packages/tlt/entrypoint/entrypoint.py", line 114, in main
    args[1:]
  File "/root/.virtualenvs/launcher/lib/python3.6/site-packages/tlt/components/instance_handler/local_instance.py", line 262, in launch_command
    docker_logged_in()
  File "/root/.virtualenvs/launcher/lib/python3.6/site-packages/tlt/components/instance_handler/utils.py", line 129, in docker_logged_in
    data = load_config_file(docker_config)
  File "/root/.virtualenvs/launcher/lib/python3.6/site-packages/tlt/components/instance_handler/utils.py", line 66, in load_config_file
    "No file found at: {}".format(config_path)
AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json
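From the traceback, the launcher appears to check that a Docker client config file exists in the home directory of whoever runs it (root here, because of sudo). Normally that file is written by docker login, so logging in to nvcr.io as a different user would leave /root/.docker/config.json missing. The snippet below is a rough sketch of an equivalent check, not the launcher's own code. It assumes the usual "auths" layout that docker login writes, and has_registry_login is a made-up name.

```python
import json
from pathlib import Path
from typing import Optional


def has_registry_login(registry: str = "nvcr.io", home: Optional[str] = None) -> bool:
    """Hypothetical check: does <home>/.docker/config.json exist and list `registry`?

    Only the presence of an "auths" entry is checked, not whether the
    stored credentials are valid.
    """
    base = Path(home) if home is not None else Path.home()
    config_path = base / ".docker" / "config.json"
    if not config_path.is_file():
        # The case the launcher's AssertionError reports.
        return False
    with config_path.open() as f:
        data = json.load(f)
    return registry in data.get("auths", {})


if __name__ == "__main__":
    # Under sudo, Path.home() is typically /root, matching the path in the error.
    print(has_registry_login())
```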
I am stuck on the first few steps of this, and I would appreciate any pointers.
Thank you