tlt-export error

Hi,

I am trying to export one of my trained models to .etlt for deployment with DeepStream. When I run "tlt-export -h", I see:

Using TensorFlow backend.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 6, in <module>
    from iva.common.magnet_export import main
  File "./common/magnet_export.py", line 23, in <module>
  File "./modulus/export/_tensorrt.py", line 26, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pycuda/autoinit.py", line 9, in <module>
    context = make_default_context()
  File "/usr/local/lib/python2.7/dist-packages/pycuda/tools.py", line 204, in make_default_context
    "on any of the %d detected devices" % ndevices)
RuntimeError: make_default_context() wasn't able to create a context on any of the 1 detected devices

Below are a few other details:

  1. docker: docker run --privileged=true --runtime=nvidia -itd --rm -v /home/anusha_k/tlt:/workspace/tlt -p 8000:8000 nvcr.io/nvidia/tlt-streamanalytics:v1.0_py2

  2. nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130

  3. nvidia-smi
    Mon Dec 16 23:54:29 2019
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   49C    P0    55W / 300W |  15881MiB / 16130MiB |     64%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

  4. I can run all other tlt commands just fine. For example,
    tlt-dataset-convert -h
    Using TensorFlow backend.
    usage: dataset_converter [-h] -d DATASET_EXPORT_SPEC -o OUTPUT_FILENAME
                             [-f VALIDATION_FOLD] [-v]

Convert object detection datasets to TFRecords

optional arguments:
  -h, --help            show this help message and exit
  -d DATASET_EXPORT_SPEC, --dataset-export-spec DATASET_EXPORT_SPEC
                        Path to the detection dataset spec containing config
                        for exporting .tfrecords.
  -o OUTPUT_FILENAME, --output-filename OUTPUT_FILENAME
                        Output file name.
  -f VALIDATION_FOLD, --validation-fold VALIDATION_FOLD
                        Indicate the validation fold in 0-based indexing. This
                        is required when modifying the training set but
                        otherwise optional.
  -v, --verbose         Flag to get detailed logs during the conversion
                        process.

  5. On a side note, I see that the recommended OS is Ubuntu 18.04 (https://docs.nvidia.com/metropolis/TLT/tlt-release-notes/#magnet-release-notes), but the docker containers (nvcr.io/nvidia/tlt-streamanalytics:v1.0_py2 and v1.0.1_py2) are based on 16.04. Why is that?

cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Manually updating to 18.04 doesn’t solve this issue, though.
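One detail worth flagging in the nvidia-smi output in item 3 above: the GPU already reports 15881MiB of 16130MiB in use at 64% utilization, so another process appears to be holding the device, which is a common reason pycuda's make_default_context() cannot create a new context. As a purely illustrative sketch (the row string is copied from the output above; nothing here is part of TLT), the memory figures can be read out like this:

```python
import re

# Data row copied from the nvidia-smi output in item 3 above.
row = "| N/A   49C    P0    55W / 300W |  15881MiB / 16130MiB |     64%      Default |"

# Pull the "used / total" MiB figures out of the Memory-Usage column.
used_mib, total_mib = (int(x) for x in re.findall(r"(\d+)MiB", row))

# A nearly-full device usually means another process is holding it, which
# can prevent pycuda's make_default_context() from creating a context.
print("GPU memory: %d/%d MiB (%.0f%% used)" % (used_mib, total_mib, 100.0 * used_mib / total_mib))
```

On this machine that works out to roughly 98% of device memory already in use, which fits the reboot fixing things later in the thread.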

-Anusha

Hi Anusha,
Can you confirm whether all of the commands below work with "-h"?

root@ec31b0b6631f:/workspace# ll /usr/local/bin/tlt*
-rwxr-xr-x 1 root root 241 Sep 13 21:15 /usr/local/bin/tlt-dataset-convert*
-rwxr-xr-x 1 root root 227 Sep 13 21:15 /usr/local/bin/tlt-evaluate*
-rwxr-xr-x 1 root root 225 Sep 13 21:15 /usr/local/bin/tlt-export*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-infer*
-rwxr-xr-x 1 root root 229 Sep 13 21:15 /usr/local/bin/tlt-int8-tensorfile*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-prune*
-rwxr-xr-x 1 root root 215 Sep 13 21:15 /usr/local/bin/tlt-pull*
-rwxr-xr-x 1 root root 736 Aug 27 21:09 /usr/local/bin/tlt-train*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-train-g1*

Also, note that the requirements below are for your host PC, not for the TLT container.

<b>Software Requirements</b>
Ubuntu 18.04 LTS
NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
nvidia-docker2 installed, instructions: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
NVIDIA GPU driver v410.xx or above
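For the last item, a quick sanity check of the driver version against the v410.xx minimum can be scripted. This is a hedged sketch, not an official tool; the version string itself would normally come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`:

```python
MIN_MAJOR = 410  # minimum driver major version from the requirements above

def driver_ok(version, min_major=MIN_MAJOR):
    """Return True if the driver's major version meets the minimum."""
    major = int(version.split(".")[0])
    return major >= min_major

# The driver reported by nvidia-smi earlier in this thread:
print(driver_ok("418.67"))  # True
```

The 418.67 driver in the logs above clears the bar, so the driver version is not the problem here.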

Hey Morganh,

Thank you for your help and the info. I got it working after rebooting my VM instance and pulling a fresh copy of the Docker container. I had previously installed a number of extra packages inside the container (running apt-get update/upgrade to mount GCS buckets, among other things), which likely broke something.

-Anusha