tlt-export error

Hi,

I am trying to export one of my trained models to .etlt for deployment with DeepStream. When I run "tlt-export -h", I see:

Using TensorFlow backend.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 6, in <module>
    from iva.common.magnet_export import main
  File "./common/magnet_export.py", line 23, in <module>
  File "./modulus/export/_tensorrt.py", line 26, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pycuda/autoinit.py", line 9, in <module>
    context = make_default_context()
  File "/usr/local/lib/python2.7/dist-packages/pycuda/tools.py", line 204, in make_default_context
    "on any of the %d detected devices" % ndevices)
RuntimeError: make_default_context() wasn't able to create a context on any of the 1 detected devices

Below are a few other details:

  1. docker: docker run --privileged=true --runtime=nvidia -itd --rm -v /home/anusha_k/tlt:/workspace/tlt -p 8000:8000 nvcr.io/nvidia/tlt-streamanalytics:v1.0_py2

  2. nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130

  3. nvidia-smi
    Mon Dec 16 23:54:29 2019
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   49C    P0    55W / 300W |  15881MiB / 16130MiB |     64%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

  4. I can run all other tlt commands just fine. For example,
    tlt-dataset-convert -h
    Using TensorFlow backend.
    usage: dataset_converter [-h] -d DATASET_EXPORT_SPEC -o OUTPUT_FILENAME
                             [-f VALIDATION_FOLD] [-v]

Convert object detection datasets to TFRecords

optional arguments:
  -h, --help            show this help message and exit
  -d DATASET_EXPORT_SPEC, --dataset-export-spec DATASET_EXPORT_SPEC
                        Path to the detection dataset spec containing config
                        for exporting .tfrecords.
  -o OUTPUT_FILENAME, --output-filename OUTPUT_FILENAME
                        Output file name.
  -f VALIDATION_FOLD, --validation-fold VALIDATION_FOLD
                        Indicate the validation fold in 0-based indexing. This
                        is required when modifying the training set but
                        otherwise optional.
  -v, --verbose         Flag to get detailed logs during the conversion
                        process.

  5. On a side note, I see that the recommended OS is Ubuntu 18.04 (https://docs.nvidia.com/metropolis/TLT/tlt-release-notes/#magnet-release-notes), but the docker containers (nvcr.io/nvidia/tlt-streamanalytics:v1.0_py2 and v1.0.1_py2) are based on 16.04. Why is that?

cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Manually updating to 18.04 doesn’t solve this issue, though.
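One detail worth flagging in the nvidia-smi output in item 3 above: the GPU already reports 15881MiB of 16130MiB in use at 64% utilization, so another process appears to be holding the device, which is a common reason pycuda's make_default_context() cannot create a new context. As a purely illustrative sketch (the row string is copied from the output above; nothing here is part of TLT), the memory figures can be read out like this:

```python
import re

# Data row copied from the nvidia-smi output in item 3 above.
row = "| N/A   49C    P0    55W / 300W |  15881MiB / 16130MiB |     64%      Default |"

# Pull the "used / total" MiB figures out of the Memory-Usage column.
used_mib, total_mib = (int(x) for x in re.findall(r"(\d+)MiB", row))

# A nearly-full device usually means another process is holding it, which
# can prevent pycuda's make_default_context() from creating a context.
print("GPU memory: %d/%d MiB (%.0f%% used)" % (used_mib, total_mib, 100.0 * used_mib / total_mib))
```

On this machine that works out to roughly 98% of device memory already in use, which fits the reboot fixing things later in the thread.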

-Anusha

Hi Anusha,
Can you confirm whether all of the commands below work with "-h"?

root@ec31b0b6631f:/workspace# ll /usr/local/bin/tlt*
-rwxr-xr-x 1 root root 241 Sep 13 21:15 /usr/local/bin/tlt-dataset-convert*
-rwxr-xr-x 1 root root 227 Sep 13 21:15 /usr/local/bin/tlt-evaluate*
-rwxr-xr-x 1 root root 225 Sep 13 21:15 /usr/local/bin/tlt-export*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-infer*
-rwxr-xr-x 1 root root 229 Sep 13 21:15 /usr/local/bin/tlt-int8-tensorfile*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-prune*
-rwxr-xr-x 1 root root 215 Sep 13 21:15 /usr/local/bin/tlt-pull*
-rwxr-xr-x 1 root root 736 Aug 27 21:09 /usr/local/bin/tlt-train*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-train-g1*

Also, note that the requirements below are for your host PC, not for the TLT container.

<b>Software Requirements</b>
Ubuntu 18.04 LTS
NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
nvidia-docker2 installed, instructions: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
NVIDIA GPU driver v410.xx or above
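For the last item, a quick sanity check of the driver version against the v410.xx minimum can be scripted. This is a hedged sketch, not an official tool; the version string itself would normally come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`:

```python
MIN_MAJOR = 410  # minimum driver major version from the requirements above

def driver_ok(version, min_major=MIN_MAJOR):
    """Return True if the driver's major version meets the minimum."""
    major = int(version.split(".")[0])
    return major >= min_major

# The driver reported by nvidia-smi earlier in this thread:
print(driver_ok("418.67"))  # True
```

The 418.67 driver in the logs above clears the bar, so the driver version is not the problem here.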

Hey Morganh,

Thank you for your help and the info. I got it working after rebooting my VM instance and pulling a fresh copy of the Docker container. I had previously installed a number of extra packages inside the container (running apt-get update/upgrade to mount GCS buckets, among other things), which likely broke something.

-Anusha