Trying to use my Dataset

Hi,
I’m trying to test my own dataset. I converted it to KITTI format, but when I run the notebook detectnet_v2.ipynb,
the following command (point B of the notebook) fails:
“# Creating a new directory for the output tfrecords dump.
print("Converting Tfrecords for kitti trainval dataset")
!tlt-dataset-convert -d tlt_specs/detectnet_v2_tfrecords_kitti_trainval.txt \
                     -o train/tfrecords/kitti_trainval/kitti_trainval”

it returns:
"
Converting Tfrecords for kitti trainval dataset
2021-04-08 20:51:20.406096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Using TensorFlow backend.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-dataset-convert", line 5, in <module>
    from iva.detectnet_v2.scripts.dataset_convert import main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 14, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/build_converter.py", line 13, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/kitti_converter_lib.py", line 21, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py", line 19, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataloader/utilities.py", line 16, in <module>
  File "/usr/local/lib/python3.6/dist-packages/modulus/__init__.py", line 8, in <module>
    from modulus import blocks
  File "/usr/local/lib/python3.6/dist-packages/modulus/blocks/__init__.py", line 22, in <module>
    from modulus.blocks import data_loaders
  File "/usr/local/lib/python3.6/dist-packages/modulus/blocks/data_loaders/__init__.py", line 9, in <module>
    from modulus.blocks.data_loaders.sqlite_dataloader import SQLiteDataLoader
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/data_loaders/sqlite_dataloader.py", line 10, in <module>
  File "/usr/local/lib/python3.6/dist-packages/modulus/dataloader/__init__.py", line 9, in <module>
    from modulus.dataloader import humanloop
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/dataloader/humanloop.py", line 16, in <module>
  File "/usr/local/lib/python3.6/dist-packages/modulus/processors/__init__.py", line 26, in <module>
    from modulus.processors.buffers import NamedTupleStagingArea
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/processors/buffers.py", line 11, in <module>
  File "/usr/local/lib/python3.6/dist-packages/modulus/hooks/__init__.py", line 9, in <module>
    from modulus.hooks.hooks import (
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/hooks/hooks.py", line 25, in <module>
ModuleNotFoundError: No module named 'nvml'
"
Can someone help me with this issue? I have no idea what is causing it.

Thanks in advance

Which TLT version did you run, TLT 3.0 or TLT 2.0?

I’m not sure, but it is probably TLT 2.0.
This is the image I pulled: “docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3”

and this is how I ran it:
```
docker run --gpus all -it -v "/path/to/dir/on/host":"/path/to/dir/in/docker" \
    -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash
```

So it should be TLT 2.0, given the image tag in nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3

Please add --runtime=nvidia when you start the TLT 2.0 docker container.
$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics: /bin/bash

If it still does not work, please check that the first field of each line in your label files is a string rather than an int.
See the similar topic: “ImportError: No module named nvml” (#6 by Morganh)
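
The label check suggested above can be automated. The following is only a minimal sketch: it assumes the labels are KITTI-style `.txt` files with whitespace-separated fields, one object per line, all in a single directory. The function name `find_numeric_class_names` is hypothetical and is not part of TLT.

```python
from pathlib import Path


def find_numeric_class_names(label_dir):
    """Return (file name, line number, first field) for every label line
    whose first field parses as a number instead of a class-name string."""
    problems = []
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        lines = label_file.read_text().splitlines()
        for line_no, line in enumerate(lines, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            try:
                float(fields[0])
            except ValueError:
                continue  # not a number, so it is usable as a class name
            problems.append((label_file.name, line_no, fields[0]))
    return problems
```

Calling `find_numeric_class_names("path/to/labels")` (a placeholder path) returns an empty list when every line starts with a class name such as `car`, and lists the offending lines otherwise.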

The command with --runtime=nvidia… doesn’t work:
“docker: invalid reference format.”

I have checked the labels but didn’t see any type (neither string nor int) in mine (or in the Kaggle ones).
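
For context on the error above: Docker reports “invalid reference format” when it cannot parse the image reference, and the command quoted earlier in the thread ends the image name with a colon and no tag. The sketch below shows a simplified pre-check for that one case. The helper `split_image_reference` is hypothetical; it does not reproduce Docker's full reference grammar, and digests of the form `@sha256:...` are not handled.

```python
def split_image_reference(ref):
    """Split 'name[:tag]' into (name, tag); tag is None when absent.

    Simplified: only the tag separator in the last path segment is
    considered, so a registry port such as 'localhost:5000/' is ignored.
    """
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return ref, None  # no tag given
    name, _, tag = ref.rpartition(":")
    if not tag:
        raise ValueError(f"empty tag in image reference: {ref!r}")
    return name, tag
```

With the image pulled earlier in this thread, `split_image_reference("nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3")` yields the name and the tag `v2.0_py3`, whereas the same name with a trailing colon raises `ValueError`.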

Please paste the full command and full log.

I switched to a different machine and everything works fine now! Thanks for the support.
Now I have another problem. When I run the Jupyter notebook, at point 3, “Run TLT training”:
“!tlt-train detectnet_v2 -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -k $KEY \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS”

The result is:
Using TensorFlow backend.
/usr/local/bin/tlt-train: line 32: 916 Illegal instruction (core dumped) tlt-train-g1 ${PYTHON_ARGS[*]}

Do you know why, or should I open a separate topic?

It is related to the CPU. See the topic “Illegal Instruction When Running the TLT”, which comes up when searching the transfer-learning-toolkit category of the NVIDIA Developer Forums for 'Illegal instruction'.
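
One way to inspect the CPU from inside the container is to read its feature flags. The reply above only says the crash is CPU-related; the sketch below assumes the missing feature is AVX, a common cause of “Illegal instruction” with prebuilt TensorFlow binaries. It also assumes an x86 Linux host, where `/proc/cpuinfo` has `flags` lines. Both function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True when the 'avx' flag is present (assumed requirement, see above)."""
    return "avx" in cpu_flags(cpuinfo_text)
```

On a Linux machine this can be used as `has_avx(open("/proc/cpuinfo").read())`. A result of `False` would be consistent with the crash, but it does not prove that AVX is the cause.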