TAO faster_rcnn not working

Please provide the following information when requesting support.

• Hardware (NVIDIA Quadro)
• Network Type (Faster_rcnn)
• TAO Version
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.21.11
published_date: 11/08/2021

Variable assignment:

import os

print("Please replace the variables with your own.")
%env GPU_INDEX=0
%env KEY=tlt

Please define the local project directory that will be mapped into the TAO docker session.

%env LOCAL_PROJECT_DIR=/home/umair/tao-experiments
os.environ["LOCAL_DATA_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "data"
)
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "faster_rcnn"
)
%env USER_EXPERIMENT_DIR=/workspace/tao-experiments/faster_rcnn
%env DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data

The sample spec files are present in the same path as the downloaded samples.

Set this path if you don’t run the notebook from the samples directory.

%env NOTEBOOK_ROOT=~/tao-samples/faster_rcnn

os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)
%env SPECS_DIR=/workspace/tao-experiments/faster_rcnn/specs

Show the list of specification files:

!ls -rlt $LOCAL_SPECS_DIR

tao mount output:

{
    "Mounts": [
        {
            "source": "/home/umair/tao-experiments",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/umair/cv_samples/faster_rcnn/specs",
            "destination": "/workspace/tao-experiments/faster_rcnn/specs"
        }
    ],
    "Envs": [
        {
            "variable": "CUDA_VISIBLE_DEVICES",
            "value": "0"
        }
    ]
}
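As a side note, a quick sanity check before launching the container is to confirm that every "source" directory in the mounts file actually exists on the host. This is only a sketch; the embedded JSON is the mounts configuration from this thread, and on another machine you would read `~/.tao_mounts.json` instead:

```python
# Sketch: verify that each host-side "source" directory in the TAO
# mounts configuration exists before starting the container.
import json
import os

# Paths taken from this thread; normally you would read ~/.tao_mounts.json.
mounts_json = """
{
  "Mounts": [
    {"source": "/home/umair/tao-experiments",
     "destination": "/workspace/tao-experiments"},
    {"source": "/home/umair/cv_samples/faster_rcnn/specs",
     "destination": "/workspace/tao-experiments/faster_rcnn/specs"}
  ]
}
"""

config = json.loads(mounts_json)
for mount in config["Mounts"]:
    status = "ok" if os.path.isdir(mount["source"]) else "MISSING"
    print(f'{mount["source"]} -> {mount["destination"]}: {status}')
```

A "MISSING" source silently maps an empty directory into the container, which can look exactly like "no error but no output".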

The Problem:
Everything seems to be working fine, but when I try to use the tao faster_rcnn dataset_convert script to generate TFRecords, it generates nothing. It is frustrating: there is no error, but there is no output either. I have checked the paths several times with no luck. Any help in this regard?
My labels are in the following format:

plastic_bag 0 0 0.0 106.0 98.0 353.0 0 0 0 0 0 0 0 0
plastic_bag 0 0 23.0 360.0 163.0 478.0 0 0 0 0 0 0 0 0

Please follow Data Annotation Format — TAO Toolkit 3.21.11 documentation.
The example is as below.

car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cyclist 0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00
pedestrian 0.00 0 0.00 423.17 173.67 433.17 224.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
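A small script can catch malformed label lines before running dataset_convert. This is a sketch; `check_kitti_line` is a hypothetical helper, not part of TAO. It assumes the KITTI layout from the documentation above: exactly 15 whitespace-separated fields per line, with fields 5-8 (0-indexed 4-7) being the float bbox corners xmin, ymin, xmax, ymax:

```python
# Hypothetical validator for KITTI-style label lines as expected by
# TAO dataset_convert: 15 fields, with an ordered float bounding box.
def check_kitti_line(line):
    fields = line.split()
    if len(fields) != 15:
        return False, f"expected 15 fields, got {len(fields)}"
    try:
        xmin, ymin, xmax, ymax = map(float, fields[4:8])
    except ValueError:
        return False, "bbox fields are not numeric"
    if not (xmin < xmax and ymin < ymax):
        return False, "bbox corners are not ordered"
    return True, "ok"

good = "plastic_bag 0.00 0 0.00 210.00 262.00 380.00 439.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
bad = "plastic_bag 0 0 0.0 106.0 98.0 353.0 0 0 0 0 0 0 0 0"
print(check_kitti_line(good))  # (True, 'ok')
print(check_kitti_line(bad))   # (False, 'bbox corners are not ordered')
```

Note that the second line from the original post happens to have 15 tokens but a shifted bounding box, which is exactly the kind of error a field-position check surfaces.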

Hi Morganh,
Thank you for your reply and correction. I am using the following format now but still getting no output:

plastic_bag 0 0 0 252.0 290.0 411.0 431.0 0 0 0 0 0 0 0
plastic_bag 0 0 0 389.0 231.0 481.0 354.0 0 0 0 0 0 0 0
plastic_bag 0 0 0 442.0 328.0 580.0 431.0 0 0 0 0 0 0 0

There is no error in the output; it simply does nothing, and no TFRecord is generated.

Could you please follow Data Annotation Format — TAO Toolkit 3.21.11 documentation and set some fields to float?

plastic_bag 0.00 0 0.00 210.00 262.00 380.00 439.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
plastic_bag 0.00 0 0.00 5.00 149.00 233.00 414.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Using this format now, I still have the same problem.

Could you share the command and full log? Thanks.

command
!tao faster_rcnn dataset_convert --gpu_index $GPU_INDEX -d $SPECS_DIR/frcnn_tfrecords_kitti_trainval.txt \
                                 -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

Output
2022-02-07 14:02:59,978 [INFO] root: Registry: ['nvcr.io']
2022-02-07 14:03:00,057 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
2022-02-07 14:03:00,078 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/umair/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-02-07 14:03:00,861 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Please share the result of the command below.

! tao faster_rcnn run cat $SPECS_DIR/frcnn_tfrecords_kitti_trainval.txt

2022-02-07 14:20:57,474 [INFO] root: Registry: ['nvcr.io']
2022-02-07 14:20:57,545 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
2022-02-07 14:20:57,564 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/umair/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
kitti_config {
  root_directory_path: "/workspace/tao-experiments/data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/data/training"
2022-02-07 14:20:58,060 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
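As a side note on what that spec means, the following sketch shows the partitioning the kitti_config implies. The semantics are assumed from the dataset_convert documentation (val_split is a percentage, partition_mode "random" shuffles the images, and each fold is written as num_shards TFRecord shards); the toy file names are illustrative:

```python
# Sketch of the random train/val partitioning implied by the spec above:
# val_split: 14 (percent), num_shards: 10.
import random

images = [f"img_{i:04d}.jpg" for i in range(100)]  # toy stand-in for image_2/
random.seed(0)
random.shuffle(images)  # partition_mode: "random"

val_split, num_shards = 14, 10
n_val = int(len(images) * val_split / 100)
val, train = images[:n_val], images[n_val:]

# Round-robin the training images into shards.
shards = [train[i::num_shards] for i in range(num_shards)]
print(len(val), len(train), len(shards))  # 14 86 10
```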

Please try again with the command below.
!tao faster_rcnn dataset_convert -d $SPECS_DIR/frcnn_tfrecords_kitti_trainval.txt \
                                 -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

Same result:

2022-02-07 14:35:54,598 [INFO] root: Registry: ['nvcr.io']
2022-02-07 14:35:54,675 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
2022-02-07 14:35:54,693 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/umair/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-02-07 14:35:55,510 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

How about
! tao faster_rcnn run ls $DATA_DOWNLOAD_DIR

2022-02-07 15:07:59,219 [INFO] root: Registry: ['nvcr.io']
2022-02-07 15:07:59,295 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
2022-02-07 15:07:59,315 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/umair/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
testing tfrecords training
2022-02-07 15:07:59,803 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

I suggest you debug inside the docker container. Please run in a terminal instead of the notebook.

$ tao faster_rcnn run /bin/bash

then inside docker,

# faster_rcnn dataset_convert  -d  <explicit_path_of frcnn_tfrecords_kitti_trainval.txt> -o xxx

I am running this command

faster_rcnn dataset_convert -d /workspace/tao-experiments/faster_rcnn/specs/frcnn_tfrecords_kitti_trainval.txt -o /workspace/kitti

and getting

Illegal instruction (core dumped)

May I know which CPU is in your PC?
This error usually results from an older type of CPU. You can also search for this error and find similar topics on this forum.

That is unlikely, since I am working on a server virtual machine and the same hardware is used successfully by others for TAO training.

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 40 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 15
Model: 6
Model name: Common KVM processor
Stepping: 1
CPU MHz: 2799.998
BogoMIPS: 5599.99
Hypervisor vendor: KVM
Virtualisation type: full
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 8 MiB
L3 cache: 16 MiB

Please check the CPU info according to Why cv task cannot work with NVIDIA TAO Toolkit 3.0 - #8 by 16968377. It seems that user also uses KVM.

Thank you Morganh, it was indeed related to AVX instructions on the KVM. Thanks for pointing out the exact thread.
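For anyone hitting the same "Illegal instruction (core dumped)", a minimal check for the missing instruction set looks like this. It is a sketch assuming a Linux guest; `has_avx` is an illustrative helper, and the underlying cause is that the TensorFlow build inside the TAO container requires AVX, which a generic KVM CPU model ("Common KVM processor") does not expose:

```python
# Sketch: check whether the (virtual) CPU advertises the AVX flag,
# by scanning the "flags" line of /proc/cpuinfo.
def has_avx(cpuinfo_text):
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("flags"):
            return "avx" in line.split()
    return False  # no flags line found

try:
    with open("/proc/cpuinfo") as f:
        print("AVX available:", has_avx(f.read()))
except FileNotFoundError:
    print("/proc/cpuinfo not found (not a Linux host)")
```

On KVM, switching the guest CPU model to host-passthrough (so the guest sees the host's real flags) is the usual fix.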

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.