TAO ssd export fails with AttributeError on Google Colab

I trained an SSD ResNet10 model using the “ssd.ipynb” notebook on Google Colab and obtained a file named “ssd_resnet10_epoch_010.tlt”. However, I encountered some issues when attempting to execute the “!tao ssd export” command in the same notebook.

General information:

• Platform: Google Colab
• Hardware: Tesla T4 GPU
• Model: SSD ResNet10
• Training spec file: ssd_retrain_resnet10_kitti.txt (1.9 KB)

Command:

!tao ssd export -m $EXPERIMENT_DIR/experiment_dir_retrain/1280/weights/ssd_resnet10_epoch_010.tlt \
                         -o $EXPERIMENT_DIR/experiment_dir_etlt/1280/ssd_resnet10_epoch_10_fp32.etlt \
                         -e $SPECS_DIR/1280/ssd_retrain_resnet10_kitti.txt \
                         -k $KEY \
                         --data_type fp32 \
                         --gen_ds_config

Error:

2023-06-05 23:09:38.176465: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-05 23:09:38.176725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-05 23:09:38.176905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13642 MB memory) → physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/ssd/scripts/export.py>", line 3, in
File "", line 17, in
File "", line 302, in launch_export
File "", line 284, in run_export
File "", line 372, in export
File "", line 141, in save_etlt_file
File "", line 218, in node_process
File "/usr/local/lib/python3.6/dist-packages/graphsurgeon/DynamicGraph.py", line 330, in remove
remove_names = set(_get_node_names(nodes))
File "/usr/local/lib/python3.6/dist-packages/graphsurgeon/_utils.py", line 85, in _get_node_names
return [node.name for node in nodes]
File "/usr/local/lib/python3.6/dist-packages/graphsurgeon/_utils.py", line 85, in
return [node.name for node in nodes]
AttributeError: 'str' object has no attribute 'name'
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL
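For reference, the AttributeError at the bottom of the traceback is raised when graphsurgeon's `_get_node_names` iterates over a collection containing plain strings where node objects (each carrying a `.name` attribute) are expected. A minimal sketch of this failure mode, independent of graphsurgeon and TensorFlow (the `Node` class here is a hypothetical stand-in for a real graph-node object):

```python
# Minimal reproduction of the failure mode behind
# "AttributeError: 'str' object has no attribute 'name'".
# Node is a hypothetical stand-in for a graph-node object.
class Node:
    def __init__(self, name):
        self.name = name

def get_node_names(nodes):
    # Mirrors the list comprehension in graphsurgeon/_utils.py:
    # it assumes every element has a .name attribute.
    return [node.name for node in nodes]

nodes_ok = [Node("NMS"), Node("concat_box_loc")]
print(get_node_names(nodes_ok))  # ['NMS', 'concat_box_loc']

# If a plain string slips into the list (e.g. a node *name*
# instead of a node object), the same AttributeError appears:
nodes_bad = [Node("NMS"), "concat_box_loc"]
try:
    get_node_names(nodes_bad)
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'name'
```

This only illustrates the error's shape; the actual mismatch happens inside the TAO export code, which is why the workaround below sidesteps the Colab environment rather than patching graphsurgeon.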

Previously I trained other models with the yolo_v4.ipynb and yolo_v4_tiny.ipynb notebooks, and everything worked as expected when exporting to .etlt.

To narrow down, could you log in to the TAO docker container and try again?
docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash

Then, inside the docker container:
# ssd export -m xxx -o xxx -e xxx -k xxx
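If you go the container route, mounting the experiment and spec directories makes the .tlt model and spec file visible inside the container. A sketch of the full workaround, using the paths from the failing command above; `/path/to/experiment` and `/path/to/specs` are placeholder host paths:

```shell
# Sketch of the suggested workaround on a machine with a local GPU.
# /path/to/experiment and /path/to/specs are placeholder host paths.
docker run --runtime=nvidia -it --rm \
    -v /path/to/experiment:/workspace/experiment \
    -v /path/to/specs:/workspace/specs \
    nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash

# Then, inside the container (note: no "tao" prefix):
# ssd export -m /workspace/experiment/experiment_dir_retrain/1280/weights/ssd_resnet10_epoch_010.tlt \
#            -o /workspace/experiment/experiment_dir_etlt/1280/ssd_resnet10_epoch_10_fp32.etlt \
#            -e /workspace/specs/1280/ssd_retrain_resnet10_kitti.txt \
#            -k $KEY --data_type fp32 --gen_ds_config
```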

Thank you very much for your response!

I am running everything in Google Colab with the configuration of the "ssd.ipynb" notebook found in the NVIDIA-AI-IOT/nvidia-tao repository on GitHub (nvidia-tao/tensorflow/ssd at main).

I have not used docker because the following link mentions that it is not supported on Google Colab: "Install Docker in Google Colab! · GitHub".

And I also tried and got errors like:
“docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?. See ‘docker run --help’.”

I am running it in google colab because I do not have a local GPU.

OK. And could you run tao export against your existing training result (the .tlt model in your train folder instead of the retrain folder) to check if it works?

I have tried as you suggested, but I got the same error:

!tao ssd export -m $EXPERIMENT_DIR/experiment_dir_train/1280/weights/ssd_resnet10_epoch_010.tlt \
                         -o $EXPERIMENT_DIR/experiment_dir_etlt/1280/ssd_resnet10_epoch_10_fp32.etlt \
                         -e $SPECS_DIR/1280/ssd_train_resnet10_kitti.txt \
                         -k $KEY \
                         --data_type fp32 \
                         --gen_ds_config

Could you please add the parameter below to the tao ssd export command line and retry? Thanks.

--target_opset 12

Also, may I know which TAO version you are using? Can you share the output of $ tao info --verbose?

I have tested adding --target_opset 12, but it always shows the same error. I have also tested running tao info --verbose, but it shows the following message:

!tao info --verbose

usage: tao [-h]
{action_recognition,augment,bpnet,classification_tf1,classification_tf2,converter,deformable_detr,detectnet_v2,dssd,efficientdet_tf1,efficientdet_tf2,emotionnet,faster_rcnn,fpenet,gazenet,gesturenet,heartratenet,intent_slot_classification,lprnet,mask_rcnn,multitask_classification,n_gram,pointpillars,pose_classification,punctuation_and_capitalization,question_answering,re_identification,retinanet,segformer,spectro_gen,speech_to_text,speech_to_text_citrinet,speech_to_text_conformer,ssd,text_classification,token_classification,unet,vocoder,yolo_v3,yolo_v4,yolo_v4_tiny}

tao: error: invalid choice: 'info' (choose from 'action_recognition', 'augment', 'bpnet', 'classification_tf1', 'classification_tf2', 'converter', 'deformable_detr', 'detectnet_v2', 'dssd', 'efficientdet_tf1', 'efficientdet_tf2', 'emotionnet', 'faster_rcnn', 'fpenet', 'gazenet', 'gesturenet', 'heartratenet', 'intent_slot_classification', 'lprnet', 'mask_rcnn', 'multitask_classification', 'n_gram', 'pointpillars', 'pose_classification', 'punctuation_and_capitalization', 'question_answering', 're_identification', 'retinanet', 'segformer', 'spectro_gen', 'speech_to_text', 'speech_to_text_citrinet', 'speech_to_text_conformer', 'ssd', 'text_classification', 'token_classification', 'unet', 'vocoder', 'yolo_v3', 'yolo_v4', 'yolo_v4_tiny')

However, I found a way of running the docker container mentioned in the first reply on another machine with a GPU, and inside the container the export command works as expected, without any problem. It seems that the problem only occurs on Google Colab.

OK, please use this approach as the workaround.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.