What almost everyone with a nano is looking for

This post is mostly directed to NVIDIA, but please feel free to +1 it…

What we are looking for is:

  1. An ssd-mobilenet or tiny-yolo model, easily callable from Python.
  2. A reasonable loading time (3-6 minutes is not reasonable!).
  3. FAST! We want it to run at over 20 fps, so it can process a webcam or two.

Please NVIDIA, provide us with a working example of the above!

No, the TF-TRT example is not that - it is slow to load and the speed is marginal.
No, the uff_ssd sample is not that - it is very specific to ssd_inception.
No, the sampleUffSSD_rect is not that - it is in C++.

Hi,

In case you haven’t already, have you maximized the Nano’s performance first?

sudo nvpmodel -m 0
sudo jetson_clocks.sh

We are discussing internally whether we can extract a pure TensorRT Python example for ssd-mobilenet.
We will share more information with you later.

Thanks.

Hi,

Yes, I am aware of the 10W mode. Still, other than the C++ sample, nothing is fast enough to really serve as a useful object detector.

Thank you for doing this; I am sure it will be much appreciated by everyone, not just me…

+1 to any Pure RT example.

+1

This should only occur the first time you load or convert a model with TensorRT, not every time. On subsequent loads it should be pretty much instant, and it can be done on any Nano in advance of deploying your application. You can also tweak the TensorRT builder parameters so that it spends fewer iterations (and less time) on the micro-architecture tuning it does to find the best kernels.
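
For example, a caching pattern along these lines works with the TensorRT Python API (just a minimal sketch - the engine path and the build step are placeholders you would fill in for your own model):

import os
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
ENGINE_PATH = 'model.engine'  # hypothetical cache file

def get_engine(build_engine_fn):
    """Return a TensorRT engine, building and caching it only on the first run."""
    if os.path.exists(ENGINE_PATH):
        # Fast path: deserialize the previously built engine (near-instant).
        with open(ENGINE_PATH, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    # Slow path: build the engine (the step that takes minutes), then cache it
    # so it never has to be rebuilt on this device.
    engine = build_engine_fn()  # e.g. parse the UFF model and call builder.build_cuda_engine()
    with open(ENGINE_PATH, 'wb') as f:
        f.write(engine.serialize())
    return engine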

Aasta is going to look into making a Python sample for TensorRT using SSD-Mobilenet-v2, but in the meantime feel free to use the TensorRT Python API to load your model. I have also been working to add Python support to Hello AI World (jetson-inference).

The long loading time referred to the TF-TRT loading. I have this in another thread. Even after loading a frozen TF-TRT model, it still takes minutes to parse it. It might be my setup; I would love it if you could check this out.
Currently the TF-TRT API is the only “usable” Python API we have. Although it is not as fast as native TensorRT, it is close enough. However, the time it currently takes to load the model is prohibitive. If you can’t find the thread, let me know - I’m on my phone now, so it is awkward to search for.
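
For reference, the TF-TRT workflow I mean is roughly this (TensorFlow 1.x with the contrib TensorRT converter; the output names are the ones from the detection zoo model, so treat it as a sketch rather than a drop-in script):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# One-off conversion: optimize the frozen graph with TensorRT and save the result.
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['detection_boxes', 'detection_classes',
             'detection_scores', 'num_detections'],
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16')

with tf.gfile.GFile('trt_graph.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())

# Later runs only need to reload trt_graph.pb, but on the Nano even that
# import step is what takes minutes for me.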

There is also the TensorRT Python API; it doesn’t depend on TensorFlow at runtime and shouldn’t need extended load times on every run. Please refer to the documentation here:

https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/index.html#python
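
As a rough sketch of what that looks like end to end - deserializing a cached engine and running it, with pycuda handling the buffers (untested against your model, and it assumes the first binding is the input):

import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize a previously built engine -- this is the fast path.
with open('model.engine', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

with engine.create_execution_context() as context:
    stream = cuda.Stream()
    host_bufs, dev_bufs, bindings = [], [], []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host = cuda.pagelocked_empty(size, dtype)   # page-locked host buffer
        dev = cuda.mem_alloc(host.nbytes)           # matching device buffer
        host_bufs.append(host)
        dev_bufs.append(dev)
        bindings.append(int(dev))

    # Fill host_bufs[0] with your preprocessed image, then run inference.
    host_bufs[0].fill(0)  # dummy input for the sketch
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
    for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    # host_bufs[1:] now hold the flattened network outputs.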

+1

So, I am trying this. I am trying to load the simplest object detection model from the TF zoo: http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz

Following the doc, I have successfully converted the frozen graph to UFF. So far so good, although I got heaps of messages about unimplemented ops that are going to bite me later, I am sure.

Still following the doc, I tried importing the graph using this code:

with builder = trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
    	parser.register_input("Placeholder", (1, 28, 28))
    	parser.register_output("fc2/Relu")
parser.parse(model_file, network)

which is, just saying, a syntax error (“builder =” inside a with statement is not valid syntax), but let’s ignore that for a moment.

So my code looks like:

import tensorrt as trt
import time

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
model_file = 'frozen_inference_graph.uff'
print("#1", time.time())
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        print("#2", time.time())
        parser.register_input("Placeholder", (1, 28, 28))
        print("#3", time.time())
        parser.register_output("fc2/Relu")
        print("#4", time.time())
        parser.parse(model_file, network)
        print("#5", time.time())

It fails with:
[TensorRT] ERROR: UFFParser: Graph error: Cycle graph detected

Ah, I think I know what the problem is: the placeholders are wrong. I had to leave but will check it later.

You might also find this GitHub repo about TensorFlow UFF useful:

https://github.com/NVIDIA-AI-IOT/tf_to_trt_image_classification

Alas, register_output can only take one argument. How do I tell it about

['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections']

which is the output of the model? Do I register each one?

The sample is for classification, so it doesn’t help in this case.

OK, I registered each output according to https://devtalk.nvidia.com/default/topic/1026777/how-to-convert-tensorflow-model-with-mutiple-output-to-uff/

and we are back to “[TensorRT] ERROR: UFFParser: Graph error: Cycle graph detected”

The code now is:

import tensorrt as trt
import time

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
model_file = 'frozen_inference_graph.uff'
input_names = 'image_tensor'
output_names = ['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections']


print("#1", time.time())
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        print("#2", time.time())
        parser.register_input(input_names, (224,224,1))
        print("#3", time.time())
        for output_ in output_names:
            parser.register_output(output_)
        print("#4", time.time())
        parser.parse(model_file, network)
        print("#5", time.time())

I looked at the repository. It uses a C++ program to do the parsing and can’t easily be converted to multiple outputs. The code looks the same as what I did, so I suspect it would fail with the same message.

It should be super simple for you to reproduce my steps:

wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz
tar xvf ssd_mobilenet_v1_coco_2018_01_28.tar.gz
python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb
mv ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.uff .
python3 test.py #this is my code

I have also changed the convert_to_uff call to:

python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py --input_node=image_tensor --output_node=detection_boxes --output_node=detection_classes --output_node=detection_scores --output_node=num_detections ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb

But no cigar - the same “Cycle graph detected” error.

Umm, can anyone help me with this, or shall I just wait for Aasta to finish the ssd-mobilenet sample?

Hi,

[TensorRT] ERROR: UFFParser: Graph error: Cycle graph detected

This error occurs when converting a model that contains the unsupported operation “_Switch”.
There is no corresponding plugin implementation for this layer, which leads to the error.

A Python sample for a fully supported model is simple.
But most detection models contain unsupported layers, which makes the implementation more complex.
This is because most of these models insert condition-type layers to extract the bounding boxes.

Currently, you can try a model whose layers are all fully supported; that should work fine with the code in #16.
Here is our support matrix for your reference: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#support_op
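
For reference, the uff_ssd sample works around this with graph surgery before the UFF conversion: the unsupported subgraphs (the Switch-based pre/post-processing) are collapsed into TensorRT plugin nodes. A rough outline of that approach (the node names and plugin parameters here are illustrative only, not a drop-in config):

import graphsurgeon as gs
import uff

graph = gs.DynamicGraph('frozen_inference_graph.pb')

# Replace the post-processing subgraph with the TensorRT NMS plugin node.
# The real config also maps the preprocessor, anchor generator, etc.
nms = gs.create_plugin_node(name='NMS', op='NMS_TRT')  # plugin parameters omitted
graph.collapse_namespaces({'Postprocessor': nms})

# Convert what remains to UFF; the plugin node stands in for the removed ops.
uff.from_tensorflow(graph.as_graph_def(),
                    output_nodes=['NMS'],
                    output_filename='frozen_inference_graph.uff')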

Thanks.

Thank you, Aasta, I suspected as much. But since I was referred to this by dusty, I thought I might have just made a fool of myself and not followed the docs properly.

Just as a side note - MobileNet v1 has been around for so long that it is a bit discouraging that TensorRT does not support its layers. It seems to be stuck in the era of AlexNet.

I understand from dusty that you are working on implementing it? This is important, as I am about ready to give up on the Nano. I am in the middle of writing a Medium article on my experience. If no one is working on implementing more layers, I’ll just go ahead and publish. If you think this will happen in the next two weeks, I’ll wait.

As I mentioned in my original post… the Nano does not provide what most people are looking for when they buy it after seeing the benchmarks.

On the one hand, it does not support OpenCL, so external accelerators can’t work with it. On the other hand, its TensorFlow and PyTorch builds are much slower than what you would expect from the benchmarks, and they are monstrous packages that hog memory.

@moshe Don’t give up too soon. These are early days for the Nano, and it seems to me (as an outsider) that NVIDIA are committed to improving software support for the product as fast as they can.

I imagine that all of the vendor marketeers in this space (low-cost ANN platforms) are seeking first mover advantage, and the tech support teams are running as fast as they can behind them.

For me, the killer features of the Nano are:

  1. acceptable price-performance,
  2. low power consumption, size and weight, and (crucially)
  3. on-board trainability, not just inference.

IMO these make the product suitable for classroom use and hobbyist experimentation. Commercial use involves a whole different set of factors which I am not qualified to judge.