Improving detection performance of SSD MobileNet v2 on Jetson Nano

Hi there,

I have been trying to use SSD MobileNet v2 on the Jetson Nano using the python branch of jetson-inference:

$ git clone -b python jetson-inference-python

During evaluation I noticed that the detection performance of this engine is not as good as that of ssd_mobilenet_v2_coco_2018_03_29, which I downloaded from the TensorFlow models repository and evaluated on a PC using OpenCV:

$ wget

Evaluation of the model on the PC using OpenCV:

model = cv2.dnn.readNetFromTensorflow('models/frozen_inference_graph.pb',

The application of this detection system is detecting people at a security gate, where an accurate people count is important. I use a USB camera mounted on an arch, pointing down towards people passing through the gate at an angle of roughly 45 degrees. The following link shows the geometry of the people and the camera. The camera and its field of view are shown in red:

The failure scenario is when two people walk one after the other and the second person is partially obscured by the first. With the TensorFlow model and OpenCV on the PC, both people are detected successfully, whereas the TRT FP16 model on the Jetson Nano fails to detect the second person on many occasions.

I have a few questions:

  • Is the TRT model for SSD MobileNet v2 a conversion of the TensorFlow model ssd_mobilenet_v2_coco_2018_03_29 to UFF FP16, or was a different model converted?
  • If the UFF model is not a conversion of the same TensorFlow model, how can I convert the above TensorFlow model to UFF? Is there a set of instructions?
  • If they are the same model, then this seems to be a trade-off of accuracy against speed. Could I choose a different conversion so that accuracy is increased, for example something other than FP16 or some other variation?
  • If the models are the same, the other option I was considering is to enhance the trained model with additional training on images like this and then convert the enhanced model to TRT FP16, in the hope of better detection performance. Is there a ready-made framework from NVIDIA for the additional training, and how would I then convert the result to an FP16 TRT model?
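On the FP16 question, a small sketch may help put the precision in perspective. It rounds a few made-up confidence scores to IEEE 754 half precision using only the Python standard library. The scores are illustrative assumptions, not output from either model, and the sketch only shows the rounding of a single final value: errors accumulated through the layers of a network can be larger, so this does not establish whether FP16 is the cause of the missed detections.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Made-up confidence scores, chosen near a 0.5 threshold and near 1.0.
for score in (0.4999, 0.5001, 0.9137):
    rounded = to_fp16(score)
    print(f"{score:.4f} -> {rounded:.6f} (error {abs(score - rounded):.6f})")
```

For values between 0.5 and 1.0, neighbouring half-precision numbers are 2**-11 apart, so the rounding error of a single score is at most 2**-12 (about 0.00024).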

Please advise.

Many thanks in advance

Hi am_merati, the model was converted using the TRT_object_detection repo, and I use the UFF from that.

What performance of the network are you getting on Nano? I get ~20FPS.

Note that I updated the python branch of jetson-inference last week to include more granular timing info, so you may want to grab that update.

Also you’ll want to run ‘sudo jetson_clocks’ if you are using the detectnet-console program for timing, as the first run will be slow without it because the clocks need to spin up (the jetson_clocks script maximizes the clocks). Alternatively, you can run the detectnet-camera program, which processes many frames.
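If you time inference from your own script, one way to keep the slow first runs out of the numbers is to discard a few warm-up iterations before measuring. The following is a minimal sketch of that idea; `measure_fps`, `fake_infer`, and the warm-up count of 5 are hypothetical names and values for illustration, and `fake_infer` merely sleeps in place of a real detection call.

```python
import time

def measure_fps(infer, frames, warmup=5):
    """Average FPS of infer(frame) over frames, ignoring the first `warmup` calls."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up calls are executed but not timed
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed

def fake_infer(frame):
    """Stand-in for a real detection call; sleeps for roughly 2 ms."""
    time.sleep(0.002)

print(f"{measure_fps(fake_infer, list(range(25))):.1f} FPS")
```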

Thanks Dustin for your reply,

I am getting 13-15 FPS depending on resolution of USB camera. This FPS performance is enough based on current evaluation and works fine with the rest of system.

However, the detection accuracy is not good enough. When I was running SSD v2 from the TensorFlow model zoo (ssd_mobilenet_v2_coco_2018_03_29) on the PC using OpenCV, it detected all the people, even those partially occluded or obscured by the person in front of them.

However, as I mentioned, the converted model seems to be less accurate and does not detect all people, especially those who are partially occluded or obscured.

I just wanted to make sure that the converted model was obtained from the model I originally evaluated (ssd_mobilenet_v2_coco_2018_03_29.tar.gz from the TensorFlow model zoo).

If I retrain this TensorFlow model with more images from the same camera view angle and then convert the model into TRT UFF format, I may be able to bring the detection accuracy up to that of the original TensorFlow model or even better.

It seems that this part does the conversion?
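To pin down which people the converted model misses, it can help to run both pipelines on the same frame and match their boxes by intersection over union (IoU). Below is a minimal sketch of that comparison. The box coordinates are invented for illustration, the `(left, top, right, bottom)` pixel layout and the 0.5 IoU cut-off are assumptions, and `missed_boxes` is a hypothetical helper rather than part of OpenCV or jetson-inference.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def missed_boxes(reference, candidate, min_iou=0.5):
    """Return reference boxes with no candidate box overlapping at >= min_iou."""
    return [r for r in reference if all(iou(r, c) < min_iou for c in candidate)]

# Invented example: the PC pipeline finds two people, the Nano pipeline one.
pc_boxes = [(100, 50, 220, 400), (180, 40, 300, 380)]
nano_boxes = [(102, 52, 218, 398)]

print(missed_boxes(pc_boxes, nano_boxes))  # [(180, 40, 300, 380)]
```

Logging the missed boxes over a recorded clip would show whether the misses are concentrated on the occluded second person.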

# compile model into TensorRT
if not os.path.isfile(model.TRTbin):
    dynamic_graph = model.add_plugin(gs.DynamicGraph(model.path))
    uff_model = uff.from_tensorflow(dynamic_graph.as_graph_def(), model.output_name, output_filename='tmp.uff')

    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        builder.max_workspace_size = 1 << 28
        builder.max_batch_size = 1
        builder.fp16_mode = True

        parser.register_input('Input', model.dims)
        parser.parse('tmp.uff', network)
        engine = builder.build_cuda_engine(network)

        buf = engine.serialize()
        with open(model.TRTbin, 'wb') as f:
            f.write(buf)


It might be that the threshold of the detectNet class is set higher than what OpenCV is using. The default threshold for detectNet is 0.5, which you can change (lower threshold will output more objects). You could also try the TRT_object_detection sample to see if that output is different.
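To illustrate why the threshold matters for the occlusion case, here is a minimal sketch with made-up detections. The labels and confidences are assumptions for illustration, not measured output, and `count_people` is a hypothetical helper. A partially occluded person often scores lower than a fully visible one, so a 0.5 threshold can drop a detection that a lower threshold keeps.

```python
# Made-up detections as (label, confidence) pairs, for illustration only.
detections = [
    ("person", 0.91),    # person in front, fully visible
    ("person", 0.42),    # person behind, partially occluded
    ("backpack", 0.20),
]

def count_people(dets, threshold):
    """Count 'person' detections whose confidence meets the threshold."""
    return sum(1 for label, conf in dets if label == "person" and conf >= threshold)

print(count_people(detections, 0.5))  # 1
print(count_people(detections, 0.3))  # 2
```

For a fair comparison, both pipelines should use the same threshold. It is worth checking which confidence cut-off your OpenCV script applies before comparing counts.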

Just a note, USB cameras can have a lower effective framerate than MIPI CSI cameras.

Thanks Dustin, I shall try this and maybe enhance the training.


Have you found out why you didn’t get the same results?

Did you need to enhance the training?

Best regards




We are experiencing exactly the same issue.

The accuracy of the models on the Jetson Nano is much lower than that of the same model on a PC.

You can test, for example, with this video:

The difference is huge. We changed the threshold.

Do you have any idea why we see this difference?

Have you converted SSD MobileNet v2 to UFF?