Run custom Resnet SSD using TensorRT

Hi experts,

I am trying to run a face detector using an SSD model with a ResNet base. The model was originally trained with Caffe and outputs four coordinates in the image around each face it identifies. The models I am using are found here:

I have been looking at the Python samples introductory_parser_samples and uff_ssd, and have managed to parse the Caffe model without any errors after editing the prototxt files. Should I follow the uff_ssd Python sample to try to use my Caffe SSD? Is this a simple task, or even possible?


Here is a sample for your reference:

It has a sample for inception SSD.

Hi @AastaLL,

I am looking to run this model on my Jetson Nano:

I have tried running it on the Jetson Nano using OpenCV and using ssd-caffe (see here). In both cases, it runs at under 1 FPS! This is 3 times slower than on my MacBook Pro without a GPU. I have also tried the TensorRT facenet example; however, the accuracy is not what I need.

How should I go about running this model? Should I build and train it from scratch in TensorRT? Is that possible? Can I parse a Caffe SSD ResNet10 model into TensorRT? Do you think this will make it run faster?

Also, I am using Caffe SSD, but the link you sent points to a TensorFlow example.

I shared my full implementation of TensorRT optimized MTCNN face detector on GitHub:

When running the ‘’ on Jetson Nano, I can get 7~8 FPS for images with only 1 face (or ~5 FPS for images with multiple faces). Feel free to give it a try.

Thanks @jkjung,

I got it running at approx. 4 FPS in the end. I will also try out your model, as the frame rate you mentioned is a lot higher.

There is one issue, though. The last 'detection_out' layer of the Caffe model is supposed to output an N x 7 numpy array, where N is the number of detections in the image and each detection has 7 values, including four location points, a confidence level, etc. TensorRT instead outputs a 1D array of size 1400, which I reshape to 200 x 7. The 200 is the maximum number of detections the model can output. Even if there is only one face in the image, the TensorRT engine still produces the 1400-element array: the first 7 values hold the information for that detection and the rest of the array consists of zeros. I can work around this, but it would be nice if the output were the same as Caffe's, i.e. only the values of actual detections.
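
For anyone hitting the same thing, here is a minimal sketch of that workaround in numpy. It assumes the usual Caffe SSD row layout of [image_id, label, confidence, xmin, ymin, xmax, ymax], so the confidence sits in column 2; the function name `extract_detections` and the 0.5 threshold are made up for illustration, and the input is a synthetic buffer rather than real engine output.

```python
import numpy as np

MAX_DETECTIONS = 200        # maximum number of detections the model can output
VALUES_PER_DETECTION = 7    # 1400 / 200

def extract_detections(flat_output, conf_threshold=0.5):
    """Reshape the fixed-size 1D output and keep only the real detections.

    Assumes each row is [image_id, label, confidence, xmin, ymin, xmax, ymax].
    Zero-padded rows have confidence 0 and are dropped by the threshold.
    """
    dets = np.asarray(flat_output, dtype=np.float32).reshape(-1, VALUES_PER_DETECTION)
    keep = dets[:, 2] > conf_threshold
    return dets[keep]

# Synthetic stand-in for the engine output: one detection, the rest zeros.
raw = np.zeros(MAX_DETECTIONS * VALUES_PER_DETECTION, dtype=np.float32)
raw[:7] = [0, 1, 0.98, 0.1, 0.2, 0.4, 0.6]

faces = extract_detections(raw)
print(faces.shape)
```

The result is an N x 7 array like the one Caffe returns, with N equal to the number of rows above the threshold. Filtering on confidence rather than on "row is all zeros" also discards weak detections in the same pass.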

Currently, TensorRT can only work with fixed-size tensors/blobs in all layers, including all inputs and all outputs. I think you have to accept this limitation for now.