OUTPUT_NAME for facenet-120 for getting bounding boxes as output

I am using a Jetson Nano with JetPack 4.3. How do I get bounding box coordinates as output from the facenet-120 model? I tried both “bboxes” and “coverage” as the output name.

For “bboxes”, I got a multidimensional array with 3136 elements.
For “coverage”, I got a multidimensional array with 784 elements.
Neither is close to what I expected.
I modified caffe_resnet50.py, which is provided as part of the TensorRT Python examples, to use the facenet-120 model.

Is it possible to generate a 128-point embedding with facenet-120, or is facenet-120 only for detecting faces?

If it is possible, which output layer should I tap for the embedding output?

Hi @rajesh0therascal, the jetson-inference library interprets the output of the facenet model to reject/cluster the bounding boxes and provide the coordinates. You can use the detectNet class to run it.
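To give a rough idea of what a "cluster" step can look like, here is a minimal sketch that merges overlapping boxes into their bounding union. This is only an illustration under my own assumptions, not the jetson-inference implementation: the names `overlaps` and `merge_overlapping` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the merge is a simple single greedy pass.

```python
def overlaps(a, b):
    """Return True if two (x1, y1, x2, y2) boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_overlapping(boxes):
    """Greedily merge each box into the first already-kept box it overlaps.

    Hypothetical helper for illustration; a single pass, so chains of
    boxes that only overlap after merging are not re-checked.
    """
    merged = []
    for box in boxes:
        for kept in merged:
            if overlaps(kept, box):
                # Grow the kept box to the union of the two rectangles.
                kept[0] = min(kept[0], box[0])
                kept[1] = min(kept[1], box[1])
                kept[2] = max(kept[2], box[2])
                kept[3] = max(kept[3], box[3])
                break
        else:
            merged.append(list(box))
    return [tuple(m) for m in merged]
```

Many neighbouring grid cells usually fire on the same face, so some merge step like this is what turns dozens of raw candidates into one box per face.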

If you are interested in the actual code that performs the post-processing, see here:


Can facenet-120 produce 128-D face feature vectors? If the answer is yes, how do I get the 128-D vector?

I don’t believe so; it doesn’t explicitly extract facial landmarks. It uses the generalized detectnet architecture, which can be trained to detect any objects, not just faces.

If you dig into the layers, you may find that the output of the encoder portion of the network could be used (although this data would be abstract and not facial features). I’m not sure which layer this would be however.

Hi @dusty_nv ,

Thanks for your help.

Can you please help me understand the following functions/variables (or point me to a reference)?

  1. Function definitions for GetNumClasses(), GetInputWidth(), GetInputHeight(), DIMS_W(), DIMS_H(). I couldn’t find where these values come from.

I tried to understand detectNet.cpp with facenet-120 as the network, and got the following values in clusterDetections() for the caffe model:

Total values in net_cvg = 784 [1, 28, 28 ]
Total values in net_rects = 3136 [4, 28, 28]
ow = 28
oh = 28
owh = 784
cell_width = GetInputWidth()/ow # 450/28
cell_height = GetInputHeight()/oh # 450/28
I’m using a CSI camera at 1280 x 720, so:
scale_x = width/GetInputWidth() # 1280/450
scale_y = height/GetInputHeight() # 720/450

But I’m not sure whether these values are correct. My bounding boxes with detectnet.py are correct, but when I generate the engine with caffeparser.py I am not able to replicate them: the boxes I get are slightly smaller than the ones generated by detectnet.py.
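For reference, here is how the values listed above could be combined to turn the raw grid outputs into image-space boxes. Treat it as a sketch under assumptions rather than the actual detectNet.cpp logic: `decode_grid` is a hypothetical name, both outputs are assumed to be flat, planar (channel-major) arrays, and the four “bboxes” channels are assumed to hold x1, y1, x2, y2 as pixel offsets from the top-left corner of each grid cell in network-input coordinates.

```python
def decode_grid(cvg, rects, ow, oh, in_w, in_h, img_w, img_h, threshold=0.5):
    """Decode flat coverage/bbox grids into (x1, y1, x2, y2, conf) tuples.

    cvg   -- flat list of ow*oh coverage values      (shape [1, oh, ow])
    rects -- flat list of 4*ow*oh box offset values  (shape [4, oh, ow])
    Hypothetical helper; layout and offset convention are assumptions.
    """
    owh = ow * oh
    cell_w = in_w / ow      # size of one grid cell in network-input pixels
    cell_h = in_h / oh
    scale_x = img_w / in_w  # network-input pixels -> camera-image pixels
    scale_y = img_h / in_h

    boxes = []
    for y in range(oh):
        for x in range(ow):
            idx = y * ow + x
            conf = cvg[idx]
            if conf < threshold:
                continue  # reject low-coverage cells
            mx = x * cell_w  # top-left corner of this cell
            my = y * cell_h
            x1 = (rects[0 * owh + idx] + mx) * scale_x
            y1 = (rects[1 * owh + idx] + my) * scale_y
            x2 = (rects[2 * owh + idx] + mx) * scale_x
            y2 = (rects[3 * owh + idx] + my) * scale_y
            boxes.append((x1, y1, x2, y2, conf))
    return boxes
```

With the numbers from this post the call would be `decode_grid(cvg, rects, 28, 28, 450, 450, 1280, 720)`, where `cvg` has 784 values and `rects` has 3136. The raw boxes still need to be merged afterwards, since several cells normally fire for one face.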

Should I add some value to bbox to normalize?

Thanks in advance.

These functions are defined in the detectNet class and the tensorNet base class:

I would recommend just to use the Python API for jetson-inference rather than trying to replicate the pre/post-processing.