Questions about Face-Recognition


We would like to rebuild a face recognition dataset.
Is there any plan to release documentation on how to rebuild the face recognition dataset and model?
Also, which version of DIGITS works with the face recognition prototxt?

Tx2 TR2.1

Thank you,

Have you tried out FDDB (Face Detection Data Set and Benchmark) yet?

Perhaps Aasta can comment further here about what she used to train the network.

There is a lot of good information that Aasta posted about the face recognition dataset/model in this thread.

That particular face recognition model is actually two models combined into one. You have to use Digits to build a face detection model (DetectNet), and then build a face recognition model (GoogleNet). After you get done building the two models you go through a process to merge the models. The details are in that thread.

I don’t believe Digits can be used for inference on the combined model. But it can be used to build the components, and Caffe (a part of Digits) does the merging.

The entire process is a bit involved, and I don’t know if the datasets are available in a form that’s ready to use with Digits.

I took a brief look at the two datasets, and while they are publicly available, they aren’t in the format Digits needs. With some scripting the conversion could likely be done easily. The face recognition dataset for that model can’t be shared directly due to copyright issues, so that dataset is just a huge number of links to celebrity faces on the web.


We train the recognition network with VGG_Face.
VGG_Face is an extensive database containing 2,622 identities, and each identity has 1000 images.

We train this model with DIGITS since it is a traditional classification problem.
Input: a face image; we crop the face region with the OpenCV API (in the training stage).
Output: a label ID; each ID stands for a person in the database.

As mentioned, the input of the recognition network is the face region, which is why we also combine the detection network into the Face-Recognition sample.

Thanks Dusty_nv, S4WRXTTCS, AastaLLL for your prompt replies.
They are a great help.

I appreciate AastaLLL sharing the implementation steps in detail. It really helps a lot.

I’m having issues with #5.
The error is the same one S4WRXTTCS got.

F0908 17:38:53.649592 18887 gpu_memory.cpp:74] Check failed: initialized_ Create GPUMemory::Scope to initialize Memory Manager
*** Check failure stack trace: ***
Aborted (core dumped)

I might need to re-train the models.

Thank you for any suggestions.


I had this issue on two Digits machines running a standard Digits 5.0 install, so I assumed it was something about my models, prototxt files, etc. There were a lot of variables, so I wasn’t sure what it was.

But for some reason I wasn’t getting that error on the Jetson TX2. Instead it processed quite a few layers and then errored out on "Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR".

The fact that the Jetson processed so much more led me to believe something wasn’t right with my Digits install. So I did a fresh manual install of Digits 6.0 RC2 (mostly following the install instructions for Digits in the Jetson-Inference git repository), and I didn’t get any errors with that.

I still don’t have my own processed model working correctly: I get a segmentation fault when it is loaded. But AastaLLL uploaded the caffemodels, so I should be able to verify that I’m building the merged model correctly.

I’ve concluded that the Create GPUMemory error has nothing to do with the model or prototxt files. It gives me that error even when using what AastaLLL provided.

With what AastaLLL provided my Digits 6.0RC2 install is able to build the merged Model, and the merged model works correctly with the face recognition app.

I wonder if the GPU Memory error with Digits 5.0 isn’t related to this?

Thanks S4WRXTTCS for sharing your experience.
As you said, I still got the Create GPUMemory error when I used the models that AastaLLL provided.

However, after I installed Digits 6.0RC1 I still got the Create GPUMemory error with the models that AastaLLL provided.
Is there anything I missed?

Here are the steps I did.

  1. download Digits 6.0RC1 from Releases · NVIDIA/DIGITS · GitHub.
    there is no Digits 6.0RC2.
    Would you share the Digits 6.0RC2 link?

  2. install Digits 6.0RC1

    $ sudo pip install -r requirements.txt
    $ ./digits-devserver

  3. did step4 to rebuild a caffe model again

  4. run

  5. got the Create GPUMemory error still

Appreciate any suggestions.


We haven’t run into the GPU memory error before, so we don’t have much experience to share about it.

The DIGITS version we used is 6.0.0-rc.1.
We installed DIGITS directly from the master branch.

There are some minor procedures you can check first:

Also, please remember to create a database with size 1x3x224x224 first.


It seems that between the time I wrote that and now, Digits 6.0 has been officially released. I believe Digits RC2 was simply released as Digits 6.0.

In any case I don’t believe the Digits version matters as much as the NVCaffe version. As part of the Digits install you install NVCaffe. So just make sure you’re using the latest version of that.

It’s also worth mentioning that the official Digits install now uses Docker, so all the requirements are contained within the container. That makes the install really easy, but I don’t know if you can get terminal access inside the Digits 6.0 container in order to do this merge.

So it’s probably best to forget about Digits for the merging, and just focus on installing NVCaffe along with cuDNN 6.0.

Hi AastaLLL,

Thank you.
Yes. We will create a database with size 3x224x224.


Thank you for your great help.
After I re-installed NVCaffe along with cuDNN 6.0, I was able to use the detection and classification models to build a merge.caffemodel.

However, I also got poor accuracy when I used my own classification model.

Here are the results I got.
Face Recognition, Face Detection → works correctly
new classification, Face Detection → poor accuracy

I may need to rework my classification model.
Thank you for any suggestions.


We now seem to be at the same spot: we can build the merged model, but the classification isn’t giving us the accuracy we expect.

What I’m doing now is trying to determine the accuracy I should expect.

I went back to the original merge.caffemodel/deploy.prototxt/label file/etc to see how accurate it was, and it doesn’t seem to be accurate at all.

I didn’t notice before since my face wasn’t in the trained classification model. So I didn’t think much of it, but now I’ve shown the camera pictures of faces within the trained model and it’s not detecting them accurately. The face detection is accurate, but not the face recognition.

Last night I launched a script I found that downloads the entire dataset, but it’s going to take at least a day to download everything. There are a lot of dead links, though, because the links are at least 2 years old. Even if I can only build a subset (like 54 celebrity faces), I should be able to determine how accurate it is.
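
One way to judge accuracy on a subset is to tally results per identity rather than as a single number, since some people may be recognized far better than others. The following is only a minimal sketch and is not part of the Face-Recognition sample; the function name `per_class_accuracy` and the (true label, predicted label) pair format are assumptions made for illustration.

```python
from collections import defaultdict


def per_class_accuracy(pairs):
    """Return {true_label: fraction correct} for (true_label, predicted_label) pairs."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in pairs:
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    # Every label in `total` has at least one sample, so the division is safe.
    return {label: correct[label] / total[label] for label in total}
```

Feeding it one pair per test image makes it easy to spot identities that are consistently misclassified.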

Here is a link on how to download/prepare the VGG face dataset.

Hi both,

We are discussing how to make the ‘merge model’ process easier.
We will update you with more information later.

For the merged model I think an easier example dataset would be advantageous.

The problem with the vgg-face dataset is that it’s not publicly available. Due to copyright law it’s shared by way of links to celebrity pictures on the web. By now the links are 2+ years old, so a lot of them are invalid. There also isn’t any straightforward way to process the images into a classification dataset. The one I found (that I linked to in this thread) didn’t actually work, so I have to debug what’s wrong with it. It doesn’t do the crop function like it’s supposed to.

The other issue with the classification model used is that it doesn’t seem accurate. I used the face recognition program as is (without changing the models), and it wasn’t able to identify the celebrities correctly even when I pointed it at an image it was trained on.

So it’s hard to debug an accuracy issue when the example itself isn’t accurate.

For myself, the main issues I ran into could be avoided by adding a few notes on the process. I’m not complaining about the lack of documentation, as this example was really intended to show how to use plugin layers. Merging models is extremely useful, so I’m hoping people take advantage of this example.

For me the obstacles I ran into were

1.) not realizing I had to use the latest version of nvcaffe for the merge script
2.) not realizing the example used a pretrained model, and that’s why the loss3/classifier needed to be renamed
3.) the tmp layer in the deploy.prototxt example threw me off

I may have made other mistakes that I don’t know about yet.

Hi all,

This time, we merge a dog detection model with an object classification model to demonstrate how to adapt the Face-Recognition sample to your use-case.

In the end, the pipeline can detect a dog and output its species.

Here are the procedures:

1. Prepare a detection model
We use the dog detection model contained in jetson_inference.

prototxt: [jetson_inference]/data/networks/DetectNet-COCO-Dog/deploy.prototxt
model: [jetson_inference]/data/networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel

Please copy files to /home/nvidia/

2. Prepare a classification model
We use the image classification model contained in jetson_inference.
prototxt: [jetson_inference]/data/networks/googlenet.prototxt
model: [jetson_inference]/data/networks/bvlc_googlenet.caffemodel
class_labels: [jetson_inference]/data/networks/ilsvrc12_synset_words.txt

Please copy files to /home/nvidia/

→ labels.txt
Please note that we removed the prefix information manually, since loadClassInfo in Face-Recognition doesn’t support the prefix.
If you are looking for a prefix parser, please check the jetson_inference sample here.
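
The manual prefix removal could also be scripted. The following is a minimal sketch, assuming each line of ilsvrc12_synset_words.txt has the usual form of a synset ID followed by the class name (for example "n01440764 tench, Tinca tinca"). The helper names `strip_synset_prefix` and `convert_labels` are hypothetical and not part of Face-Recognition or jetson_inference.

```python
def strip_synset_prefix(line):
    """Drop a leading synset ID such as 'n01440764' from one label line."""
    text = line.strip()
    head, _, rest = text.partition(" ")
    # Treat the first token as a prefix only if it looks like 'n' + digits.
    if rest and head.startswith("n") and head[1:].isdigit():
        return rest.strip()
    return text


def convert_labels(src_path, dst_path):
    """Write a prefix-free copy of a label file, one class name per line."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines so label indices stay aligned
                dst.write(strip_synset_prefix(line) + "\n")
```

For example, `convert_labels("/home/nvidia/ilsvrc12_synset_words.txt", "/home/nvidia/labels.txt")` would produce the prefix-free label file, assuming the paths used in this post.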

3. Run this script to rename model

We apply this change for our use-case

diff --git a/script/ b/script/
index fb88913..3a39263 100644
--- a/script/
+++ b/script/
@@ -1,6 +1,6 @@
-deploy_fd = '/home/vyu/Face/JEP/script/detection.prototxt'
-deploy_fr = '/home/vyu/Face/JEP/script/classification.prototxt'
-deploy_merge = '/home/vyu/Face/JEP/script/deploy.prototxt'
+deploy_fd = '/home/nvidia/deploy.prototxt'
+deploy_fr = '/home/nvidia/googlenet.prototxt'
+deploy_merge = '/home/nvidia/step3.prototxt'
 fp1 = open(deploy_fd, 'r')
 fp2 = open(deploy_fr, 'r')
$ python

We call the generated prototxt step3.prototxt
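
Judging from the layer names in the diffs in this post (for example conv1/7x7_s2_fd and conv1/7x7_s2_fr), the rename step gives the detection network an _fd suffix and the classification network an _fr suffix so the two can live in one prototxt. The following is a rough sketch of that idea only, not the actual script. It assumes each name, top, and bottom field sits on its own line, and `add_suffix` is a hypothetical helper.

```python
import re

# Matches prototxt fields whose quoted value is a layer or blob name.
_NAME_FIELD = re.compile(r'^(\s*(?:name|top|bottom):\s*")([^"]+)(")', re.MULTILINE)


def add_suffix(prototxt_text, suffix):
    """Append `suffix` to every name/top/bottom value in a prototxt string."""
    return _NAME_FIELD.sub(
        lambda m: m.group(1) + m.group(2) + suffix + m.group(3),
        prototxt_text,
    )
```

Concatenating `add_suffix(detection_text, "_fd")` and `add_suffix(classification_text, "_fr")` would then give a single file with no name clashes. Other fields such as `type:` are left untouched.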

4. Generate a random caffemodel

  • Find a server with DIGITs environment.
DIGITS version:
    Caffe version:
    Caffe flavor:
  • Apply this change to step3.prototxt to get step4.prototxt
diff --git a/step3.prototxt b/step3.prototxt
index cf44385..a10c004 100644
--- a/step3.prototxt
+++ b/step3.prototxt
@@ -2,22 +2,13 @@ input: "data"
 input_shape {
   dim: 1
   dim: 3
-  dim: 640
-  dim: 640
-layer {
-  name: "deploy_transform_fd"
-  type: "Power"
-  bottom: "data_fd"
-  top: "transformed_data_fd"
-  power_param {
-    shift: -127.0
-  }
+  dim: 224
+  dim: 224
 layer {
   name: "conv1/7x7_s2_fd"
   type: "Convolution"
-  bottom: "transformed_data_fd"
+  bottom: "data"
   top: "conv1/7x7_s2_fd"
   param {
     lr_mult: 1.0
@@ -2170,18 +2161,10 @@ layer {
-name: "GoogleNet_fr"
-layer {
-  name: "data_fr"
-  type: "Input"
-  top: "data_fr"
-  input_param { shape: { dim: 1 dim: 3 dim: 224 dim: 224 } }
 layer {
   name: "conv1/7x7_s2_fr"
   type: "Convolution"
-  bottom: "data_fr"
+  bottom: "data"
   top: "conv1/7x7_s2_fr"
   param {
     lr_mult: 1
  • New Image Classification Dataset with image size 224x224

  • New Image Classification Model with

epochs = 1 (will finish faster)
Custom Network → paste step4.prototxt → Create
Download model → extract snapshot_iter_[N].caffemodel

In our case, it’s snapshot_iter_3.caffemodel

5. Replace weight
Please run this script to replace the random weights with the trained detection and classification weights.

We apply this change for our use-case

diff --git a/ b/
index ae80321..baa25fa 100644
--- a/
+++ b/
@@ -1,13 +1,13 @@
 import sys 
 import caffe
-deploy_fd = 'detection.prototxt'
-deploy_fr = 'classification.prototxt'
-deploy_merge = 'deploy.prototxt'
+deploy_fd = 'deploy.prototxt'
+deploy_fr = 'googlenet.prototxt'
+deploy_merge = 'step4.prototxt'
-model_fd = 'detection.caffemodel'
-model_fr = 'classification.caffemodel'
-model_merge = 'snapshot_iter_1.caffemodel'
+model_fd = 'snapshot_iter_38600.caffemodel'
+model_fr = 'bvlc_googlenet.caffemodel'
+model_merge = 'snapshot_iter_3.caffemodel'
 net_fd = caffe.Net(deploy_fd,model_fd, caffe.TEST)
 net_fr = caffe.Net(deploy_fr,model_fr, caffe.TEST)
$ python

→ merge.caffemodel
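
The core of the weight replacement can be pictured as a copy by layer name. The following is a sketch under assumptions, not the actual script: it assumes every merged layer is named as the source layer plus a suffix (_fd or _fr), and that parameters are exposed the way pycaffe exposes `net.params`, as a mapping from layer name to a sequence of blobs with a `.data` array. `copy_params` is a hypothetical helper.

```python
def copy_params(src_params, dst_params, suffix):
    """Copy each source layer's blobs into the destination layer named name+suffix.

    Both arguments map a layer name to a sequence of blobs exposing `.data`.
    Returns the list of destination layer names that were filled.
    """
    copied = []
    for name, blobs in src_params.items():
        target = name + suffix
        if target not in dst_params:
            continue  # no matching layer in the merged net, so skip it
        for i, blob in enumerate(blobs):
            # Slice assignment overwrites the existing buffer in place;
            # source and destination shapes are expected to match.
            dst_params[target][i].data[:] = blob.data
        copied.append(target)
    return copied
```

With pycaffe this would be called as `copy_params(net_fd.params, net_merge.params, "_fd")` and `copy_params(net_fr.params, net_merge.params, "_fr")`, followed by `net_merge.save('merge.caffemodel')`, where `net_merge` is loaded from step4.prototxt and the random snapshot in the same way as `net_fd` and `net_fr`.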

6. Add plugin

diff --git a/step4.prototxt b/step4.prototxt
index a10c004..344db58 100644
--- a/step4.prototxt
+++ b/step4.prototxt
@@ -2,8 +2,8 @@ input: "data"
 input_shape {
   dim: 1
   dim: 3
-  dim: 224
-  dim: 224
+  dim: 640
+  dim: 640
 layer {
   name: "conv1/7x7_s2_fd"
@@ -2162,9 +2162,32 @@ layer {
 layer {
+  name: "bboxMerge"
+  type: "IPlugin"
+  bottom: "data"
+  bottom: "coverage_fd"
+  bottom: "bboxes_fd"
+  top: "count_fd"
+layer {
+  name: "selectBbox"
+  type: "IPlugin"
+  bottom: "bboxes_fd"
+  bottom: "count_fd"
+  top: "bbox_fr"
+  top: "bbox_id"
+layer {
+  name: "dataRoi"
+  type: "IPlugin"
+  bottom: "data"
+  bottom: "bbox_fr"
+  top: "data_roi"
+layer {
   name: "conv1/7x7_s2_fr"
   type: "Convolution"
-  bottom: "data"
+  bottom: "data_roi"
   top: "conv1/7x7_s2_fr"
   param {
     lr_mult: 1
@@ -4311,3 +4334,13 @@ layer {
   bottom: "loss3/classifier_fr"
   top: "prob_fr"
+layer {
+  name: "summaryLabel"
+  type: "IPlugin"
+  bottom: "bboxes_fd"
+  bottom: "count_fd"
+  bottom: "bbox_id"
+  bottom: "prob_fr"
+  top: "label"

→ deploy.prototxt

7. Run

  • Since this classification output blob name is prob_fr, we apply this change to Face_Recognition source code:
diff --git a/face-recognition/face-recognition.cpp b/face-recognition/face-recognition.cpp
index 33b5dab..8ac5f9f 100644
--- a/face-recognition/face-recognition.cpp
+++ b/face-recognition/face-recognition.cpp
@@ -25,7 +25,7 @@ const char* OUTPUT_BLOB_BOX = "bboxes_fd";
 const char* OUTPUT_BLOB_NUM = "count_fd";
 const char* OUTPUT_BLOB_SEL = "bbox_fr";
 const char* OUTPUT_BLOB_IDX = "bbox_id";
-const char* OUTPUT_BLOB_RES = "softmax_fr";
+const char* OUTPUT_BLOB_RES = "prob_fr";
 const char* OUTPUT_BLOB_LAB = "label";
 #define DEFAULT_CAMERA -1        // -1 for onboard camera, or change to index of /dev/video V4L2 camera (>=0)

Replace [Face-Recognition]/data/deploy.prototxt, [Face-Recognition]/data/merge.caffemodel, and [Face-Recognition]/data/labels.txt with the corresponding files.

Build and execute!
We have also uploaded our results for Dog-Recognition here.

8. Known issue
A.) Some V4L2 cameras are not working → we are working on this.

B.) There is an algorithm that decides the label output.
We need this because we can pass only one ROI into the classification model but may get multiple objects from detection.
The algorithm automatically selects an ROI and then records the classification history.

Originally, we handled this with a complicated algorithm that prevents the same label from appearing twice at the same time.
For face recognition, it would be quite abnormal for two identical people to appear in the same frame. But this may not be true for other use-cases.

We have switched to a simpler version of this algorithm: the current pipeline shows what it predicts directly.
It’s recommended to take a look at this plugin layer and change it to suit your needs.
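
For anyone adapting the label logic, one simple alternative to showing each prediction directly is a majority vote over the last few frames. This is a different, swapped-in technique and not what the plugin layer does; it is written in Python only to illustrate the idea, and the class name `LabelHistory` and the default window size are assumptions.

```python
from collections import Counter, deque


class LabelHistory:
    """Keep the last `window` predicted label IDs and report the most frequent one."""

    def __init__(self, window=10):
        # A bounded deque drops the oldest prediction automatically.
        self.recent = deque(maxlen=window)

    def update(self, label_id):
        """Record one per-frame prediction and return the current majority label."""
        self.recent.append(label_id)
        return Counter(self.recent).most_common(1)[0][0]
```

A larger window gives a steadier label at the cost of reacting more slowly when the subject in front of the camera changes.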

Thanks, and hope this will help : )


Hi AastaLLL,

Thank you so much.

Below are some results from my tests; please correct me if I have the wrong idea.
I have the following questions.

  1. Face-Recognition app: accuracy is low and it shows a different name all the time.
    a. Do you get low accuracy too?
    b. Would a lower frame rate help accuracy?
  2. imagenet-console: our models give wrong results (the same models do get high accuracy on DIGITS).
    a. Does your face-recognition classification model give the same prediction results on DIGITS and on the TX2?
    If yes, we might need to work on our model.
    Would you share more information about the way you trained your network, as described on 9/8?
“ We train recognition network with VGG_Face.
We train this model with DIGITs since it is a traditional classification problem.
Input: a face image, we crop face region with OpenCV API (in training stage).
Output: label ID, each ID stands for a person in database.”

    b. I trained a new model with your classification.prototxt; it took a really long time and got very low accuracy on DIGITS. The only differences between your classification.prototxt and mine are loss3/classifier_ and num_output.
    Is there any requirement for using your classification.prototxt to train a new model?

PS: The merged model you provided and the merged model I built gave similar results in my tests.

A test result table is attached.

Thank you,

FR_test_results.docx (8.76 KB)


I got the error message below when I tested the merged model of a dog detection model with an object classification model.
It happens even when I use the deploy, labels, and merge.caffemodel files you provided.

nvidia@tegra-ubuntu:~/Face-Recognition-master/build/aarch64/bin$ ./face-recognition
Building and running a GPU inference engine for /home/nvidia/Face-Recognition/data/deploy.prototxt, N=1…
[gstreamer] initialized gstreamer, version
[gstreamer] gstreamer decoder pipeline string:
nvcamerasrc fpsRange="30.0 30.0" ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
successfully initialized video device
width: 1280
height: 720
depth: 12 (bpp)

failed to open /home/nvidia/Face-Recognition/data/labels.txt
face-recognition: /home/nvidia/Face-Recognition-master/face-recognition/face-recognition.cpp:68: std::vector<std::__cxx11::basic_string > loadLabelInfo(const char*): Assertion `0' failed.
Aborted (core dumped)

Thank you,

Hi HuiW,

  1. For accuracy, we have submitted a change that fixes a bug in the Recognition plugin layer.
    Please check our GitHub for details.
    Current accuracy should be similar to the DIGITS results.

  2. For the dog recognition model, we directly borrowed the models contained in jetson_inference.
    jetson_inference is another of our samples that demonstrates deep learning usage on TX2.
    If you don’t need to implement a plugin layer, it’s recommended to use jetson_inference, which has more useful functions.

  3. If you want to train a face recognition model, please use the default GoogleNet model in DIGITS.
    Our model is GoogleNet without modification.

  4. For the basic_string error, please remember to remove the prefix in the label file.
    If you are looking for a parser that supports the prefix, please check here.


Hi AastaLLL,

Thanks for your great support.
The new one works a lot better.



When you say the new one works a lot better, do you mean the new one with the data you trained or the default one with the celebrity faces?

For me the new version works well with the data I trained, but it doesn’t seem able to recognize the celebrities in the training data consistently, even when I use one of the images from the training data.

It’s likely because there are a thousand different celebrities, and I’m probably expecting too much. But I’m curious about your results.

In my testing it works fairly well with Bill Pullman, but doesn’t like Adam Driver at all. So I do believe it’s working.