How to build the object detection framework SSD with TensorRT on TX2?

Holy_chen · May 5, 2017, 4:02am

I have built the object detection framework SSD from the ssd branch of weiliu89/caffe on GitHub, and it runs on the TX2, but the speed is only about 4 frames per second. Hence, I want to speed it up using TensorRT on the TX2.

AastaLLL · May 5, 2017, 5:25am

Hi,
Thanks for your question.

Please first check whether the layers used in your model are supported by TensorRT.

If they are, TensorRT supports the Caffe format.
Here is some sample code: dusty-nv/jetson-inference on GitHub (Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson).
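
One way to do the layer check is to list the layer types in the prototxt and compare them against the supported list in the user guide for your TensorRT release. The following is a minimal sketch, not part of any NVIDIA tool: the function names are hypothetical, the supported set is a placeholder for illustration only, and the regular expression only handles new-style prototxt files with quoted `type: "..."` fields.

```python
import re
from collections import Counter

# Matches every `type: "..."` entry in a prototxt, wherever it sits on the line.
TYPE_RE = re.compile(r'\btype:\s*"([^"]+)"')

def layer_types(prototxt_text):
    """Count the layer types that appear in a Caffe prototxt."""
    found = TYPE_RE.findall(prototxt_text)
    # Heuristic: layer types are CamelCase ("Convolution"), while filler
    # types ("xavier", "bilinear") are lowercase and are skipped here.
    return Counter(t for t in found if t[:1].isupper())

def unsupported_layers(prototxt_text, supported):
    """Return the sorted layer types that are not in `supported`."""
    return sorted(set(layer_types(prototxt_text)) - set(supported))

# PLACEHOLDER set for illustration only, NOT the real TensorRT support list.
PLACEHOLDER_SUPPORTED = {"Convolution", "ReLU", "Pooling", "Softmax"}

# Made-up fragment in the style of an SSD deploy file.
EXAMPLE = '''
layer { name: "conv4_3" type: "Convolution" bottom: "data" top: "conv4_3" }
layer { name: "relu4_3" type: "ReLU" bottom: "conv4_3" top: "conv4_3" }
layer { name: "conv4_3_norm" type: "Normalize" bottom: "conv4_3" top: "conv4_3_norm" }
layer { name: "priorbox" type: "PriorBox" bottom: "conv4_3_norm" bottom: "data" top: "priorbox" }
'''

print(unsupported_layers(EXAMPLE, PLACEHOLDER_SUPPORTED))
```

Any type this reports would have to be provided some other way, for example as a custom layer.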

Holy_chen · August 1, 2017, 1:45am

Hi AastaLLL,
I have installed JetPack 3.1 on my Jetson TX2. Is it possible to implement the layers used in the SSD framework, such as PriorBox, Normalize, Concat, Permute, and Flatten? If so, could you give a specific example?

AastaLLL · August 1, 2017, 2:33am

Hi,

Please check the sampleFasterRCNN and samplePlugin samples for details.

They are located at /usr/src/tensorrt/

Holy_chen · August 4, 2017, 7:07am

Hi, AastaLLL
An error occurred when I tested my prototxt file, which includes a Deconvolution layer, with TensorRT 2.1. This is the error output in my terminal:

Begin parsing model...
Caffe Parser: groups are not supported for deconvolutions
error parsing layer type Deconvolution index 139
End parsing model...
Segmentation fault (core dumped)

When I checked TensorRT2-1-User-Guide.pdf, I found that TensorRT 2.1 supports this layer:

Deconvolution
The Deconvolution layer implements a deconvolution, with or without bias.

How can I solve the problem? Could you give me some suggestions?

AastaLLL · August 7, 2017, 1:54am

Hi,

Could you share your deconvolution definition?
TensorRT doesn’t support:

deconv kernel size != stride
group property

Thanks.

Holy_chen · August 7, 2017, 8:10am

Hi,
This is my deconvolution definition.

layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "inc4e"
  top: "upsample"
  param { lr_mult: 0  decay_mult: 0 }
  convolution_param {
    num_output: 256
    kernel_size: 4  stride: 2  pad: 1
    group: 256
    weight_filler: { type: "bilinear" }
    bias_term: false
  }
}

It includes the group property. How can I solve the problem?
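
For context on what this layer computes: with group equal to num_output, a "bilinear" weight filler, and lr_mult 0, each channel is upsampled independently with one fixed kernel. The sketch below reproduces the weights of Caffe's bilinear filler for a square kernel in plain Python. The function name is hypothetical, and it is only meant to show the values a replacement layer would need, not how to write one.

```python
import math

def bilinear_kernel(size):
    """Weights produced by Caffe's "bilinear" filler for a size x size kernel."""
    f = int(math.ceil(size / 2.0))
    c = (2 * f - 1 - f % 2) / (2.0 * f)
    # The 2-D kernel is separable: the outer product of one 1-D profile.
    profile = [1 - abs(i / float(f) - c) for i in range(size)]
    return [[wy * wx for wx in profile] for wy in profile]

# kernel_size: 4, as in the definition above
for row in bilinear_kernel(4):
    print(row)
```

For kernel_size 4 the 1-D profile is 0.25, 0.75, 0.75, 0.25, so the same 4x4 kernel is applied to every one of the 256 channels.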

AastaLLL · August 8, 2017, 2:06am

Hi,

The Deconvolution layer of TensorRT doesn’t support:

kernel size != stride
group property

These two features are planned for our next release.
Currently, we provide a custom API that allows users to implement unsupported layers on their own.

Thanks and sorry for the inconvenience.
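
The two restrictions can be checked before handing a prototxt to the parser. This is a minimal sketch with a hypothetical helper name: it applies only the two limits stated in this thread, assumes square kernel_size and stride fields written as plain integers, and uses the layer definition posted above as input.

```python
import re

def deconv_issues(layer_text):
    """List the reasons a Deconvolution layer hits the limits stated in this thread."""
    def field(name, default):
        m = re.search(r'\b%s:\s*(\d+)' % name, layer_text)
        return int(m.group(1)) if m else default

    kernel = field("kernel_size", None)
    stride = field("stride", 1)  # Caffe's default stride is 1
    group = field("group", 1)    # Caffe's default group is 1

    issues = []
    if kernel is not None and kernel != stride:
        issues.append("kernel_size (%d) != stride (%d)" % (kernel, stride))
    if group != 1:
        issues.append("group (%d) is set" % group)
    return issues

UPSAMPLE = '''
layer {
  name: "upsample"
  type: "Deconvolution"
  convolution_param {
    num_output: 256
    kernel_size: 4  stride: 2  pad: 1
    group: 256
    bias_term: false
  }
}
'''

for issue in deconv_issues(UPSAMPLE):
    print(issue)
```

The layer from this thread triggers both checks, so dropping the group property alone would not be enough.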

AastaLLL · August 25, 2017, 5:38am

Hi,

We have written a face-recognition sample to demonstrate the TensorRT 2.1 Plugin API.
Please check this GitHub for more details:

Holy_chen · August 25, 2017, 7:54am

Hi,
I’m very happy that you let me know about this new example right away! I tested the demo on a Jetson TX2.
It produced the following output:

./face-recognition 
Building and running a GPU inference engine for /home/nvidia/Face-Recognition/data/deploy.prototxt, N=1...
[gstreamer] initialized gstreamer, version 1.8.3.0
[gstreamer] gstreamer decoder pipeline string:
nvcamerasrc fpsRange="30.0 30.0" ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
successfully initialized video device
    width:  1280
   height:  720
    depth:  12 (bpp)

Bindings after deserializing:
Binding 0 (data): Input.
Binding 1 (coverage_fd): Output.
Binding 2 (bboxes_fd): Output.
Binding 3 (count_fd): Output.
Binding 4 (bbox_fr): Output.
Binding 5 (bbox_id): Output.
Binding 6 (softmax_fr): Output.
Binding 7 (label): Output.
loaded image  /home/nvidia/Face-Recognition/data/fontmapA.png  (256 x 512)  2097152 bytes
[cuda]  cudaAllocMapped 2097152 bytes, CPU 0x102a00000 GPU 0x102a00000
[cuda]  cudaAllocMapped 8192 bytes, CPU 0x102c00000 GPU 0x102c00000
default X screen 0:   1920 x 1080
[OpenGL]  glDisplay display window initialized
[OpenGL]   creating 1280x720 texture
[gstreamer] gstreamer transitioning pipeline to GST_STATE_PLAYING

Available Sensor modes : 
2592 x 1944 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
2592 x 1458 FR=30.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
1280 x 720 FR=120.000000 CF=0x1109208a10 SensorModeType=4 CSIPixelBitDepth=10 DynPixelBitDepth=10
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> nvcamerasrc0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvcamerasrc0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer msg new-clock ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvcamerasrc0

NvCameraSrc: Trying To Set Default Camera Resolution. Selected sensorModeIndex = 1 WxH = 2592x1458 FrameRate = 30.000000 ...

[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer msg stream-start ==> pipeline0
Allocate memory: input blob
Allocate memory: coverage
Allocate memory: box
Allocate memory: count
Allocate memory: selected bbox
Allocate memory: selected index
Allocate memory: softmax
Allocate memory: label
[gstreamer] gstreamer decoder onPreroll
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x103200000 GPU 0x103200000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x103400000 GPU 0x103400000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x103600000 GPU 0x103600000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x103800000 GPU 0x103800000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x103a00000 GPU 0x103a00000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x103c00000 GPU 0x103c00000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x103e00000 GPU 0x103e00000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x104000000 GPU 0x104000000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x104200000 GPU 0x104200000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x104400000 GPU 0x104400000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x104600000 GPU 0x104600000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x104800000 GPU 0x104800000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x104a00000 GPU 0x104a00000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x104c00000 GPU 0x104c00000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x104e00000 GPU 0x104e00000
[cuda]  cudaAllocMapped 1382400 bytes, CPU 0x105000000 GPU 0x105000000
[cuda]   gstreamer camera -- allocated 16 ringbuffers, 1382400 bytes each
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer msg async-done ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
[cuda]   gstreamer camera -- allocated 16 RGBA ringbuffers
ROI: 0 0 0 0
0 bounding boxes detected
[cuda]   registered 14745600 byte openGL texture for interop access (1280x720)
ROI: 0 0 0 0
0 bounding boxes detected
ROI: 0 0 0 0
0 bounding boxes detected
pass 0 to trt
ROI: 259 248 96 139
Cuda failure: unspecified launch failure at line 328
Aborted (core dumped)

The crash (Aborted (core dumped)) occurs whenever a human face is detected. How can I solve it? Could you give me some suggestions? Thank you very much in advance!

AastaLLL · August 28, 2017, 2:43am

Hi,

Thanks for your feedback.

There was a bug in the handling of the image boundary.
We have already fixed it. Please check again with this commit:
https://github.com/AastaNV/Face-Recognition/commit/7bdab40c4b54ce3b6410ddb32a8c198768824789

Thanks for your feedback, and sorry for the inconvenience.
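
The commit itself is not reproduced here. As a general illustration of a boundary check of this kind, the sketch below clips a region of interest to the frame before it is used. It assumes the ROI is given as (x, y, width, height) in pixels, which is a guess about the "ROI:" lines in the log, and the function name is hypothetical.

```python
def clamp_roi(x, y, w, h, img_w, img_h):
    """Clip a box to the image. Returns (x, y, w, h); w or h may become 0."""
    x0 = min(max(x, 0), img_w)
    y0 = min(max(y, 0), img_h)
    x1 = min(max(x + w, 0), img_w)
    y1 = min(max(y + h, 0), img_h)
    return x0, y0, x1 - x0, y1 - y0

# A box fully inside a 1280x720 frame is returned unchanged.
print(clamp_roi(259, 248, 96, 139, 1280, 720))
# A box that runs past the right and bottom edges is shortened.
print(clamp_roi(1250, 700, 96, 139, 1280, 720))
```

A caller would typically skip the crop when the clipped width or height is 0.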

S4WRXTTCS · August 28, 2017, 8:18pm

I know this face-recognition example is meant to demonstrate the plug-in API functionality, but I’m curious about the model and dataset used to train it.

Are the model and the dataset publicly available? It works great at detecting my face at around 15fps, but of course my face isn’t in the data so it misidentifies me as a bunch of different celebrities. So I’d love to be able to retrain it with additional data to see how accurate it would be.

It also seems to be some kind of merged model which I’m not familiar with.

AastaLLL · August 29, 2017, 1:57am

Hi,

We generate this model by combining DetectNet and GoogleNet; both can be found in DIGITs.

Here are our steps:

1. Train DetectNet(detection) with FDDB database
2. Train GoogleNet(classification) with VGG_Face database
3. Run this script to generate a merged prototxt
4. Randomly generate a caffemodel of new prototxt (DIGITs is a useful tool)
5. Replace weights of No.4. with weight in No.1 and No.2 via this script
6. Add the plugin layer to prototxt.(ex. bboxMerge)

For your use case, you can train your classification network with GoogleNet.
Then use this script to overwrite the FR weights.
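
The linked scripts are not reproduced in this thread. As a rough idea of what the renaming part of the merge involves, the sketch below appends a suffix such as _fd or _fr to every layer name and blob reference in a prototxt, so that two networks can sit in one file without name clashes. The function name is hypothetical and this is a text-level approximation, not the actual script.

```python
import re

# `name`, `bottom` and `top` fields with a quoted value.
FIELD_RE = re.compile(r'\b(name|bottom|top):\s*"([^"]+)"')

def add_suffix(prototxt_text, suffix):
    """Append `suffix` to every layer name and every bottom/top blob name."""
    def repl(m):
        return '%s: "%s%s"' % (m.group(1), m.group(2), suffix)
    return FIELD_RE.sub(repl, prototxt_text)

DETECTION = 'layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" }'
print(add_suffix(DETECTION, "_fd"))
```

Note that a top-level `input: "data"` declaration is not touched by this pattern, while the `bottom: "data"` references are, so the inputs still need to be reconciled by hand afterwards.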

S4WRXTTCS · August 29, 2017, 3:19am

Thanks.

I didn’t even realize merging DetectNet and GoogleNet was possible. A few months ago I wrote a DualNet program that combined DetectNet and AlexNet, but I pipelined them: I trained each one separately and ran inference on each separately.

I used DetectNet to detect Playing Cards, and then I sent the Region of Interest to an AlexNet model that determined which card it was.

Doing it this way will allow me to combine these two networks into one if I’m understanding it correctly.

S4WRXTTCS · August 30, 2017, 12:10am

For steps 1 and 2, I trained the networks using the datasets I have for the playing cards.

The script seemed to work fine: it appended _fd to the names of the DetectNet layers and _fr to the names of the GoogleNet classification layers.

But, I ran into a Digits error on Step 4.

Using the merged prototxt it gave an error of “Layer ‘deploy_transform_fd’ references bottom ‘data_fd’ at the TRAIN stage however this blob is not included at that stage.”

The input layer is still named “data” because it didn’t get changed, but there are now two input layers named data: one for DetectNet (mine is 1280x944x3) and one for GoogleNet (224x224).

Is there an example prototxt for step 4?

The prototxt that’s included seems to be for step 6 where it has a dataRoi layer as the input to the GoogleNet classification.

AastaLLL · August 30, 2017, 7:25am

Hi,

Step 4 is only meant to generate random weights for the merged .prototxt. We use DIGITs just for convenience.

After merging the model, we change the data input size to the classification input size, for example, 1x3x224x224.
Then we rename data_fd and data_fr to data. (Sorry for not mentioning this earlier.)

Our script is here:
https://drive.google.com/open?id=0B-fFMM_3Dj9JVTJQWFRZdlRjWjA

S4WRXTTCS · September 5, 2017, 7:14pm

Thanks for the prototxt for step 4.

Do you have the prototxt for step 2 (GoogleNet classification)? What I have is quite different from the face-recognition part of the file you sent for step 4.

I’m also a little unsure of how to create this caffemodel within Digits. I used a model type of Other with the default values, but it ran out of memory. So I changed it to an image classification model with a custom network, and then pasted the prototxt in. It’s still in the process of training/generating the model. I’ll test it out once/if it makes it to the snapshot epoch.

For now I’m using the one that you sent across in the last message until I can figure out everything that needs modification on mine.

S4WRXTTCS · September 6, 2017, 12:31am

@AastaLLL

I was able to generate a caffemodel by selecting a classification model and then using a custom network, into which I copied and pasted the prototxt for step 4 that you linked to. I saved the model at epoch one, after it created the snapshot.

But, now I’m having issues with #5.

It’s giving me the following error message when I run the merge model script.

“0905 17:09:35.905570 2614 gpu_memory.cpp:74] Check failed: initialized_ Create GPUMemory::Scope to initialize Memory Manager
*** Check failure stack trace: ***
Aborted (core dumped)”

I might not be using the correct deploy.prototxt file that the merge model script expects. I’m using the one generated with the rename_model.py script without any further edits.

Another thing that could be causing the issue is that my detection.prototxt has a deploy_transform layer that is not in the detection part of the deploy.prototxt that you linked to.

S4WRXTTCS · September 6, 2017, 8:16pm

Can I get the following files so I can recreate this face recognition model?

classification.prototxt
detection.prototxt
the deploy.prototxt used for step 5.

On the 6 steps
For #1 I’m assuming the standard detectnet is used along with a pretrained model.
For #2 I’m assuming it’s the standard GoogleNet model without a pretrained model.
For #3 I don’t understand the merged prototxt, as there are more changes to it than I expected, but without seeing the source files I can’t say for certain.
For #4 Using a classification network with Digits seems to work fine where I copy and paste it as a custom network.
For #5 I get the initialize memory error. It’s most likely because one of the input files isn’t what it expects. I did try it on two digits machines, and they both reported the same message.

AastaLLL · September 7, 2017, 6:46am

Hi,

Sorry for keeping you waiting. Here are our implementation details:

1. Train DetectNet(detection) with FDDB database
We use the standard DetectNet. The model can be found here

2. Train GoogleNet(classification) with VGG_Face database
We use the standard GoogleNet. The model can be found here

3. Run this script to generate a merged prototxt
Some manual modification is needed here:
We want detection and classification to use the same input data blob, since there is no plugin layer yet at this step.
We also need to use the classification input size, since there is a fixed-size fully-connected layer at the end.

Change the input_shape of data to be 1x3x224x224, which is our classification network input size
Remove the data declaration in the middle (this is from classification network)
Change input of detection model from data_fd to be data
Change input of classification model from data_fr to be data
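
The last two edits in this list can also be scripted. The sketch below is a text-level approximation with a hypothetical function name: it points every bottom or top that refers to data_fd or data_fr at the shared data blob. Changing input_shape and removing the second data declaration are still done by hand.

```python
import re

def unify_inputs(prototxt_text, aliases=("data_fd", "data_fr"), shared="data"):
    """Rewrite bottom/top references to any blob in `aliases` so they use `shared`."""
    for alias in aliases:
        pattern = r'(\b(?:bottom|top):\s*)"%s"' % re.escape(alias)
        prototxt_text = re.sub(pattern, r'\1"%s"' % shared, prototxt_text)
    return prototxt_text

# Layer names follow the _fd/_fr convention used in this thread.
MERGED = '''
layer { name: "conv1_fd" type: "Convolution" bottom: "data_fd" top: "conv1_fd" }
layer { name: "conv1/7x7_s2_fr" type: "Convolution" bottom: "data_fr" top: "conv1/7x7_s2_fr" }
'''
print(unify_inputs(MERGED))
```

Only exact blob names are replaced, so a blob such as data_fd_scaled would be left alone.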

4. Randomly generate a caffemodel of new prototxt (DIGITs is a useful tool)
Here are more details about this step:

Generate a database with size 1x3x224x224
Start training a classification model with the following settings:
Epochs=1
Learning Rate=0
Custom Network = deploy.prototxt in step3
Download model

5. Replace weights of No.4. with weight in No.1 and No.2 via this script
deploy_fd = model in step1
deploy_fr = model in step2
deploy_merge = model in step3
model_fd = model in step1
model_fr = model in step2
model_merge = model in step4
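
The idea behind this step is a per-layer weight copy: each layer of the detection and classification models is written into the layer of the merged model that carries the matching _fd or _fr suffix. The sketch below shows that logic on plain dictionaries (layer name mapped to a list of parameter blobs) so it can be run without Caffe. It is not the linked script, and the function name is hypothetical.

```python
def transplant(merged, source, suffix):
    """Copy the blobs of each layer in `source` into layer `name + suffix` of `merged`.

    Both arguments map a layer name to a list of blobs (flat lists of floats here).
    Returns the names of the merged layers that were overwritten.
    """
    copied = []
    for name, blobs in source.items():
        target = merged.get(name + suffix)
        if target is None:
            continue  # no counterpart in the merged net
        if len(target) != len(blobs) or any(len(t) != len(b) for t, b in zip(target, blobs)):
            raise ValueError("shape mismatch for layer %r" % name)
        for t, b in zip(target, blobs):
            t[:] = b  # overwrite in place, keeping the merged net's own storage
        copied.append(name + suffix)
    return copied

# Toy stand-ins: one layer each, with a weight blob and a bias blob.
merged = {"conv1_fd": [[0.0, 0.0], [0.0]], "conv1_fr": [[0.0, 0.0], [0.0]]}
model_fd = {"conv1": [[1.0, 2.0], [3.0]]}
model_fr = {"conv1": [[4.0, 5.0], [6.0]]}

transplant(merged, model_fd, "_fd")
transplant(merged, model_fr, "_fr")
print(merged)
```

With pycaffe, the same loop would presumably run over each network's params dictionary and assign into the blobs' data arrays before saving the merged net.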

6. Add the plugin layer to prototxt.(ex. bboxMerge)
In total, we add three plugin layers: bboxMerge, selectBbox(Recognition), summaryLabel(Recognition), dataRoi
Now that we have the dataRoi plugin, we change the data input as follows:

Change data input size back to 1x3x360x640, which is detection input size.
Replace the input blob of classification layer from data to data_roi (in conv1/7x7_s2_fr layer)
Note: the added layers are weight-free, so we can keep using the caffemodel from step 5 even though new layers have been added.

Feel free to let us know if you need help.
Thanks.