Merging object detection and object classification

Hi all,

I am trying to merge object detection and classification into one network instead of calling inference on each separately. I have been following this thread: https://devtalk.nvidia.com/default/topic/1007313/jetson-tx2/how-to-build-the-objection-detection-framework-ssd-with-tensorrt-on-tx2-/2

I have some questions regarding it:

To merge DetectNet and GoogleNet: in step 3, where do I remove the data declaration from the classification network? Do I remove it before or after merging the prototxt files, and where exactly in the file is the data declaration I should remove?

Thanks in advance!

DetectNet already classifies the objects it detects. Why do you want to add GoogleNet?

A very rough approximation of how DetectNet works: it runs GoogleNet-like classification as a convolutional step on a coarse grid, and simultaneously outputs the predicted corners of the objects it classifies.

If you want to build a hierarchical model, such that DetectNet detects “car” and GoogleNet detects “1967 Camaro,” then the better way to do this is to take the bounding boxes that come out of DetectNet, extract those boxes as small object images from the input image, and run each of them through a GoogleNet trained on smaller images. You don’t do this by merging the prototxt files; you do it by writing code that loads both models, knows how to extract and scale the data inside the bounding boxes, and forwards it to the next network (presumably in some pipelined loop).
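That crop-and-forward loop can be sketched in a few lines. The following is a minimal pure-Python illustration (not code from either project): the image is a nested list of pixels, `clamp_roi` and `crop_and_resize` are names I made up, and the actual detector/classifier calls are left as hypothetical placeholders.

```python
def clamp_roi(x, y, w, h, img_w, img_h):
    """Clamp a detector bounding box to the image bounds."""
    x0 = max(0, min(int(x), img_w - 1))
    y0 = max(0, min(int(y), img_h - 1))
    x1 = max(x0 + 1, min(int(x + w), img_w))
    y1 = max(y0 + 1, min(int(y + h), img_h))
    return x0, y0, x1, y1

def crop_and_resize(img, roi, out_w=224, out_h=224):
    """Extract an ROI from img (a list of pixel rows) and
    nearest-neighbor resize it to the classifier's input size."""
    x0, y0, x1, y1 = roi
    crop_w, crop_h = x1 - x0, y1 - y0
    return [
        [img[y0 + (j * crop_h) // out_h][x0 + (i * crop_w) // out_w]
         for i in range(out_w)]
        for j in range(out_h)
    ]

# Pipelined loop (detect() and classify() stand in for the two networks):
# for (x, y, w, h) in detect(img):
#     roi = clamp_roi(x, y, w, h, len(img[0]), len(img))
#     patch = crop_and_resize(img, roi)
#     label = classify(patch)
```

In a real deployment on the TX2 you would do the crop/scale step on the GPU (e.g. with a CUDA kernel) rather than in Python, but the data flow is the same.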

Hi,

Check more information here:
https://devtalk.nvidia.com/default/topic/1023699/jetson-tx2/questions-about-face-recongnition/post/5209485/#5209485

Thanks.

Hi AastaLLL,

I followed the link that you provided, but when I try to build a new classification model I keep getting error code -11. Here are my settings for the classification model:

Training epochs: 1
Snapshot interval: 1
Validation interval: 1

Solver type: SGD
Base Learning Rate: 0.01
Policy: Step Down
Step Size: 33
Gamma: 0.1

Subtract Mean: Image

For the custom network I used the one I generated in step 4.
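For reference, in plain Caffe those DIGITS settings correspond roughly to a solver.prototxt like the one below. This is only a sketch, not the exact file DIGITS generates; the iteration counts are placeholders, since DIGITS converts the epoch-based intervals and the 33% step size into iterations based on your dataset and batch size.

```
# Sketch of the equivalent Caffe solver (values marked as
# placeholders depend on dataset size and batch size)
net: "train_val.prototxt"   # the custom network from step 4
type: "SGD"
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 330        # 33% of max_iter (placeholder)
max_iter: 1000       # 1 epoch (placeholder)
snapshot: 1000       # snapshot every epoch (placeholder)
test_interval: 1000  # validate every epoch (placeholder)
```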

For error code -11, below is the message that I got:

Test net output #19998: prob_fr = 0.000746771
Test net output #19999: prob_fr = 0.0013324
Test net output #20000: prob_fr = 0.000699792
Test net output #20001: prob_fr = 0.000730556
Test net output #20002: prob_fr = 0.000465725
Test net output #20003: prob_fr = 0.00140473
Test net output #20004: prob_fr = 0.000972217
Test net output #20005: prob_fr = 0.0011883
Test net output #20006: prob_fr = 0.00168505
Test net output #20007: prob_fr = 0.00119044
Test net output #20008: prob_fr = 0.000624847
Test net output #20009: prob_fr = 0.000463807
Test net output #20010: prob_fr = 0.0017488
Test net output #20011: prob_fr = 0.00053642
Test net output #20012: prob_fr = 0.000488077
Test net output #20013: prob_fr = 0.00112253
Test net output #20014: prob_fr = 0.000426881
Test net output #20015: prob_fr = 0.00155486
Optimization Done.
Optimization Done.

Thanks!

Hi AastaLLL,

I managed to solve the error and was able to run the program. It detects and displays the object well with the dog example, but when I switch to my own model (to detect a watch), it crashes straight away once it detects the object.

Is there any way to debug this?

Below is the error:

HERE HERE HERE: 0x3891280
ROI: 0 0 0 0
0 bounding boxes detected

HERE HERE HERE: 0x3891280
ROI: 0 0 0 0
0 bounding boxes detected

HERE HERE HERE: 0x3891280
pass 0 to trt
150.984 102.734 246.594 204.188 
ROI: 151 103 96 101
ID=0, label=833
1 bounding boxes detected
bounding box 0   (402.625000, 154.101562)  (657.583374, 306.281250)  w=254.958374  h=152.179688

HERE HERE HERE: 0x3891280
Segmentation fault (core dumped)

Thanks!

Hi,

The network size is hardcoded because the Plugin API has no parameter support.
You may need to modify the size here for your custom model:
https://github.com/AastaNV/Face-Recognition/blob/master/pluginImplement.cpp#L253
https://github.com/AastaNV/Face-Recognition/blob/master/pluginImplement.cpp#L288

Thanks.

Hi,

Can I ask why the classification training uses 224x224 but we need to change it back to 640x640 in the end?

My custom model follows the same parameters as the dog example, but it still crashes when it detects a watch. I trained the classification model at 224x224, but in step4.prototxt I changed the input to 3, 480, 480 because my detection model was trained at 480x480. After modifying the size to 3, 480, 480 it shows this error message:

nvidia@tegra-ubuntu:~/Face-Recognition-master/build/aarch64/bin$ ./face-recognition 
Building and running a GPU inference engine for /home/nvidia/Desktop/Merge_example/Tank/step4.prototxt, N=1...
[gstreamer] initialized gstreamer, version 1.8.3.0
[gstreamer] gstreamer decoder pipeline string:
nvcamerasrc fpsRange="30.0 30.0" ! video/x-raw(memory:NVMM), width=(int)768, height=(int)576, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
successfully initialized video device
    width:  768
   height:  576
    depth:  12 (bpp)

loss3/classifier_fr: kernel weights has count 1024000 but 82944000 was expected
face-recognition: /home/nvidia/Face-Recognition-master/tensorNet.cpp:34: void TensorNet::caffeToTRTModel(const string&, const string&, const std::vector<std::__cxx11::basic_string<char> >&, unsigned int): Assertion `engine' failed.
Aborted (core dumped)

Thanks!

Hi,

The input size of DetectNet is 640x360 and the input size of GoogleNet is 224x224.
That’s why there is an ROI resize layer that scales each ROI region to 224x224 via CUDA.
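The weight-count error in the log above is consistent with this. Assuming standard GoogleNet strides (total downsample of 32, then a 7x7 average pool of stride 1), the final inner-product layer sees 1024x1x1 features at a 224x224 input but 1024x9x9 at 480x480, so the weight counts no longer match. A quick check (the function name is mine, for illustration only):

```python
def classifier_weight_count(input_size, channels=1024, classes=1000,
                            downsample=32, pool_kernel=7):
    """Weights in GoogleNet's final inner-product layer for a square
    input: the net downsamples by 32, then applies a 7x7 average pool
    of stride 1, so the flattened feature count grows with input size."""
    feature = input_size // downsample   # spatial size before pool5
    pooled = feature - pool_kernel + 1   # after 7x7 pool, stride 1
    return channels * pooled * pooled * classes

print(classifier_weight_count(224))  # 1024000  -> weights in the caffemodel
print(classifier_weight_count(480))  # 82944000 -> what the 480x480 net expects
```

Those are exactly the two numbers in the "kernel weights has count 1024000 but 82944000 was expected" error, which is why the classification branch must keep its 224x224 input even when the detection branch uses a larger one.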

From your log, something went wrong when merging the detection and classification models.
Please recheck your procedure against the information in the following topics:
https://devtalk.nvidia.com/default/topic/1023699/jetson-tx2/questions-about-face-recongnition/
https://devtalk.nvidia.com/default/topic/1007313/jetson-tx2/how-to-build-the-objection-detection-framework-ssd-with-tensorrt-on-tx2-/

Thanks

Hi AastaLLL,

Thanks for the input. I followed it and recompiled again, but I still get the same error whenever I change my size (480x480) in pluginImplement.cpp. For training the new classification model, is it supposed to be like this? (I attached the photo below.)

Can I check with you on one last thing? Does face-recognition support external cameras, e.g. an IP or V4L2 camera? The onboard camera seems to work fine, but when I switch over to an IP/V4L2 camera it keeps saying this:

failed to capture frame
failed to convert from NV12 to RGBA
[cuda]   cudaPreImageNetMean((float4*)imgRGBA, camera->GetWidth(), camera->GetHeight(), data, dimsData.w(), dimsData.h(), make_float3(127.0f, 127.0f, 127.0f))
[cuda]      invalid device pointer (error 17) (hex 0x11)
[cuda]      /home/nvidia/Face-Recognition/face-recognition/face-recognition.cpp:223
cudaPreImageNetMean failed

OR

[cuda]   registered 7077888 byte openGL texture for interop access (768x576)
Segmentation fault (core dumped)

I tried testing with the detectnet-camera and imagenet-camera samples, and both seem to work fine with the IP/V4L2 camera.

Thanks!

Hi,

The face-recognition sample only supports the onboard camera.
If you are looking for a sample that supports V4L2 cameras, please check our jetson-inference samples here:
https://github.com/dusty-nv/jetson-inference

Thanks.