Multi-Object Detection on DriveWorks/PX2

Hi,
I am able to successfully train and run single-object detection on the PX2 using the DriveWorks API.
Is there any reference for extending this to multi-class object detection?

Hello karthikk,

Could you please see file:///usr/local/driveworks-0.3/doc/nvdwx_html/dwx_object_tracker_drivenet_sample.html on the Drive PX2?
You can find the DriveNet sample in that doc. Thanks.

Hi Steve,

Thanks for the reply.
Yes, I found it and have already tried it.

But how do we train it with our own model and data?
Are there any tutorials, such as the DetectNet example?

It seems there are no tutorials for multi-class detection on custom datasets/architectures as of now.

To train your own model on your own data, the idea is to use the DIGITS framework (https://developer.nvidia.com/digits) on a host machine, separate from the PX2, with your custom .prototxt Caffe architecture. This will give you a .caffemodel.

For inference on the PX2 you can then

  • Modify /usr/local/driveworks-0.3/samples/src/dnn/sample_object_detector/main.cpp: Replace the "caffe_model" and "caffe_prototxt" string arguments in the main function with paths to your .caffemodel and .prototxt. (For some reason these arguments are subject to an #ifdef WINDOWS condition; comment that condition out if you're using Linux.)
  • Modify /usr/local/driveworks-0.3/samples/src/dnn/dnn_common/DNNInference.cpp, in particular inferSingleFrame (to preprocess the data the way you want) and interpretOutput (to transform your network outputs into bounding boxes).

and then cross-compile the resulting sample_object_detector for the PX2 and run it there.
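
To make the interpretOutput step more concrete, here is a rough sketch of how coverage and bbox outputs could be decoded into per-class boxes. It is not the DriveWorks code. The Box struct and decodeDetections function are hypothetical names, and the layout (one coverage channel per class, four bbox channels per class holding pixel offsets from the grid cell origin, in the order x1, y1, x2, y2) is an assumption you should check against your own network.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical box type for illustration; not a DriveWorks type.
struct Box {
    int classId;
    float score;
    float xMin, yMin, xMax, yMax;
};

// Assumed layout (verify against your own network outputs):
//   coverage: [numClasses][gridH][gridW]
//   bboxes:   [numClasses * 4][gridH][gridW], where the four channels of each
//             class are x1, y1, x2, y2 offsets in pixels from the cell origin.
// stride is the assumed size of one grid cell in input pixels.
std::vector<Box> decodeDetections(const std::vector<float>& coverage,
                                  const std::vector<float>& bboxes,
                                  std::size_t numClasses,
                                  std::size_t gridH, std::size_t gridW,
                                  float stride, float threshold)
{
    assert(stride > 0.0f && "stride must be positive");

    std::vector<Box> boxes;
    const std::size_t cells = gridH * gridW;

    // If the buffers do not match the assumed layout, return nothing.
    if (coverage.size() != numClasses * cells ||
        bboxes.size() != numClasses * 4 * cells) {
        return boxes;
    }

    for (std::size_t c = 0; c < numClasses; ++c) {
        // Start of the four bbox channels belonging to class c.
        const float* b = bboxes.data() + c * 4 * cells;
        for (std::size_t gy = 0; gy < gridH; ++gy) {
            for (std::size_t gx = 0; gx < gridW; ++gx) {
                const std::size_t idx = gy * gridW + gx;
                const float score = coverage[c * cells + idx];
                if (score < threshold) {
                    continue; // cell not confident enough for this class
                }
                // Top-left corner of this grid cell in input pixels.
                const float ox = static_cast<float>(gx) * stride;
                const float oy = static_cast<float>(gy) * stride;
                Box box;
                box.classId = static_cast<int>(c);
                box.score = score;
                box.xMin = b[0 * cells + idx] + ox;
                box.yMin = b[1 * cells + idx] + oy;
                box.xMax = b[2 * cells + idx] + ox;
                box.yMax = b[3 * cells + idx] + oy;
                boxes.push_back(box);
            }
        }
    }
    return boxes;
}
```

The raw boxes from neighbouring cells will overlap heavily, so you would still need to cluster or merge them per class afterwards.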

Hope this helps.

Hi Roberts,

Thanks for your suggestion. It helps.
I have a small question.
Have you tried to port any standard multi-object detection models (such as Faster R-CNN or SSD) in this way?
I saw that only a limited set of layers is supported:
(known types: AbsVal, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, Convolution, Crop, Deconvolution, Dropout, ELU, Eltwise, Embed, Exp, Filter, Flatten, Im2col, InnerProduct, Input, LRN, Log, MVN, PReLU, Pooling, Power, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, Silence, Slice, Softmax, Split, TanH, Threshold, Tile)

Also if you use the DetectNet architecture, are you able to train 2 classes successfully?

Thanks,
Regards,
Karthikk

Hi Karthikk,

Yes, I think you are correct about layer support in DIGITS, and the list of layers supported by TensorRT is even smaller (I know it doesn’t support the Slice layer). I haven’t tried either Faster R-CNN or SSD, but I’m guessing the ROI pooling layer for Faster R-CNN and the Multi-box Loss layers for SSD would not be supported. No, I haven’t tried DetectNet either, but I will post if I get round to it.

Hi Robert,
I tried to integrate the Caffe cifar10 pretrained model into DriveNet as you described above, but it failed.
I modified sample_object_detector and DNNInference. It looks like DriveWorks only supports GoogLeNet-like structures, with layer names like “conv1/3x3”. When I try to use the cifar10 model, with layer names like “conv1”, DriveWorks gives me an error. When I rename the cifar10 layers to GoogLeNet style (“conv1/3x3”), it parses and runs, but the output is not correct: it only gives me ten values of 0.1.
Any idea on this?

Your network should have “bboxes” and “coverage” layers for the sample_object_detector sample to infer correctly. When you say your output is ‘ten 0.1’, are you referring to the coverage layer output?
Also, what data are you running it on?
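
If the ten values are actually the cifar10 softmax output rather than a coverage layer, one thing worth ruling out is that the network is seeing identical logits for every class. A softmax over ten equal logits returns exactly 0.1 for each class. That could happen with a blank input or with weights that did not load, but that part is only a guess. The small sketch below, with hypothetical helper names softmax and isNearlyUniform, shows the effect and gives a quick check you could run on the output buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over a vector of logits.
std::vector<float> softmax(const std::vector<float>& logits)
{
    std::vector<float> out(logits.size());
    if (logits.empty()) {
        return out;
    }
    // Subtract the maximum logit before exponentiating to avoid overflow.
    const float maxLogit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - maxLogit);
        sum += out[i];
    }
    for (float& v : out) {
        v /= sum;
    }
    return out;
}

// Returns true when every probability is within tol of 1 / N, i.e. the
// network is expressing no preference between the N classes.
bool isNearlyUniform(const std::vector<float>& probs, float tol = 1e-3f)
{
    assert(tol >= 0.0f);
    if (probs.empty()) {
        return false;
    }
    const float expected = 1.0f / static_cast<float>(probs.size());
    for (float p : probs) {
        if (std::fabs(p - expected) > tol) {
            return false;
        }
    }
    return true;
}
```

If isNearlyUniform returns true for every frame you feed in, the problem is more likely in the input preprocessing or model loading than in how the output is interpreted.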

Hi Karthikk,

I have trained a two-class classifier using DIGITS and GoogLeNet. Have you found a way to use the network for multi-class object detection?

Regards
Mayank