I have successfully trained DetectNet with 3 classes in DIGITS using nvcaffe-0.15. I use an image size of 1440x912 and a stride of 16, so my grid is 90x57. When I run inference in DIGITS on my own data (PNG files extracted with ffmpeg from an .h264 file), I get detections for all three classes. I therefore believe that the DetectNet prototxt file was modified correctly and that the training worked.
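For reference, this is how I arrive at the grid size. It is only a minimal sketch of the arithmetic; the function name `grid_size` is my own and is not part of DIGITS or Driveworks.

```python
def grid_size(image_width, image_height, stride):
    """Return (cells_x, cells_y) of the detection grid for a given stride.

    Hypothetical helper for illustration only; it assumes the image
    dimensions are exact multiples of the stride.
    """
    if image_width % stride or image_height % stride:
        raise ValueError("image dimensions must be multiples of the stride")
    return image_width // stride, image_height // stride


# Values from my setup: 1440x912 input, stride 16.
print(grid_size(1440, 912, 16))  # (90, 57)
```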
Then I used the tensorRT_optimization tool to create a TensorRT-optimized .bin file. Everything seems to work fine up to this point.
Now I am trying to modify sample_object_dwdetector so that it can detect multiple classes. I used the DriveNet sample as a reference and have reached the point where I would expect it to work.
Unfortunately, I only get detections for class0 (the first class). When I used dwDNN_getOutputSize to check whether the output blobs of my .bin file have the right shape, I noticed that the blobSize of the “bboxes” blob in Driveworks is 1x90x57x4 [batch_size x grid_sz_x x grid_sz_y x 4 (xl, yt, xr, yb)]. This is just as it is written in the DetectNet prototxt file and just how the clustering layer (clustering.py) in nvcaffe (used by DIGITS for clustering) expects it.
However, the Driveworks documentation for dwObjectDetector_initialize() states that “The number of channels in the bounding box blob is equal to 4 times number of classes that the network detects.” So as I understand it, dwObjectDetector_initialize() expects 1x90x57x4x3 as the blobSize for my network.
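To make the mismatch concrete, here is a small sketch of the check I am doing by hand. It only encodes the rule quoted from the documentation (4 channels per class) and compares it with the shape I read back; the helper names are my own, it does not call any Driveworks API, and the channels-last layout is an assumption that simply mirrors how I wrote the shape above.

```python
BBOX_COORDS = 4  # xl, yt, xr, yb


def expected_bbox_channels(num_classes):
    """Channel count implied by the quoted documentation: 4 per class."""
    return BBOX_COORDS * num_classes


def check_bbox_blob(blob_shape, num_classes):
    """Compare the last dimension of blob_shape with the expected channels.

    blob_shape is assumed to be (batch, grid_x, grid_y, channels), the
    order used in this post. Returns (matches, actual, expected).
    """
    actual = blob_shape[-1]
    expected = expected_bbox_channels(num_classes)
    return actual == expected, actual, expected


# Shape reported for my "bboxes" blob, with 3 classes.
print(check_bbox_blob((1, 90, 57, 4), 3))  # (False, 4, 12)
```

So with 3 classes the documented rule implies 12 bounding box channels, while my blob has only 4.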
How do I make this fit together? Why does dwObjectDetector_initialize() in Driveworks expect 1x90x57x4x3 for the “bboxes” blob while DIGITS only provides 1x90x57x4? Is this the reason why I only get detections for class0?
Could somebody from the NVIDIA team help?