Multi-class inference with DetectNet in Driveworks shows only detections for class0

Hello all,

I have successfully trained DetectNet with 3 classes in DIGITS with nvcaffe-0.15. I use a picture size of 1440x912 and a stride of 16, therefore my grid is of size 90x57. When I perform inference in DIGITS with my own data (png files extracted with ffmpeg from an .h264 file), I get detections for all three classes. So I believe the DetectNet prototxt file was modified correctly and the training worked out as well.
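For reference, the 90x57 grid follows directly from the image size and stride; this is just a quick arithmetic sanity check, not DriveWorks code:

```python
# DetectNet output grid size = input image size / stride
image_w, image_h = 1440, 912
stride = 16

grid_w = image_w // stride
grid_h = image_h // stride
print(grid_w, grid_h)  # 90 57
```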

Then I use the tensorRT_optimization tool to create a TensorRT-optimized .bin file. Everything seems to work fine up to here.

Now I am trying to modify sample_object_dwdetector so it can detect multiple classes. I used the DriveNet sample as a reference and am at the point where I would expect it to work.

Unfortunately, I only get detections for class0 (the first class). When I used dwDNN_getOutputSize() to check whether the output blobs of my .bin file have the right shape, I noticed that the blobSize of the “bboxes” blob in DriveWorks is 1x90x57x4 [batch_size x grid_sz_x x grid_sz_y x 4 (xl, yt, xr, yb)]. This matches what is written in the DetectNet prototxt file and is exactly what the clustering layer in nvcaffe (used by DIGITS for clustering) expects.

However, the DriveWorks documentation for dwObjectDetector_initialize() states that “The number of channels in the bounding box blob is equal to 4 times number of classes that the network detects.” As I understand it, dwObjectDetector_initialize() therefore expects a blobSize of 1x90x57x12 (4 channels x 3 classes) for my network.
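To make the mismatch concrete, here is the element count each side appears to expect for the “bboxes” blob; this is plain arithmetic based on my reading of the docs, not actual API output:

```python
# "bboxes" blob as DIGITS/nvcaffe produces it: one box (4 coords) per grid cell
grid_w, grid_h, coords = 90, 57, 4   # coords = (xl, yt, xr, yb)
digits_elems = grid_w * grid_h * coords
print(digits_elems)  # 20520

# What dwObjectDetector_initialize() seems to expect per the DriveWorks docs:
# 4 channels per class, i.e. a separate box regression per class
num_classes = 3
driveworks_elems = grid_w * grid_h * coords * num_classes
print(driveworks_elems)  # 61560
```

So the network output I have is a factor of num_classes smaller than what the detector initialization appears to ask for.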

How do I make this fit together? Why does dwObjectDetector_initialize() in DriveWorks expect 4 channels per class for the “bboxes” blob while DIGITS/nvcaffe only produces 1x90x57x4? Is this mismatch the reason I only get detections for class0?

Could somebody from the Nvidia team help?

Best regards,