Bug in caffe detectnet/clustering.py?

While trying to replicate the DIGITS clustering algorithm in C++, I think I have discovered a bug in clustering.py.

The clustering layer for DetectNet has a method vote_boxes() that takes bounding boxes and groups them using OpenCV's groupRectangles() function. If I read the source correctly, however, this call is being made incorrectly. It looks like the bounding boxes are stored as lists of [x1, y1, x2, y2] in Caffe, but groupRectangles() expects them to be passed in as lists of [x, y, width, height].
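If that reading is right, the mismatch could be bridged with a small conversion before and after the grouping call. The following is only a sketch under that assumption: the helper names corners_to_xywh and xywh_to_corners are hypothetical and do not appear in clustering.py.

```python
def corners_to_xywh(boxes):
    """Convert [x1, y1, x2, y2] boxes to [x, y, width, height]."""
    return [[x1, y1, x2 - x1, y2 - y1] for x1, y1, x2, y2 in boxes]


def xywh_to_corners(rects):
    """Convert [x, y, width, height] rects back to [x1, y1, x2, y2]."""
    return [[x, y, x + w, y + h] for x, y, w, h in rects]


# Hypothetical detections in corner format (assumed, not taken from Caffe).
corner_boxes = [[10, 20, 50, 80], [12, 22, 52, 82]]

# Rectangles in the layout that cv::Rect uses: x, y, width, height.
rects = corners_to_xywh(corner_boxes)

# Converting back should reproduce the original corner boxes.
round_trip = xywh_to_corners(rects)
```

The converted rects are what would be handed to groupRectangles(), and its grouped output would go through xywh_to_corners() before being used as corner boxes again.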

Am I reading this correctly?

(Please note that I have verified the Python -> C++ translation done by the Python bindings by building OpenCV from source.)

caffe source: https://github.com/NVIDIA/caffe/blob/caffe-0.17/python/caffe/layers/detectnet/clustering.py#L178
cv::Rect docs: https://docs.opencv.org/3.1.0/d2/d44/classcv_1_1Rect__.html
bug report: https://github.com/NVIDIA/caffe/issues/557

Maybe this will help jump-start the conversation:

What format is the data in the bounding boxes? Can someone confirm it is (x1, y1, x2, y2)? If not here, is there a better forum where I can ask the question?


To close the loop on this: there is definitely a bug in NVIDIA/caffe. The groupRectangles() interface is being called incorrectly. Use this module for training your network at your own risk.

As far as I can tell, it mostly works because groupRectangles() still does a reasonable job of correlating rectangles when given (x1, y1, x2, y2) instead of (x, y, width, height). Anecdotally, though, I have noticed that the grouping is less effective for detections in the upper-left and lower-right of images.
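One possible explanation for the position dependence, offered as a sketch rather than a confirmed diagnosis: as far as I recall, OpenCV decides whether two rectangles belong together with a tolerance proportional to their width and height. If x2 and y2 are read as width and height, that tolerance grows with distance from the image origin. The predicate below is a Python approximation of that test written from memory, not a copy of the OpenCV source. The name similar_rects, the example boxes, and eps=0.2 are assumptions chosen for illustration.

```python
def similar_rects(r1, r2, eps=0.2):
    """Approximate rectangle-similarity test; rects are (x, y, width, height).

    The tolerance delta scales with the smaller width and height of the pair.
    """
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    delta = eps * (min(w1, w2) + min(h1, h2)) * 0.5
    return (abs(x1 - x2) <= delta
            and abs(y1 - y2) <= delta
            and abs((x1 + w1) - (x2 + w2)) <= delta
            and abs((y1 + h1) - (y2 + h2)) <= delta)


def as_xywh(box):
    """Convert one (x1, y1, x2, y2) box to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


# Two heavily overlapping 40x40 boxes near the upper-left corner (made up).
ul_a, ul_b = (0, 0, 40, 40), (6, 6, 46, 46)
# Two 40x40 boxes near the lower-right that do not overlap at all (made up).
lr_a, lr_b = (500, 500, 540, 540), (550, 550, 590, 590)

# Converted to (x, y, width, height) before the test.
correct_ul = similar_rects(as_xywh(ul_a), as_xywh(ul_b))
correct_lr = similar_rects(as_xywh(lr_a), as_xywh(lr_b))

# Corner coordinates passed straight through, as the bug report describes.
misread_ul = similar_rects(ul_a, ul_b)
misread_lr = similar_rects(lr_a, lr_b)
```

With these made-up boxes, the overlapping upper-left pair is grouped only when converted first, while the disjoint lower-right pair is grouped only when the corners are misread. That would be consistent with under-grouping near the upper-left and over-grouping near the lower-right.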