Our CV pipeline involves processing blobs in binary images. We generate those binary images on the GPU using CUDA kernels, and then process the blob-containing binary images on the CPU (on a TX1). OpenCV APIs such as findContours() take a long time: about 9 ms on the TX1 for a single 1080p binary plane. The time probably goes into reading the image data and then generating the blob info (e.g. centroids and other shape parameters).
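For context, here is a minimal CPU reference sketch of the per-blob computation that a GPU version would have to reproduce. It is an illustration, not our production code. It assumes a row-major 8-bit image where nonzero means foreground, and it uses two-pass 4-connected labeling with union-find instead of contour tracing. Area and centroid then come from the raw moments m00, m10 and m01. The names `UnionFind`, `Blob` and `blobCentroids` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: union-find over provisional labels.
struct UnionFind {
    std::vector<int> parent;
    int make() {
        parent.push_back(static_cast<int>(parent.size()));
        return static_cast<int>(parent.size()) - 1;
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    void unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a != b) parent[std::max(a, b)] = std::min(a, b);
    }
};

// Per-blob output: pixel count (m00) and centroid (m10/m00, m01/m00).
struct Blob {
    long long area;
    double cx, cy;
};

// Two-pass 4-connected labeling, then raw-moment accumulation per blob.
// img is row-major, w*h bytes, nonzero = foreground.
// Blob ids are assigned in raster order of each blob's first pixel.
std::vector<Blob> blobCentroids(const std::vector<std::uint8_t>& img, int w, int h) {
    assert(img.size() == static_cast<std::size_t>(w) * h);
    std::vector<int> label(img.size(), -1);
    UnionFind uf;

    // Pass 1: provisional labels from left/up neighbours, record equivalences.
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const int i = y * w + x;
            if (!img[i]) continue;
            const int left = (x > 0) ? label[i - 1] : -1;
            const int up = (y > 0) ? label[i - w] : -1;
            if (left < 0 && up < 0) {
                label[i] = uf.make();
            } else if (left >= 0 && up >= 0) {
                label[i] = left;
                uf.unite(left, up);
            } else {
                label[i] = (left >= 0) ? left : up;
            }
        }
    }

    // Pass 2: map each root label to a compact blob id and accumulate moments.
    std::vector<int> compact(uf.parent.size(), -1);
    std::vector<long long> m00;
    std::vector<double> m10, m01;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const int i = y * w + x;
            if (label[i] < 0) continue;
            const int root = uf.find(label[i]);
            if (compact[root] < 0) {
                compact[root] = static_cast<int>(m00.size());
                m00.push_back(0);
                m10.push_back(0.0);
                m01.push_back(0.0);
            }
            const int c = compact[root];
            m00[c] += 1;
            m10[c] += x;
            m01[c] += y;
        }
    }

    std::vector<Blob> blobs;
    for (std::size_t c = 0; c < m00.size(); ++c)
        blobs.push_back({m00[c], m10[c] / m00[c], m01[c] / m00[c]});
    return blobs;
}
```

Consider a 6x4 image containing a U-shaped blob and a 2x2 square. The two arms of the U get different provisional labels in pass 1, and `unite` merges them. The result is two blobs with areas 5 and 4 and centroids (1.0, 0.6) and (4.5, 2.5). On a GPU, we assume pass 1 would be replaced by a parallel labeling kernel, such as a parallel union-find or iterative label propagation. The moment sums map naturally onto per-label atomic adds or reductions, so centroids and moment-based shape parameters do not require contour tracing.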
Has anyone implemented contour generation and centroid determination on the GPU? Do any of the examples include something that can be used for blob processing?
I would appreciate any pointers on this before we embark on our own implementation.