Parallelizing blob detection

I am using CUDA to do basic industrial inspection of thin film materials on gray scale images gathered with a line scan camera.

I have a CPU based blob implementation for finding connected regions, but it is dependant on going down the rows sequentially.

Does anybody have advice or references regarding parallelizing blob detection?


I think that there are some parallel connected components analysis algorithms out there, but I haven’t investigated their applicability to CUDA. There also may be other, better approaches to finding blobs than connected components, but I don’t know about them.

If you find anything interesting about this, or want to collaborate on this, email me at geoff.langdale AT and we’ll talk further. I’ve been meaning to try this for a while.

Have any of you compute vision guys found a good way to do this with CUDA? This is a fairly canonical problem in computer vision, but most of the parallel algorithms that I have found are tightly to a specific architecture.

A (wild) idea would be that you could probably find connected components recursively by doing a reduction, if possible this would be very parallelizable.

It’s been a while since I started researching conected components labeling algorithms.
As far as I know, the best (sequential) implementation for CPU is the one shown in

I’ve been implementing some parallel labeling approaches in CUDA, but surprisingly the sequential implementation is still better than my one.

Basically, I’ve divided the operation in two different phases: local labeling and label merge. My algorithm is able to perform the local labeling phase, for a 1024x1024 input image, in just 1.3 ms. The problem is with the merge phase, which is done completely in CPU. :(


I have spent quite a lot of time thinking in a solution for doing connected components on GPU but I can’t come out with a correct (and faster than CPU) idea…

Has anybody advanced on this??



If the algorithm is dependent on going down the rows sequentially, I would try this approach:

Break up the image into columns. This way, you can perform an algorithm which goes down the rows sequentially, except that now, each row is much shorter, and you can process different columns in parallel to each other. It may be neccessary to have some overlaps in the columns, depending on the exact algorithm.