Image labelling with CUDA

Hi all,

I’m trying to implement connected components labelling on the GPU. So far I’ve implemented the core of the labelling algorithm, which runs considerably faster on the GPU than the CPU implementation. However, I’ve run out of ideas on how to solve the problem below:

After a 4-connected labelling procedure I end up with an array like this (0 = background pixels):

0 0 0 0 0 0 0 0
0 2 2 0 0 0 0 0
0 2 2 0 0 0 0 0
0 2 2 0 0 0 0 0
0 0 0 0 6 6 0 0
0 0 0 0 6 6 0 0
0 0 0 0 6 6 0 0
0 0 0 0 0 0 0 0

But I want to remap the labels [2, 6] to [1, 2], so the output image looks like this:

0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0
0 1 1 0 0 0 0 0
0 1 1 0 0 0 0 0
0 0 0 0 2 2 0 0
0 0 0 0 2 2 0 0
0 0 0 0 2 2 0 0
0 0 0 0 0 0 0 0

I’ve had a couple of ideas involving sorting the initial labels, but with little success. I would really appreciate any ideas or suggestions. Many thanks for reading.

cheers

I outlined a rough algorithm strategy for connected components here. I’d probably attack it the same way now.

After reading that again (it’s my post from 8 months ago), one algorithm tweak I might make now would be to deliberately pad the length of the shared memory array to an odd number to break up the bank conflicts, making both horizontal and vertical scanning bank-conflict free.
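Concretely, the padding is just one spare element per tile row. A rough sketch of the data layout only; the 16x16 tile size and the assumption that the image dimensions are multiples of it are mine, not from the original post:

#define TILE 16   // assumed tile size

__global__ void load_padded_tile(const unsigned int* image, unsigned int* out, int width)
{
	// Row stride padded from 16 to 17 (an odd number), so walking down a
	// column hits a different shared memory bank on every step.
	__shared__ unsigned int tile[TILE][TILE + 1];

	int x = blockIdx.x * TILE + threadIdx.x;
	int y = blockIdx.y * TILE + threadIdx.y;

	tile[threadIdx.y][threadIdx.x] = image[y * width + x];
	__syncthreads();

	// ... horizontal and vertical label scans over tile[][] would go here ...

	out[y * width + x] = tile[threadIdx.y][threadIdx.x];
}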

As for the relabelling, the strategy depends on how large your labels get. If you have fewer than a few thousand, you could build an existence table in shared memory (initialise it to all zeros; when you see an ID, set that index to 1), then do a compaction on it, giving you an efficient remapping.
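Very roughly, a sketch of that idea. I’m assuming a hard cap of 512 on the label values so the whole table fits in one block, and the compaction is a plain Hillis-Steele scan, nothing optimised:

#define MAX_LABELS 512   // assumed upper bound on label values; one thread per entry

// One block: existence table in shared memory, inclusive scan over it, then
// write out remap[old_label - 1] = new compacted label (1-based).
// exists[i] is 1 if label (i + 1) occurs somewhere in the image, 0 otherwise.
__global__ void build_remap_kernel(const unsigned int* exists, unsigned int* remap)
{
	__shared__ unsigned int table[MAX_LABELS];
	int tid = threadIdx.x;

	table[tid] = exists[tid];
	__syncthreads();

	// In-place Hillis-Steele inclusive scan: afterwards table[tid] holds the
	// number of existing labels with value <= tid + 1.
	for (int offset = 1; offset < MAX_LABELS; offset <<= 1)
	{
		unsigned int v = (tid >= offset) ? table[tid - offset] : 0;
		__syncthreads();
		table[tid] += v;
		__syncthreads();
	}

	// Labels that exist get their compacted (1-based) replacement; others get 0.
	remap[tid] = exists[tid] ? table[tid] : 0;
}

// Launch with a single block: build_remap_kernel<<<1, MAX_LABELS>>>(d_exists, d_remap);
// A second pass over the image then replaces every non-zero label L with remap[L - 1].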

If you have more than a few thousand labels, it’s harder since you can’t hold them all in a block at once, but it’s still possible. You could sort the labels, redundancies and all, then look for the transitions to clump the repeats together, then do a compaction on that final list.
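A sketch of that sort-based route using the Thrust library (my example, not from the posts; it takes one label entry per pixel, zeros included, and sorts a copy of the whole thing, which is wasteful but keeps it simple):

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/unique.h>
#include <thrust/binary_search.h>

void remap_by_sorting(const thrust::device_vector<unsigned int>& labels,
                      thrust::device_vector<unsigned int>& new_labels)
{
	// Sort a copy of the labels, duplicates and background zeros included.
	thrust::device_vector<unsigned int> sorted = labels;
	thrust::sort(sorted.begin(), sorted.end());

	// "Look for the transitions": unique() collapses each run of repeats,
	// leaving one entry per distinct value (0, 2, 6 in the example above).
	sorted.erase(thrust::unique(sorted.begin(), sorted.end()), sorted.end());

	// Replace every label by its index in the distinct list. Background 0
	// sorts to the front, so it maps to 0 and the real labels come out
	// as 1, 2, ... in increasing label order.
	new_labels.resize(labels.size());
	thrust::lower_bound(sorted.begin(), sorted.end(),
	                    labels.begin(), labels.end(),
	                    new_labels.begin());
}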

A way to work around the shared memory limit is to let different blocks load different parts of the labels, and have all blocks read from the same memory locations. I’m not sure whether the GPU is smart enough to avoid reading from the same location multiple times, but if you have enough blocks resident at once (a GTX 280 has 30 multiprocessors, for example) you can handle a much larger number of labels.

Hi guys,

Many thanks for the replies with great ideas. SPWorley, thanks for the link to your implementation.

I’ve done the relabelling using atomic functions. I have an array (as big as the image) in which I flag only the entries corresponding to the labels that occur:

array[LABELn - 1] = 1

So, the array looks like this:

0 1 0 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
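Spelled out, that flagging step is just a one-element scatter per pixel. This is my reconstruction of it, assuming the labelled image and the flag array are both image-sized buffers in global memory:

__global__ void flag_labels_kernel(const unsigned int* labels, unsigned int* flags, const uint2 image_size)
{
	int x = blockIdx.x * blockDim.x + threadIdx.x;
	int y = blockIdx.y * blockDim.y + threadIdx.y;
	if (x >= (int)image_size.x || y >= (int)image_size.y)
		return;

	unsigned int label = labels[y * image_size.x + x];
	if (label > 0)
		flags[label - 1] = 1;   // many threads may write the same 1; that is harmless
}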

Then, I apply the following kernel to this array to produce incremental labels:

__global__ void label_increment_kernel(unsigned int* source, int* count, const uint2 image_size)
{
	int x = blockIdx.x * blockDim.x + threadIdx.x;
	int y = blockIdx.y * blockDim.y + threadIdx.y;
	if (x >= (int)image_size.x || y >= (int)image_size.y)
		return;

	unsigned int index = y * image_size.x + x;
	unsigned int src_val = source[index];

	if (src_val > 0)
	{
		// count is assumed to start at 0 on the host; atomicAdd returns the
		// old value, so the flags are overwritten with the labels 1, 2, 3, ...
		int val = atomicAdd(count, 1);
		source[index] = val + 1;
	}
}

I end up with something like this:

0 1 0 0 0 2 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

I then use this array to remap the non-sequential labels, and it runs perfectly fine.

However, the problem is that there is no guarantee the labels generated with the atomic function will be in raster order, which I need for reproducible results. The output is different on every run, because the first label goes to whichever thread happens to be scheduled first.

I might have to go down the sorting path I guess. Suggestions…
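Thinking out loud, one alternative to sorting might be to replace the atomic counter with an exclusive prefix sum over the flag array, along the lines of the compaction suggested earlier. The result depends only on the data, not on scheduling, and if the provisional labels come from pixel indices (as they often do in these schemes) it should also be raster order. A rough, untested sketch with Thrust, assuming the flag array is already on the device:

#include <thrust/device_ptr.h>
#include <thrust/scan.h>

// d_flags      : image-sized array with d_flags[L - 1] = 1 for every label L that occurs
// d_new_labels : image-sized output; d_new_labels[L - 1] becomes the 0-based rank of L
// n            : image_size.x * image_size.y
void build_deterministic_remap(unsigned int* d_flags, unsigned int* d_new_labels, int n)
{
	thrust::device_ptr<unsigned int> flags(d_flags);
	thrust::device_ptr<unsigned int> new_labels(d_new_labels);

	// Exclusive scan: entry L-1 receives the number of flagged entries before
	// it, i.e. how many smaller labels exist. Adding 1 gives the compacted label.
	thrust::exclusive_scan(flags, flags + n, new_labels);
}

// A final pass then replaces every non-zero pixel label L with new_labels[L - 1] + 1.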

cheers