Need help implementing Hough Transform for line detection

upit · March 25, 2012, 10:04pm

Hi Guys,
I am new to CUDA. I am trying to implement the Hough transform for line detection in CUDA, and I use this piece of code in Mathematica 8. I do get an output, but it's pretty weird: random bright spots instead of the waves (curves) that I expect. I know there are a couple of implementations available in this forum, but they look too complicated to me, given that the algorithm itself is pretty simple. Right now I am not looking to implement an optimized version; I just want it to be as simple as possible and to work, i.e. give correct results. Here is my kernel. Note that I haven't added code for threshold checking.

HoughTransform =
"__global__ void hough_line(unsigned char * in, unsigned char * out, int *outint, mint outsize, mint threshold, mint width, mint height, float *arraySines, float *arrayCosines)
{
    mint xIndex = threadIdx.x + blockIdx.x*blockDim.x;
    mint yIndex = threadIdx.y + blockIdx.y*blockDim.y;
    mint index = xIndex + yIndex*width;
    mint i;
    float rho = 0;

    if (xIndex < width && yIndex < height)
    {
        for (i = 0; i < 36; i++)
        {
            rho = xIndex * arrayCosines[i] +
                  yIndex * arraySines[i];
            index = (i * 5.0) + rho * 180;
            if (index < outsize)
                atomicAdd(&outint[index], 1);
        }
    }
    __syncthreads();
    for (i = 0; i < outsize; i++)  // Had to do this as atomicAdd doesn't support unsigned char*
    {
        out[i] = outint[i];
    }
}";

Sorry if my implementation seems pretty naive or if I have made any obvious mistakes. I am trying to learn. Thanks.

tera · March 25, 2012, 10:21pm

__syncthreads() synchronizes threads within one block only. If you need out as char, launch a second kernel for the conversion; don't do it within the same kernel.
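To make tera's point concrete: the blocks of one launch share no barrier, so only the end of the voting kernel guarantees that every atomicAdd has finished before anything reads outint. Below is a minimal C++ sketch of the per-element work such a second kernel could do. votesToBytes is a hypothetical name, and each loop iteration stands for one GPU thread. It also assumes you would rather clamp counts above 255 than keep the modulo-256 wrap-around that a plain int to unsigned char assignment produces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Host-side stand-in for a separate conversion kernel, launched only after the
// voting kernel has completed. Iteration i corresponds to the thread that would
// handle element i. Counts are clamped to [0, 255]; assigning an int of 256 to
// an unsigned char directly would give 0 instead.
void votesToBytes(const int* votes, unsigned char* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        int clamped = std::min(std::max(votes[i], 0), 255);
        out[i] = static_cast<unsigned char>(clamped);
    }
}
```

On the GPU the loop body becomes the kernel body, with i computed from blockIdx/blockDim/threadIdx and checked against outsize.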

upit · March 26, 2012, 4:26am

Thanks, tera,

I removed __syncthreads() and also the copying of the data in the same kernel. It was obviously very stupid to run that loop in the very same kernel :). But I still don't get the desired output. Do you or anyone else see anything wrong with my Hough implementation?

Thanks.

Gert-Jan · March 26, 2012, 11:39am

You now have:

rho = xIndex * arrayCosines[i] + yIndex * arraySines[i];

index = (i * 5.0) + rho * 180;

The variable ‘rho’ can become negative, since cos(x) with 90<x<180 degrees is negative. This can create problems later when calculating ‘index’. Actually ‘rho’ can vary between ‘-max(xIndex)’ and ‘sqrt(max(xIndex)^2 + max(yIndex)^2)’. So you can update your code to:

rho = xIndex * arrayCosines[i] + yIndex * arraySines[i] + max(xIndex);

index = (i * 5.0) + rho * 180;

Now ‘rho’ is bounded by ‘0’ and ‘sqrt(max(xIndex)^2 + max(yIndex)^2) + max(xIndex)’. You probably have to change the size of your allocated output arrays accordingly.
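A minimal C++ sketch of Gert-Jan's shift, assuming theta runs over [0°, 180°) in steps of thetaStepDeg, pixel coordinates lie in [0, width-1] x [0, height-1], and the accumulator stores one row per theta value (numTheta rows of numRho cells). numRhoBins and accumulatorIndex are hypothetical helper names. The sketch also rounds rho to an integer bin before using it in the index: in the posted formula the float rho is multiplied by 180 directly, so its fractional part ends up added to the theta column. That rounding matters whichever layout (rho-major or theta-major) you pick.

```cpp
#include <cassert>
#include <cmath>

// Number of rho bins when rho is shifted by (width - 1), i.e. max(xIndex).
// For theta in [0, 180) degrees, rho lies in [-(width - 1), diag], so the
// shifted value lies in [0, diag + width - 1].
int numRhoBins(int width, int height) {
    double diag = std::sqrt(double(width - 1) * (width - 1) +
                            double(height - 1) * (height - 1));
    return static_cast<int>(std::ceil(diag)) + (width - 1) + 1;
}

// Flat index into a numTheta x numRho accumulator, row-major with one row per
// theta (theta = thetaIdx * thetaStepDeg degrees). rho is rounded to an integer
// bin first, so its fractional part cannot leak into other cells.
// Returns -1 if the cell would be out of range.
int accumulatorIndex(int x, int y, int thetaIdx, int numTheta, double thetaStepDeg,
                     int width, int height) {
    const double kPi = 3.14159265358979323846;
    if (thetaIdx < 0 || thetaIdx >= numTheta) return -1;
    double theta = thetaIdx * thetaStepDeg * kPi / 180.0;
    double rho = x * std::cos(theta) + y * std::sin(theta) + (width - 1);  // shifted, >= 0
    int rhoBin = static_cast<int>(std::lround(rho));
    int numRho = numRhoBins(width, height);
    if (rhoBin < 0 || rhoBin >= numRho) return -1;
    return thetaIdx * numRho + rhoBin;
}
```

For example, with a 256 x 256 image and 1° steps, the accumulator needs 180 x 617 cells, and pixel (10, 0) at theta = 0 lands in cell 265 of the first row.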

upit · March 26, 2012, 7:23pm


Thanks for the comment, Gert-Jan. I didn't think about it when I wrote the code. But my theta range is -90 to +90, hence the width of 180. It still doesn't seem to work. I now have only one point in my image, and I am expecting a sinusoid in the parametric Hough space, but I still get random points :(

upit · March 26, 2012, 9:21pm

Still no luck getting it right :( . It appears that I didn't completely understand grid/block dimensions. After referring to a post on The Official NVIDIA Forums, I sort of corrected my implementation, realizing that I am accessing my image as a 1D array.
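For reference, the usual mapping of a 2D launch onto a row-major 1D image can be checked on the host. The sketch below is C++, with a hypothetical Dim3 struct standing in for CUDA's dim3 and the blockIdx/blockDim/threadIdx built-ins, and a hypothetical blocksFor helper for the ceiling division commonly used to size the grid. In plain CUDA C the matching launch would be kernel<<<grid, block>>>(...) with block = dim3(16, 16) and grid = dim3(blocksFor(width, 16), blocksFor(height, 16)); CUDALink specifies the launch differently, so treat that part as an assumption for the Mathematica setup.

```cpp
#include <cassert>

// Stand-in for CUDA's dim3 so the index arithmetic can run without a GPU.
struct Dim3 { int x, y, z; };

// Ceiling division: blocks of blockSize threads needed to cover n pixels.
int blocksFor(int n, int blockSize) { return (n + blockSize - 1) / blockSize; }

// What a 2D kernel would compute per thread:
//   x = blockIdx.x * blockDim.x + threadIdx.x;
//   y = blockIdx.y * blockDim.y + threadIdx.y;
// followed by the row-major 1D pixel index y * width + x.
// Returns -1 for threads outside the image (the ones the bounds check skips).
int pixelIndex(Dim3 blockIdx, Dim3 blockDim, Dim3 threadIdx, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return -1;
    return y * width + x;
}
```

With the posted launch of 512-thread blocks, where blockIdx.x acts as the row and threadIdx.x as the column, the corresponding 1D index would be blockIdx.x * width + threadIdx.x. The blockIdx.x + threadIdx.x written in the second hough_line kernel makes neighbouring rows read almost the same pixels.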

Here is what I am doing now,

arrayCosValues = Table[Cos[i Degree], {i, -90, 90, 10}]

arraySinValues = Table[Cos[i Degree], {i, -90, 90, 10}]

HoughTransform =
  "__global__ void hough_line(unsigned char * in, int *outint, mint outsize, mint threshold, mint width, mint height, float *arraySines, float *arrayCosines)
  {
      mint index = blockIdx.x + threadIdx.x;
      mint outindex = 0;
      mint i;
      float rho = 0;

      if (blockIdx.x < height && threadIdx.x < width)
      {
          if (in[index] > 30)
          {
              for (i = 0; i < 18; i++)
              {
                  rho = threadIdx.x * arrayCosines[i] +
                        blockIdx.x * arraySines[i];
                  outindex = i * 10 + rho * 180;

                  if (outindex < outsize)
                      //atomicAdd(&outint[index],60);
                      outint[outindex] = 60;
              }
          }
      }
  }";

(* This is code compilation in Mathematica. {512, 1, 1} is the block dimension *)
HoughTL = CUDAFunctionLoad[HoughTransform, "hough_line", {{"UnsignedByte", "Input"}, {"Byte[4]", "Output"}, _Integer, _Integer, _Integer, _Integer, {"Float", _, "Input"}, {"Float", _, "Input"}}, {512, 1, 1}, ShellOutputFunction -> Print]

(* Allocation of CUDA global device memory *)
outputHoughCUDAMem =
  CUDAMemoryAllocate["UnsignedByte", {180, Floor[Sqrt[height^2 + width^2]] + 180}];

outputHoughCUDAMemInt =
  CUDAMemoryAllocate[Integer, {180, Floor[Sqrt[height^2 + width^2]] + 180}];

HoughTL[inputEdgeCUDAMem, outputHoughCUDAMemInt,
 180*(Floor[Sqrt[height^2 + width^2]] + 180), 10, width, height,
 arraySineValues, arrayCosValues, 262144] (* Last parameter is the number of threads (512*512) I want to run it on, as I don't know how to specify grid dimensions *)

(* Copying from int to unsigned char *)
CpyIntToCh =
  "__global__ void copyIntToChar(unsigned char * out, int *outint, mint width, mint height)
  {
      mint index = blockIdx.x + threadIdx.x;
      if (index < width * height)
      {
          out[index] = 0;
          if (outint[index] > 50)
              out[index] = outint[index];
      }
  }";

CPYINTTOCHAR =
 CUDAFunctionLoad[CpyIntToCh,
  "copyIntToChar", {{"UnsignedByte", "Output"}, {"Byte[4]", "Input"}, _Integer, _Integer}, {512, 1, 1},
  ShellOutputFunction -> Print]

When I run this on a 256 x 256 image with just 1 point, I get nothing in the parameter space. As I said, I am expecting a sinusoid. I am wondering what is wrong with the code. Also, I am not aware of any way to debug the code, so I can't see what rho values are generated. Any help would be greatly appreciated. Thanks.
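Since the kernel's rho values are hard to inspect, one option is a CPU reference that runs the same voting loop and can be printed or compared cell by cell with the GPU output. Below is a minimal C++ sketch under these assumptions: theta runs over [-90°, 90°) in integer-degree steps that divide 180, rho is rounded to the nearest integer and shifted by height - 1 so it is never negative (the range issue Gert-Jan raises about -90 to 90), and the accumulator has one row per theta. Accumulator and houghVote are hypothetical names. Each (x, y) iteration plays the role of one GPU thread and reads the pixel at y * width + x. While comparing, it may be worth checking two things in the posted Mathematica code: arraySinValues is built with Cos rather than Sin, and the kernel call passes arraySineValues, which is not the name defined above it.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Accumulator {
    int numTheta;            // theta = -90, -90 + step, ..., 90 - step degrees
    int numRho;              // rounded rho shifted by rhoOffset, in [0, numRho)
    int rhoOffset;           // height - 1 for theta in [-90, 90)
    std::vector<int> votes;  // row-major: votes[t * numRho + rhoBin]
};

// CPU reference for the voting step; assumes 180 % thetaStepDeg == 0.
Accumulator houghVote(const std::vector<unsigned char>& img, int width, int height,
                      int thetaStepDeg, unsigned char edgeThreshold) {
    const double kPi = 3.14159265358979323846;
    Accumulator acc;
    acc.numTheta = 180 / thetaStepDeg;
    acc.rhoOffset = height - 1;
    double diag = std::sqrt(double(width - 1) * (width - 1) +
                            double(height - 1) * (height - 1));
    acc.numRho = static_cast<int>(std::ceil(diag)) + acc.rhoOffset + 1;
    acc.votes.assign(acc.numTheta * acc.numRho, 0);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (img[y * width + x] <= edgeThreshold) continue;  // not an edge pixel
            for (int t = 0; t < acc.numTheta; ++t) {
                double theta = (-90 + t * thetaStepDeg) * kPi / 180.0;
                double rho = x * std::cos(theta) + y * std::sin(theta);
                int rhoBin = static_cast<int>(std::lround(rho)) + acc.rhoOffset;
                if (rhoBin >= 0 && rhoBin < acc.numRho)
                    ++acc.votes[t * acc.numRho + rhoBin];  // atomicAdd on the GPU
            }
        }
    }
    return acc;
}
```

For a single edge pixel, every theta row receives exactly one vote, so displaying votes as a numTheta x numRho image should trace a single sinusoid.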

Gert-Jan · March 27, 2012, 7:09am

I assumed your theta ranged from 0 to 180 degrees. With a range of -90 to 90, the sin() function becomes negative between -90 and 0 degrees, so rho can still become negative.

I don’t know what kind of CUDA dialect you are using (Mathematica?), so I’m not sure what goes wrong. But maybe you have to copy arrayCosValues and arraySinValues explicitly to the GPU.
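Gert-Jan's observation about the -90° to 90° range can be made quantitative. Assuming pixel coordinates x in [0, W-1] and y in [0, H-1] (W and H being the image width and height, notation introduced here), the bound works out as follows:

```latex
\rho = x\cos\theta + y\sin\theta, \qquad \theta \in [-90^\circ, 90^\circ]

% Lower bound: cos(theta) >= 0 on this range, and y sin(theta) >= -y >= -(H-1)
\rho \;\ge\; y\sin\theta \;\ge\; -(H-1), \quad \text{attained at } \theta = -90^\circ,\ y = H-1

% Upper bound (Cauchy-Schwarz)
\rho \;\le\; \sqrt{x^2 + y^2} \;\le\; D := \sqrt{(W-1)^2 + (H-1)^2}

% Hence the shifted value is non-negative:
0 \;\le\; \rho + (H-1) \;\le\; D + (H-1)
```

So for this theta range the shift is max(yIndex) = H-1 rather than max(xIndex), and ⌈D⌉ + H integer rho bins cover every rounded value. For 0° to 180° the roles swap: sin θ ≥ 0, and the lower bound becomes -(W-1), which is the -max(xIndex) from the earlier reply.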

Bajengua · February 17, 2020, 2:52pm

Good morning

This post might be old, but I would appreciate any advice.

I’m thinking of using the Hough transform algorithm to trace the contours of some important details on a printed TIFF image and then compare them with the soft TIFF image to find any kind of errors in the printing procedure. I am using CUDA to speed up the computations.
For this purpose, I would like to know whether there is any built-in algorithm that does the job, or should I start from scratch?

Best regards,

Robert_Crovella · February 17, 2020, 2:58pm

May be of interest:

https://docs.opencv.org/master/d2/d15/group__cudaimgproc__hough.html

Bajengua · February 24, 2020, 11:58am

Hi Robert_Crovella

Sorry for the late answer. I’ve been trying to implement the Hough transform algorithm from scratch in CUDA, and it seems I’ve succeeded. Now my goal is to compare the performance of my Hough transform kernel with the one implemented in OpenCV’s CUDA module (cv::cuda).

I started by running a simple Canny filter, but it seems I have some linker problems.

Here are my includes

// Includes

#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <opencv2/cudaimgproc.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/opencv.hpp>

#include <stdio.h>
#include <iostream>   // cout, endl

using namespace cv;   // Mat, imread, Ptr, getTickCount, ...
using namespace std;  // cout, endl

Here is my function:

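void applyCannyDevice(){
    // cv::cuda::CannyEdgeDetector expects a single-channel 8-bit image
    Mat mat = imread("./Sudoku.jpg", IMREAD_GRAYSCALE);

    Ptr<cuda::CannyEdgeDetector> canny = cuda::createCannyEdgeDetector(50, 100);
    cv::cuda::GpuMat edge;
    cv::cuda::GpuMat src(mat);   // uploads the image to the GPU
    canny->detect(src, edge);    // warm-up run, not included in the timing

    double t1 = getTickCount();
    for (int i = 0; i < 10; i++)
    {
        canny->detect(src, edge);
    }
    double t2 = getTickCount();
    // average time per call, in seconds
    cout << "cuda time:" << (t2 - t1) / getTickFrequency() / 10 << endl;
}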

And I’m getting this error:

Error LNK2001: unresolved external symbol “struct cv::Ptr<class cv::cuda::CannyEdgeDetector> __cdecl cv::cuda::createCannyEdgeDetector(double,double,int,bool)” (?createCannyEdgeDetector@cuda@cv@@YA?AU?$Ptr@VCannyEdgeDetector@cuda@cv@@@2@NNH_N@Z)

Is there any additional path/library I need to add?

Linker input:
Additional dependencies:

opencv_imgcodecs420.lib
opencv_highgui420.lib
opencv_imgproc420.lib
opencv_core420.lib
cudart_static.lib …

I would appreciate any kind of help!

Thanks in advance