Hi all. I’ve implemented a variant of the Canny edge detector using CUDA 1.1 and updated it for the 2.0 beta. It uses separable convolution functions for the 3x3 Sobel and variable-size Gaussian kernels; these differ somewhat from the SDK version, although many of the principles were kept. I’ve also implemented a function that handles hysteresis processing across data from different thread blocks. Most of the optimizations were made under 1.1 and were geared towards coalesced reads/writes and a high occupancy ratio, using few registers and minimal shared memory.
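For anyone unfamiliar with the separable trick, here is a minimal CPU reference sketch in C++ (not the CUDA kernel from the submission). It computes the horizontal Sobel gradient as two 1D passes, a [-1 0 1] derivative along x followed by a [1 2 1] smoothing along y, instead of one 3x3 pass. The function names and the clamp-to-edge border handling are assumptions made for illustration only.

```cpp
#include <vector>

// Clamp an index into [0, n-1]. Clamp-to-edge borders are an assumption
// of this sketch, not necessarily what the CUDA code does.
static int clampIndex(int i, int n) {
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

// Hypothetical helper: horizontal Sobel gradient (Gx) of a row-major
// grayscale image, computed with two separable 1D passes.
std::vector<float> sobelXSeparable(const std::vector<float>& img,
                                   int width, int height) {
    std::vector<float> tmp(img.size());
    std::vector<float> out(img.size());

    // Pass 1: horizontal derivative with kernel [-1 0 1].
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int xl = clampIndex(x - 1, width);
            const int xr = clampIndex(x + 1, width);
            tmp[y * width + x] = img[y * width + xr] - img[y * width + xl];
        }
    }

    // Pass 2: vertical smoothing with kernel [1 2 1].
    for (int y = 0; y < height; ++y) {
        const int yu = clampIndex(y - 1, height);
        const int yd = clampIndex(y + 1, height);
        for (int x = 0; x < width; ++x) {
            out[y * width + x] = tmp[yu * width + x]
                               + 2.0f * tmp[y * width + x]
                               + tmp[yd * width + x];
        }
    }
    return out;
}
```

The two passes give the same result as the full 3x3 kernel because the Sobel matrix is the outer product of [1 2 1] and [-1 0 1]. On the GPU, each 1D pass also maps naturally onto row-wise or column-wise tiles, which is the usual motivation for splitting the kernel.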
I’ve submitted two versions:
One version can be compiled for Matlab use into a .mexw32 file; the other interfaces with GLUT and OpenCV, so you need those respective libraries. I believe the latter accepts various image formats, from jpg and tif to avi movies, so long as the width and height are multiples of 16.
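If you want to check an input before feeding it in, a small sketch like the following may help. It assumes the requirement is simply that both the width and the height be multiples of 16; the helper names are hypothetical and are not part of the submitted code.

```cpp
// Hypothetical helper: round a dimension up to the next multiple of 16,
// e.g. to decide how much padding an image would need.
int roundUpTo16(int n) {
    return ((n + 15) / 16) * 16;
}

// Hypothetical helper: true when both dimensions are positive multiples
// of 16, which is the constraint assumed in this sketch.
bool dimensionsSupported(int width, int height) {
    return width > 0 && height > 0 && width % 16 == 0 && height % 16 == 0;
}
```

For example, a 640x480 frame passes the check, while a 641x480 frame would need its width padded to 656.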
A paper was also written about the findings. General testing under CUDA 1.1 showed a significant speedup (60x+) of the CUDA version over Matlab’s edge function.
Feel free to modify or comment on the code.
Undergraduate, UMIACS at the University of Maryland College Park