CUDA Image Processing Demo

CUDA Sobel Edge Detector demo

Thanks for posting this. You didn’t provide any source code – are you planning to share it? I ran it with our internal profiling tools and I fear that your implementation is not very efficient. Your computation time seems to be dominated by uncoalesced global loads and stores (see the section on memory coalescing in the CUDA Programming Guide), and you also have divergent branches. My hunch is that you aren’t using shared memory – is that correct?
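To make the shared-memory suggestion concrete, here is a rough sketch of the kind of kernel I have in mind. It is not based on your code, which I haven't seen. Everything in it is an assumption for illustration: the names `sobelShared`, `launchSobel`, `clampi` and `TILE`, the 8-bit grayscale row-major image layout, the 16x16 block size, and the |Gx| + |Gy| magnitude approximation. Each block cooperatively stages its tile plus a one-pixel halo into shared memory, with neighbouring threads reading neighbouring addresses, and borders are handled by clamping coordinates rather than branching per pixel. How well the byte-wide loads actually coalesce depends on the hardware generation, so treat it as a starting point to profile, not a tuned implementation.

```cuda
#include <cuda_runtime.h>

#define TILE 16   // hypothetical block edge; tune for your GPU
#define R    1    // stencil radius of the 3x3 Sobel operator

// Clamp with min/max so border handling needs no data-dependent branch.
__device__ __forceinline__ int clampi(int v, int lo, int hi)
{
    return min(max(v, lo), hi);
}

// Assumes an 8-bit grayscale image stored row-major with no padding.
__global__ void sobelShared(const unsigned char* in, unsigned char* out,
                            int width, int height)
{
    __shared__ unsigned char tile[TILE + 2 * R][TILE + 2 * R];

    const int span = TILE + 2 * R;          // tile edge including halo
    const int x0 = blockIdx.x * TILE - R;   // image coords of tile corner
    const int y0 = blockIdx.y * TILE - R;

    // Cooperative load: the block's threads stride over the tile + halo,
    // so consecutive threads mostly touch consecutive global addresses.
    const int tid = threadIdx.y * TILE + threadIdx.x;
    for (int i = tid; i < span * span; i += TILE * TILE) {
        int ty = i / span;
        int tx = i % span;
        int sx = clampi(x0 + tx, 0, width - 1);   // replicate edge pixels
        int sy = clampi(y0 + ty, 0, height - 1);
        tile[ty][tx] = in[sy * width + sx];
    }
    __syncthreads();

    const int x = blockIdx.x * TILE + threadIdx.x;
    const int y = blockIdx.y * TILE + threadIdx.y;
    if (x >= width || y >= height) return;   // only partial edge blocks diverge

    const int lx = threadIdx.x + R;
    const int ly = threadIdx.y + R;

    // All nine neighbours now come from shared memory.
    int gx = -tile[ly - 1][lx - 1] + tile[ly - 1][lx + 1]
             - 2 * tile[ly][lx - 1] + 2 * tile[ly][lx + 1]
             - tile[ly + 1][lx - 1] + tile[ly + 1][lx + 1];
    int gy = -tile[ly - 1][lx - 1] - 2 * tile[ly - 1][lx] - tile[ly - 1][lx + 1]
             + tile[ly + 1][lx - 1] + 2 * tile[ly + 1][lx] + tile[ly + 1][lx + 1];

    // |Gx| + |Gy| approximates the gradient magnitude without a sqrt.
    out[y * width + x] = (unsigned char)min(abs(gx) + abs(gy), 255);
}

// Hypothetical host-side launcher; d_in and d_out are device pointers.
void launchSobel(const unsigned char* d_in, unsigned char* d_out,
                 int width, int height)
{
    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    sobelShared<<<grid, block>>>(d_in, d_out, width, height);
}
```

The point of the design is that each input pixel is fetched from global memory roughly once per block instead of up to nine times, and the only divergent branch left is the bounds check in blocks that overhang the image edge.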

If you post the kernel code I can provide some tips to improve performance.


Thanks Mark, the source code is here: