Matlab CUDA Toolbox for Signal and Image Processing

I’m currently working with some mex-functions that run CUDA code and are called from Matlab. Many people who work with signal and image processing use Matlab every day and would benefit from GPU power for filtering and other operations. I’ve tested the Jacket software, but I was not very impressed, since it is very limited in, for example, convolution (their convn function could only use 3 x 3 x 3 filters).

I’ve just managed to pass device pointers between different mex-files, so that the data does not need to be transferred between the GPU and the CPU all the time, but only when a copy function is called from Matlab. This makes the programming really easy from Matlab while still taking advantage of the full power of the GPU; otherwise the performance is greatly reduced by copying the data back and forth.

Is there any interest in this kind of toolbox? I will probably make it anyway for myself, but if I know that others would use it, I would put more work into making it general and well documented. I would not charge any money for this toolbox, or at most a small fee like 10 dollars.

Hi wanderine,

Thanks for the post. You’re right, filtering is a sweet spot for GPU computing right now - very powerful.

We at AccelerEyes have developers focused on extending Jacket’s set of filtering functions to handle ND kernel sizes, ND data dimensions, and all data types (i.e. single, double, logical, etc., as well as complex and real). We are also working to ‘gfor’-enable these functions. In our recent Jacket 1.1 release, we extended the kernel sizes up to 10x10 for conv2. We’ll update convn in the next release. Currently, Jacket partially supports filter, filter2, conv, conv2, & convn.

BTW, if you have code that does this already, we’d love to chat with you about integrating it. In general, we are happy to chat with people who have CUDA code that could be used to extend Jacket. We are in the mode of trying to put as much functionality into Jacket as quickly as possible and are looking for ways to scale by leveraging CUDA code that exists in the community. We’re happy to do deals with people where they make sense.



Hi, yes, I’ve made the following mex-files so far:

conv2_gpu, conv2_fft_gpu, conv3_gpu, conv3_fft_gpu

which you call like this in Matlab:

filter_response = conv2_gpu(image,filter)

and they all work for arbitrary image sizes and filter sizes, real-valued or complex-valued. The speedup is about 30x including transfer times to and from the GPU, and about 50x without the transfer time (Intel Core 2 Quad 2.66 GHz versus an Nvidia GTX 285). We made our own mex-file for conv3 since Matlab’s convn is really slow; conv2 is already a mex-file in standard Matlab.

I’m currently working on making these functions more general, so that you can, for example, call

[filter_response_1, filter_response_2, filter_response_3, filter_response_4] = conv2_gpu(image, filter_1, filter_2, filter_3, filter_4)

and the image does not have to be sent to the GPU four times when it is filtered with more than one filter.

A better approach might be to just pass pointers to the functions and copy the data back when the results are ready. I’ve made the functions

device_pointer = copy_data_to_gpu(data)


data = copy_data_from_gpu(device_pointer, data_dimensions)

as mex-files, and they seem to work. The idea is to make the conv functions general enough to detect whether you pass an actual image or just a pointer to one.

I’m currently also working on a conv4_gpu for 4D convolution.

If you are interested, you can contact me at