Signal processing with Pure Data: porting Pure Data objects to the CUDA platform

Hello, this is my first post to the forum. I am taking on a project to extend Pure Data (Pd) signal processing routines by adding CUDA implementations. The objective is to build a library of essential CUDA<->Pd extensions. Please help! LOL

There are some very good things about Pure Data that should work well with CUDA. The issue I have right now involves data structures. Pure Data has a very efficient memory management scheme that handles how data is cached between DSP routines. It avoids unnecessary data transfers by performing operations in place where possible, and I think that feature would help performance with CUDA as well.
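Here is a rough sketch of how the in-place idea might look on the GPU side. This is not Pd code and does not use any Pd API. The names `gain_inplace`, `offset_inplace`, and `run_chain` are made up for illustration. The assumption is that a signal block is copied to the device once, several operations then overwrite the same device buffer, and the result is copied back once.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: scales every sample of the block in place.
__global__ void gain_inplace(float *buf, float gain, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= gain;
}

// Hypothetical kernel: adds a constant to every sample in place.
__global__ void offset_inplace(float *buf, float offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += offset;
}

// Hypothetical host helper: one upload, two in-place steps, one download.
// No intermediate result ever travels back to the host.
cudaError_t run_chain(float *host_block, int n, float gain, float offset)
{
    if (n <= 0) return cudaErrorInvalidValue;

    float *d_buf = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    cudaError_t err = cudaMalloc((void **)&d_buf, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_buf, host_block, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 64;
        int blocks = (n + threads - 1) / threads;
        // Both launches use the default stream, so they run in order.
        gain_inplace<<<blocks, threads>>>(d_buf, gain, n);
        offset_inplace<<<blocks, threads>>>(d_buf, offset, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host_block, d_buf, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_buf);
    return err;
}
```

In a real external the device buffer would presumably be allocated once and reused for every block rather than allocated per call. It is done per call here only to keep the sketch short.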

The starting point is just to create CUDA-enabled versions of the Pd objects for block multiplication and FFT/IFFT. I will also have to add objects for transferring data between host and device. The tricky part is figuring out how deeply memory and scheduling operations are embedded in Pure Data.
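To give an idea of the pieces involved, here is a minimal CUDA sketch of block multiplication with separate host-to-device and device-to-host steps. It is only an illustration under my own assumptions: `block_mul`, `to_device`, and `block_mul_host` are hypothetical names, nothing here touches the Pd API, and the Pd object wrapper and scheduling side are left out entirely.

```cuda
#include <cuda_runtime.h>

// Element-wise product of two signal blocks. The pointers are deliberately
// not marked __restrict__, so out may be the same buffer as a or b.
__global__ void block_mul(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i];
}

// Hypothetical "host to device" step: allocate a device buffer and upload.
// Returns NULL on failure.
float *to_device(const float *host, int n)
{
    float *dev = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return NULL;
    if (cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return NULL;
    }
    return dev;
}

// Hypothetical wrapper: upload both inputs, multiply on the device with the
// result written over the first input, then download that buffer into out.
cudaError_t block_mul_host(const float *a, const float *b, float *out, int n)
{
    if (n <= 0) return cudaErrorInvalidValue;

    float *d_a = to_device(a, n);
    float *d_b = to_device(b, n);
    cudaError_t err = cudaSuccess;

    if (d_a == NULL || d_b == NULL) {
        err = cudaErrorMemoryAllocation;
    } else {
        int threads = 64;
        int blocks = (n + threads - 1) / threads;
        block_mul<<<blocks, threads>>>(d_a, d_b, d_a, n); // in place into d_a
        err = cudaGetLastError();
        if (err == cudaSuccess)
            err = cudaMemcpy(out, d_a, (size_t)n * sizeof(float),
                             cudaMemcpyDeviceToHost);
    }

    cudaFree(d_a); // cudaFree(NULL) is a harmless no-op
    cudaFree(d_b);
    return err;
}
```

Calling `block_mul_host(a, b, out, 64)` with three host arrays of 64 floats would multiply one 64-sample block. For a single block like this the transfers will probably dominate the cost, which is why the transfer steps are kept separate from the kernel in this sketch.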

I haven’t attempted to port software to a new platform before. Any general tips on how to go about a port would be helpful.