I’m trying to figure out some timings… Can anyone give me a sense of the typical ratio of design time to implementation time when porting an algorithm to CUDA, and also the total time it takes to get a working version running on a GPU?
Hoping someone more experienced can help
PM me if you prefer
Design time is a tricky beast and highly dependent on a number of variables.
Writing your own fast CUDA kernels tailored to your algorithm can take a good chunk of time, even several months for the toughest problems. Luckily, though, there are libraries that already contain very fast CUDA kernels you can plug into your code. ArrayFire for CUDA is the one I work on. Most ArrayFire users get good speedups within a few hours: download -> read the getting-started material -> run the examples -> modify an example to implement your algorithm -> plug the working version back into your main code.
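To give a flavor of what that "modify an example" step looks like, here's a minimal sketch in ArrayFire's C++ interface (a Monte Carlo pi estimate, similar to one of the bundled examples). This is my own illustration, not code from the question, so treat the exact calls as a sketch and check the current ArrayFire docs:

```cpp
#include <arrayfire.h>
#include <cstdio>

int main() {
    af::info();  // print which GPU/device ArrayFire selected

    const int n = 1000000;
    // Generate n random (x, y) points in the unit square, on the GPU
    af::array x = af::randu(n);
    af::array y = af::randu(n);

    // Count points inside the quarter circle; the comparison and the
    // reduction both run as pre-tuned GPU kernels, no hand-written CUDA
    float inside = af::sum<float>(x * x + y * y <= 1.0f);

    std::printf("pi is roughly %f\n", 4.0f * inside / n);
    return 0;
}
```

The point is that the data stays on the GPU across all of these operations, so you get kernel-level performance without writing or tuning any kernels yourself.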
If you have any questions, feel free to email me directly (provided below).