I’m currently working with a rather large Monte Carlo program written in C++. I’ve been looking into CUDA, but I’m still not sure about making the leap to convert it. There are certain loops that are very large and data-parallel that I would like to run on the GPU. There’s also a decent amount of FFTs, so I was thinking of trying cuFFT as well. Is it possible to write kernels for just those sections and “stick” them into my existing code in place of those loops, so I can get a feel for the speedup and make a decision based on that?
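From what I understand, yes: you can compile just the hot spots with nvcc and leave the rest of the program as plain C++, linking the two together. As a rough sketch (all names here are hypothetical, and this assumes a simple element-wise loop; real speedups depend heavily on how much work you do per host-device copy):

```cuda
#include <cuda_runtime.h>

// Hypothetical example: a data-parallel loop like
//   for (int i = 0; i < n; ++i) out[i] = a[i] * b[i];
// rewritten as a CUDA kernel.
__global__ void multiplyKernel(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard: the grid may overshoot n
        out[i] = a[i] * b[i];
}

// Drop-in replacement for the original loop, callable from the
// existing C++ code (compile this file with nvcc, link as usual).
void multiplyOnGpu(const float* hostA, const float* hostB, float* hostOut, int n)
{
    float *dA, *dB, *dOut;
    size_t bytes = n * sizeof(float);

    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, hostA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hostB, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;   // round up to cover all i
    multiplyKernel<<<blocks, threads>>>(dA, dB, dOut, n);

    cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
}
```

One caveat when timing something like this: for a quick loop over a small array, the cudaMemcpy transfers over PCIe can easily dominate, so the fairest test is on the loops where there’s substantial arithmetic per element (or where the data can stay on the device across several kernel and cuFFT calls).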
Sorry if this is a common question; I haven’t had much luck searching, but I’m probably just not searching for the right terms.
I was also looking into komrade, but the Google Code site isn’t letting me…