Scientific C++ to GPU?

I’m currently working with a rather large Monte Carlo program that is written in C++. I’ve been looking into CUDA, but I’m still not quite sure about making the leap to convert it. There are certain loops that are very large and data-parallel that I would like to run on CUDA. There’s also a decent amount of FFTs, so I was also thinking of trying cuFFT. Is it possible to write kernels for those sections and just “stick them” into my existing code to replace those loops, to get a feel for any speedup and make a decision based on that?

Sorry if this is a common question, I haven’t had much luck searching, but I’m probably just not going for the right stuff.

I was also looking into komrade, but the google code site isn’t letting me…

Yes, you can do incremental parallelization, moving just a portion of the computation to the GPU.
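As a rough sketch of what that looks like, here is a hypothetical data-parallel loop replaced by a CUDA kernel. The names (`saxpy_like`, `run_on_gpu`) and the loop body are made up for illustration; only this function changes, and the rest of your C++ program stays exactly as it is:

```cpp
#include <cuda_runtime.h>

// Original CPU loop this replaces (hypothetical):
//   for (int i = 0; i < n; ++i) out[i] = a[i] * b[i] + c;
__global__ void saxpy_like(const float *a, const float *b, float c,
                           float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                          // guard against the last partial block
        out[i] = a[i] * b[i] + c;
}

void run_on_gpu(const float *h_a, const float *h_b, float c,
                float *h_out, int n)
{
    float *d_a, *d_b, *d_out;
    size_t bytes = n * sizeof(float);

    // Allocate device memory and copy the inputs over
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // One thread per element, rounded up to whole blocks
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    saxpy_like<<<blocks, threads>>>(d_a, d_b, c, d_out, n);

    // Copy the result back and clean up
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);
}
```

You compile the files containing kernels with nvcc and link them against the rest of your program as usual, so it’s easy to time just this one section against the original loop before deciding whether to port more.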

Komrade is now called Thrust. Thrust is included in CUDA 4.0RC.
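With Thrust the same kind of loop can often be written without any explicit kernel at all. This is a minimal sketch (the element-wise multiply is just a stand-in for whatever your loop body does):

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <vector>

int main()
{
    std::vector<float> h_a(1000, 2.0f), h_b(1000, 3.0f);

    // Constructing a device_vector from host data copies it to the GPU
    thrust::device_vector<float> a(h_a.begin(), h_a.end());
    thrust::device_vector<float> b(h_b.begin(), h_b.end());
    thrust::device_vector<float> out(a.size());

    // Runs as a kernel on the GPU: out[i] = a[i] * b[i]
    thrust::transform(a.begin(), a.end(), b.begin(), out.begin(),
                      thrust::multiplies<float>());

    float first = out[0];  // reading an element copies it back to the host
    return first == 6.0f ? 0 : 1;
}
```

Since it’s all STL-style containers and algorithms, this tends to drop into existing C++ code with very little friction.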

CUDA C is an extension to the C/C++ language. So you could write the exact same code, never make any calls to the GPU, and it will compile and run fine. Or you could take one part of the code, replace it with code that runs on the GPU, and leave the rest of your original code the same. So you can parallelize as much as you desire.
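For the FFT portions you mentioned, cuFFT follows a plan/execute pattern much like FFTW, so swapping it in for one transform at a time is straightforward. A minimal sketch, assuming the data is already in device memory (`d_signal` and `N` are placeholders):

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

// d_signal points to N cufftComplex values already on the device
void fft_forward_in_place(cufftComplex *d_signal, int N)
{
    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, 1);   // one 1-D C2C transform of length N
    cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);  // in place
    cufftDestroy(plan);
}
```

In a real program you would create the plan once and reuse it across many transforms, since plan creation is relatively expensive.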