Is there a way to solve a system of ODEs in CUDA? I don’t need to parallelize the solution of a single ODE system for one set of initial conditions; I need to solve a series of similar ODE systems for many different initial conditions. So if I can run a loop with some ODE solver inside it on the GPU, I could probably speed up my code significantly. It’s in MATLAB right now, but I’ll switch to SciPy/PyCUDA if I can get this functionality.
If you can write C code to do it, and it isn’t extremely heavy on memory usage, you can write a CUDA kernel that does the same thing for 10,000+ separate instances easily, with each thread integrating one instance. Check out the CUDA Programming Guide for more info.
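To make the "one instance per thread" idea concrete, here is a minimal sketch in plain Python (standard library only, so it runs anywhere). The names `rhs`, `rk4_step`, `solve_instance` and `solve_batch` are made up for illustration, and the harmonic oscillator is just an assumed stand-in for the real system. It uses fixed-step classical RK4, not whatever solver the MATLAB code uses. The point is only that each call to `solve_instance` touches nothing but its own state, so in a CUDA kernel that function body would be what one thread runs, with the batch index coming from the thread index instead of a loop.

```python
import math

def rhs(t, y):
    # Assumed example system (harmonic oscillator): x' = v, v' = -x.
    x, v = y
    return (v, -x)

def rk4_step(f, t, y, h):
    # One classical fourth-order Runge-Kutta step for a small system
    # stored as a tuple.
    k1 = f(t, y)
    k2 = f(t + h / 2, tuple(yi + h / 2 * k for yi, k in zip(y, k1)))
    k3 = f(t + h / 2, tuple(yi + h / 2 * k for yi, k in zip(y, k2)))
    k4 = f(t + h, tuple(yi + h * k for yi, k in zip(y, k3)))
    return tuple(
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    )

def solve_instance(y0, t_end=1.0, n_steps=1000):
    # Integrates ONE initial condition from t = 0 to t_end.
    # This is the part that would become the per-thread kernel body.
    h = t_end / n_steps
    y = tuple(y0)
    for i in range(n_steps):
        y = rk4_step(rhs, i * h, y, h)
    return y

def solve_batch(initial_conditions):
    # Every iteration is independent of the others; on a GPU this loop
    # disappears and each instance is handled by its own thread.
    return [solve_instance(y0) for y0 in initial_conditions]

batch = [(1.0, 0.0), (0.0, 1.0), (2.0, -1.0)]
results = solve_batch(batch)
```

Because every instance here takes the same number of steps, all threads would do identical work, which is the easy case for a GPU. Memory per instance is just the state vector plus the four RK4 stages.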
There is hardly any parallelism in the MATLAB ODE solvers. They are one-step or few-step solvers with a lot of feedback and adjustment after computing the value at each single point, so each step depends on the result of the previous one. With this kind of code, solving a single system would not even fully utilize one warp on one multiprocessor of the GPU. However, if you spread the computation of many different trials of the same code across many threads and blocks, it should be possible to get a speedup.
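Here is a rough sketch of the kind of feedback loop meant above, again in plain Python. It is not MATLAB's algorithm; it is a simple step-doubling scheme on top of RK4, chosen only because it is short. The function names, the tolerance and the test equation y' = -y are all assumptions for illustration. Each step size is computed from the error estimate of the previous attempt, so the steps of one trial cannot run in parallel, but separate trials are still independent of each other.

```python
import math

def rk4_step(f, t, y, h):
    # One classical RK4 step for a scalar ODE y' = f(t, y).
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def solve_adaptive(f, y0, t_end, tol=1e-8, h=0.1):
    # Step doubling: compare one step of size h with two steps of size h/2
    # and use the difference as an error estimate for the next step size.
    t, y = 0.0, y0
    accepted = rejected = 0
    while t < t_end:
        last = h >= t_end - t
        if last:
            h = t_end - t  # do not step past the end point
        full = rk4_step(f, t, y, h)
        half = rk4_step(f, t + h / 2, rk4_step(f, t, y, h / 2), h / 2)
        err = abs(half - full)
        if err <= tol:
            t = t_end if last else t + h
            y = half
            accepted += 1
        else:
            rejected += 1
        # Feedback: the next step size depends on this attempt's error.
        scale = 0.9 * (tol / err) ** 0.2 if err > 0 else 2.0
        h *= min(2.0, max(0.2, scale))
    return y, accepted, rejected

def decay(t, y):
    # Assumed test equation y' = -y, exact solution y0 * exp(-t).
    return -y

# Two independent trials that differ only in the initial condition.
initial_values = [1.0, 1000.0]
runs = [solve_adaptive(decay, y0, 5.0) for y0 in initial_values]
```

With an absolute tolerance like this, the trial that starts from the larger value needs more steps than the other one. On a GPU that means threads in the same warp finish at different times, so some lanes would sit idle while the slowest trial in the warp finishes. That is one reason fixed-step schemes, or grouping trials with similar behaviour together, can be easier to run efficiently.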
I am looking for an ODE solver that has some more parallelism so that I could port it to the GPU. Have any of you looked at CVODE, or what used to be called PVODE (an ODE solver for parallel machines)? If I don’t find anything, I will have to look at running several different trials of the same code in parallel on the GPU.