I am quite curious: is there any ODE solver for CUDA yet?

As the post you quote says, there is little or no parallelism in typical scalar ODE solvers.

If your problem is du/dt = f(u) where u is a scalar, then no, there isn't, and the GPU is a poor match for that sort of problem. On the other hand, large vector-valued ODEs (like those arising from method-of-lines discretisation of parabolic PDEs, for example) are a different question, because those contain enough FLOPs to amortize most of the fixed overheads associated with the GPU. So if you have an ODE like du/dt = Au or du/dt = A(u)u where u is a large vector, then yes, there are a number of codes around you could use for that.

Thanks for your reply.

Actually my case is not so complex; it looks like du/dt = Au, except that all of the variables are matrices, e.g.

dx/dt = A1*x + A2*y + A3*z

dy/dt = B1*x + B2*y + B3*z

dz/dt = C1*x + C2*y + C3*z

So, is there any existing code for that? I am a newbie at this…

If your matrix is small, then the GPU won’t help. “small” here likely means something like N=200 or smaller.

If, on the other hand, you have hundreds of independent problems, then maybe you could reorganize the work so that each block (or even each thread, for very small N) handles one problem.

In a lot of physics simulations, you have particle systems where N may be 1,000,000, but the coupling between particles is 0 unless they’re spatially local.

That is exactly what I mean. In fact I am looking for a high-performance solution for imaging or image processing. In that situation, every pixel (or voxel) is treated independently by the equation I mentioned before. And the number of pixels may be up to 256*256 = 65536 in the 2D case, and much more in 3D.

So what you really have is a linear system something like this:

```
             | A1 A2 A3 | | x |
dv/dt = Kv = | B1 B2 B3 | | y |
             | C1 C2 C3 | | z |
```

where K is a block matrix, and (I am going to guess) sparse. As I said at the outset, there isn’t a “general purpose” ODE solver of the sort you seem to be looking for - most are implemented as part of a suite of application specific codes (like electromagnetism, fluid mechanics, etc). Having said that, implementing something yourself need not be all that difficult, depending on the exact nature of the problem.

You will find state-of-the-art code for dealing with the RHS of that equation in either the Cusp or OpenNL libraries. What you do with the approximation of the LHS is an open question depending on the properties of the ODE and the accuracy you need. Usually you would leave the control logic and outer loop of the time-stepping code on the host CPU and have the GPU do all the vector calculations. Low-order explicit methods literally require a few tens of lines of device code to implement.

I am looking for a parallel CUDA implementation of an ODE-system solver. My ODE system is very large, between 2,000 and 30,000 variables. I have dual GTX 295s in SLI in my Core i7 975 machine with 24 GB RAM, so I want to attack large-scale problems like this. You said "**there are a number of codes around you could use for that**" - please point me to some. I would be really glad to get information on this.