Specific algorithm; should we use GPU?

Hello all,
I would appreciate it if anyone could give me some advice on how well suited GPU computing is to our problem.

Somewhat simplified, it can be stated as:
Given a velocity field: V(x,y,z),
And N particles with the starting positions: (x0i,y0i,z0i), i=1…N;
Find the trajectory of each particle and write it to file: TRAJi.txt, i=1…N.

I was thinking of having V(x,y,z) in shared memory and one thread per particle:
__global__ void traceOneParticle(float *V, float *x0, float *y0, float *z0);

Tracing each particle is computationally expensive; it requires numerical integration,
some if/else tests, multiplication of a vector by a small matrix, etc.
New memory may have to be allocated dynamically within the thread.
Is this a problem?

How about writing the results to file?
Is it possible to have each thread write to its own file descriptor in parallel?

As a rough estimate, assume I have 1000 particles. The velocity model requires 1GB and each particle requires 8MB.
This should not pose a problem on a Tesla S1070, since it has 16GB of dedicated memory?
Or is there a limit on how much memory can be allocated within each thread?

Thanks in advance for any answers!
Andreas Werner Paulsen

The GPU threads can’t write to a file. The host CPU must do that.
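
To make that concrete, here is a minimal host-side sketch of the file-writing step. It assumes the kernel has filled one flat buffer with up to maxSteps (x, y, z) points per particle and that the buffer has already been copied back to the host (for example with cudaMemcpy, not shown). The function name writeTrajectories and the buffer layout are my own assumptions, not anything from the SDK.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical layout: particle i owns traj[i*maxSteps*3 .. (i+1)*maxSteps*3),
// stored as x, y, z triples; steps[i] is how many points it actually produced.
// Writes dir/TRAJ<i>.txt (1-based, as in the question) and returns the
// number of files written successfully.
int writeTrajectories(const std::vector<float>& traj,
                      const std::vector<int>& steps,
                      int maxSteps, const std::string& dir)
{
    int written = 0;
    for (std::size_t i = 0; i < steps.size(); ++i) {
        std::string name = dir + "/TRAJ" + std::to_string(i + 1) + ".txt";
        std::FILE* f = std::fopen(name.c_str(), "w");
        if (!f) continue;                      // skip files we cannot open
        const float* p = traj.data() + i * maxSteps * 3;
        for (int s = 0; s < steps[i]; ++s)
            std::fprintf(f, "%g %g %g\n", p[3 * s], p[3 * s + 1], p[3 * s + 2]);
        std::fclose(f);
        ++written;
    }
    return written;
}
```

The host loop is serial here; with 1000 small text files the disk is likely to be the bottleneck rather than the loop itself.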

Memory in GPU space must be allocated from the host CPU as well, unless you implement your own memory pooling system on the GPU (which may be difficult!).
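
A common way around this is to allocate one large buffer from the host up front and give each thread a fixed-size slice of it. The sketch below shows only the bookkeeping; the helper names (poolBytes, particlesPerBatch, particleSlice) are hypothetical, and the actual device allocation (cudaMalloc) is left out. One thing to check on your side: as far as I know, the 16GB of the S1070 is split across four GPUs with 4GB each, so the 1GB velocity model plus 1000 x 8MB of particle data would not fit on a single device and would have to be processed in batches or spread over the GPUs.

```cpp
#include <cstddef>

#ifndef __CUDACC__   // lets the same source compile as plain C++ for testing
#define __host__
#define __device__
#endif

// Bytes needed for one flat pool with a fixed-size slice per particle.
inline std::size_t poolBytes(std::size_t nParticles, std::size_t bytesPerParticle)
{
    return nParticles * bytesPerParticle;
}

// How many particle slices fit next to the velocity field on one device.
// Optimistic upper bound: in practice not all device memory is available.
inline std::size_t particlesPerBatch(std::size_t deviceBytes,
                                     std::size_t fieldBytes,
                                     std::size_t bytesPerParticle)
{
    if (fieldBytes >= deviceBytes || bytesPerParticle == 0) return 0;
    return (deviceBytes - fieldBytes) / bytesPerParticle;
}

// Slice of the pool owned by particle i; callable from host or device code.
__host__ __device__ inline float* particleSlice(float* pool, std::size_t i,
                                                std::size_t floatsPerParticle)
{
    return pool + i * floatsPerParticle;
}
```

With the numbers from the question (taking 1GB as 2^30 bytes and 8MB as 8 x 2^20 bytes), a 4GB device with the 1GB field resident leaves room for at most 384 particles at a time, so 1000 particles would need at least three batches per GPU.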

For some basic implementation ideas, look at the FluidsGL SDK sample. It uses a velocity field (generated by a Navier-Stokes solver) and moves thousands of particles along that velocity field very efficiently. It is a 2D simulation and applies a wraparound at the border.
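
To give a rough idea of the one-thread-per-particle pattern, here is a simplified sketch of a single advection step with wraparound on the unit square. This is not the SDK sample's code: the Vec2 type, the function names, the nearest-cell velocity lookup and the forward-Euler step are all assumptions chosen to keep the example short. A real tracer would probably interpolate the field and use a higher-order integrator.

```cpp
#include <math.h>

#ifndef __CUDACC__   // lets the same source compile as plain C++ for testing
#define __host__
#define __device__
#endif

struct Vec2 { float x, y; };

// Wrap a coordinate into [0, 1) (periodic border).
__host__ __device__ inline float wrap01(float v)
{
    float w = v - floorf(v);
    return w < 1.0f ? w : 0.0f;   // guard against rounding up to exactly 1
}

// One forward-Euler step for a single particle with p.x, p.y in [0, 1).
// vel holds nx*ny velocities in row-major order; nearest-cell lookup.
__host__ __device__ inline Vec2 advect(Vec2 p, const Vec2* vel,
                                       int nx, int ny, float dt)
{
    int ix = (int)(p.x * nx);
    int iy = (int)(p.y * ny);
    if (ix >= nx) ix = nx - 1;    // defensive clamp
    if (iy >= ny) iy = ny - 1;
    Vec2 v = vel[iy * nx + ix];
    Vec2 q;
    q.x = wrap01(p.x + dt * v.x);
    q.y = wrap01(p.y + dt * v.y);
    return q;
}

#ifdef __CUDACC__
// One thread per particle; extra threads past n do nothing.
__global__ void advectAll(Vec2* pos, const Vec2* vel,
                          int n, int nx, int ny, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pos[i] = advect(pos[i], vel, nx, ny, dt);
}
#endif
```

Keeping the per-particle step in a host/device function means the same code can be checked on the CPU before launching it as a kernel.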