CUDA Beginner questions

Hi everyone!
I'm a software engineer working with VS2005, and I'm not familiar with CUDA.
We are developing real-time software.
We need to substantially reduce the computing time of an algorithm (solving a system of non-linear algebraic equations). The portion of code implementing that algorithm is shown below (this piece of code would run on the device).

for (int i = 0; i < 10000; i++)
    result[i] = MyAlgorithm(k[i]);

float MyAlgorithm(float); implements my numerical algorithm and also uses the lightmat library (a matrix class library).

Question: can an NVIDIA card (possibly a Tesla D870 or C870) working with CUDA be used as an accelerator to reduce the computing time, with each iteration of this loop done in parallel?
Thank you.
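To make the question concrete, mapping one loop iteration to one CUDA thread would look roughly like this. This is only a sketch: the kernel name, MyAlgorithmDev, and the launch parameters are made up, and the real MyAlgorithm would have to be ported so it compiles as __device__ code (the lightmat library would not be available on the GPU).

```cuda
// Hypothetical device-side port of MyAlgorithm (body not shown).
__device__ float MyAlgorithmDev(float k);

__global__ void runAll(const float *k, float *result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per iteration
    if (i < n)
        result[i] = MyAlgorithmDev(k[i]);
}

// Host side: copy k to the GPU, launch the kernel, copy result back.
void compute(const float *k_host, float *result_host, int n)
{
    float *k_dev, *result_dev;
    cudaMalloc(&k_dev, n * sizeof(float));
    cudaMalloc(&result_dev, n * sizeof(float));
    cudaMemcpy(k_dev, k_host, n * sizeof(float), cudaMemcpyHostToDevice);

    int block = 256;
    int grid = (n + block - 1) / block;   // enough blocks to cover n = 10000
    runAll<<<grid, block>>>(k_dev, result_dev, n);

    cudaMemcpy(result_host, result_dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(k_dev);
    cudaFree(result_dev);
}
```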

That depends on the MyAlgorithm function: how much memory access it needs and whether (and how much) it can be parallelized. For example, if it were just computing a[i] * k[i], then 10000 elements is too few to get any significant speedup. In fact, since you have to transfer k and result between host and device, such a simple operation could never give a speedup unless you can do further processing on the GPU.

So to answer the question: how should we know? We have not even 1% of the information necessary to tell.

As a first step, maybe calculate how many operations you will be doing per byte transferred. I think a factor of 100 should be the absolute minimum to be successful, but I do not have much data on that.

Thank you for quick response.

Additional information about the MyAlgorithm() function:
Inside it there are calls to sin, cos, exp, sqrt, etc. (standard math.h functions), and also an optimization algorithm (steepest descent with a line search)
used to find a local minimum of a function (the numerical solution of the non-linear equations).

I had thought to create a separate thread for each iteration, if that is the way to reduce computing time.

Thank you again.

Threads run in parallel, so if an iteration depends on the outcome of the previous one, that will not work.

sin, cos, sqrt and the like are very fast on the GPU, but I think you need to study CUDA a bit more before you can decide how to exploit its massive parallelism. As an example, you generally need to use thousands of threads in CUDA.
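For single-precision math, CUDA also provides hardware intrinsics (__sinf, __cosf, __expf) that are faster still than the standard sinf/cosf/expf, at some cost in accuracy; nvcc's -use_fast_math flag switches to them globally. A small sketch (kernel and variable names are made up):

```cuda
__global__ void evalFast(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Fast intrinsics: fewer cycles than sinf/cosf/expf, reduced accuracy.
        y[i] = __sinf(x[i]) + __cosf(x[i]) + __expf(x[i]) + sqrtf(x[i]);
    }
}
```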