Exploring the power of CUDA

Hi,
I have been working with CUDA for the past year and am stuck on a few problems. I believe the main one would be solved if I could get hold of some good running CUDA examples, i.e. ones that are computationally very intensive and take minutes or even hours to run.

I have searched but was unable to find such examples. It would be really helpful if someone could direct me to some (even something as simple as squaring an array, as long as it takes minutes to compute on my Tesla C2070 and uses multidimensional grids, blocks and threads).

I have written an image recognition program in CUDA using the Euclidean classifier method, but it can only process 130 images at a time. (The images definitely fit in the available memory; the problem is that I hit the 65535x1024 limit on blocks and threads per block.) Can this limit be exceeded by using 2D grids and blocks? If not, will I have to write the batch processing by hand for each program, or is there a library which handles this automatically?

I realize that I might be unclear in explaining my problem, but in short: I want to use the whole memory of my CUDA card, and to write, or preferably first find, an example which takes hours to compute. (I have 5 GTX 460s and 1 Tesla C2070 to run on, and after practicing on small codes like FFT, DWT, Euclidean etc., I want to reach the level at which CUDA is being used in industry.)
Please ask if you need any more detail about my question.
Thanks in advance.
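
On the 65535x1024 question above: on compute capability 2.x cards such as the Tesla C2070 and GTX 460, each grid dimension is limited to 65535 blocks and each block to 1024 threads, so a 2D grid already allows up to 65535 x 65535 blocks. A common way to stop depending on the grid size at all is a grid-stride loop, where each thread handles several elements. The following is only a minimal sketch of that pattern for the "squaring an array" case; the names squareKernel and squareOnDevice are made up for illustration, and this is not the poster's classifier code.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Square every element in place. Each thread starts at its flattened global
// index and then advances by the total number of launched threads, so every
// element is covered no matter how many blocks were launched.
__global__ void squareKernel(float *data, size_t n)
{
    // Flatten the 2D grid of 1D blocks into one linear block index.
    size_t blockId = (size_t)blockIdx.y * gridDim.x + blockIdx.x;
    size_t idx     = blockId * blockDim.x + threadIdx.x;
    size_t stride  = (size_t)gridDim.x * gridDim.y * blockDim.x;

    for (size_t i = idx; i < n; i += stride)
        data[i] = data[i] * data[i];
}

// Hypothetical host helper: copies `host` to the device, squares it there,
// and copies the result back. Returns true on success.
bool squareOnDevice(std::vector<float> &host)
{
    const size_t n = host.size();
    if (n == 0) return true;
    const size_t bytes = n * sizeof(float);

    float *dev = NULL;
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return false;

    const unsigned int threads = 256;   // threads per block (at most 1024 on 2.x)
    const size_t maxDim = 65535;        // per-dimension grid limit on 2.x
    size_t blocks = (n + threads - 1) / threads;
    size_t gridX  = blocks < maxDim ? blocks : maxDim;
    size_t gridY  = (blocks + gridX - 1) / gridX;
    if (gridY > maxDim) gridY = maxDim; // the stride loop picks up the remainder
    dim3 grid((unsigned int)gridX, (unsigned int)gridY);

    bool ok = cudaMemcpy(dev, &host[0], bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        squareKernel<<<grid, threads>>>(dev, n);
        // The copy back also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&host[0], dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Calling squareOnDevice(v) on a std::vector<float> squares it in place. If the data does not fit in device memory, the same helper can be called on successive chunks of the input, which is the hand-written batch processing mentioned in the question. The Thrust library that ships with the CUDA toolkit chooses the launch configuration for calls such as thrust::transform, which may cover part of the "library" wish.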

Try the FFT of a 500x500x500 matrix 1 million times.
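
A rough sketch of what that suggestion could look like with cuFFT, assuming single-precision complex data transformed in place. The helper name timeRepeatedFft3d is made up for illustration. A 500x500x500 complex array takes about 1 GB for the data alone (125,000,000 elements x 8 bytes), and cuFFT may need extra workspace on top, so this size is meant for a card like the C2070.

```cuda
#include <cstddef>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: runs `iterations` in-place forward 3D complex FFTs on
// an n x n x n array and returns the elapsed GPU time in milliseconds,
// or -1.0f if any step fails.
float timeRepeatedFft3d(int n, int iterations)
{
    const size_t count = (size_t)n * n * n;
    const size_t bytes = count * sizeof(cufftComplex);

    cufftComplex *data = NULL;
    if (cudaMalloc((void **)&data, bytes) != cudaSuccess) return -1.0f;
    cudaMemset(data, 0, bytes);  // all-zero input; the values are irrelevant for a timing sketch

    cufftHandle plan;
    if (cufftPlan3d(&plan, n, n, n, CUFFT_C2C) != CUFFT_SUCCESS) {
        cudaFree(data);
        return -1.0f;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);

    // The plan is created once and reused for every transform.
    bool ok = true;
    for (int i = 0; i < iterations && ok; ++i)
        ok = (cufftExecC2C(plan, data, data, CUFFT_FORWARD) == CUFFT_SUCCESS);

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until all queued transforms have finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cufftDestroy(plan);
    cudaFree(data);
    return ok ? ms : -1.0f;
}
```

It is probably sensible to start with something like timeRepeatedFft3d(500, 10) and extrapolate before committing to a million iterations. The file needs to be linked against cuFFT, e.g. with nvcc and -lcufft.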

Could you please give me some real-world examples where this FFT might be of use? I mean, I am unable to see the use of taking the FFT of the same matrix 1 million times.

When solving time-dependent partial differential equations. A typical time-dependent PDE is composed of a linear term, which is a sum of space derivatives like \nabla^{2p} f(\vec r), and a nonlinear term like f^n (n > 2). If the linear part has a lot of derivatives, it is beneficial to evaluate it in k space, where each \nabla corresponds to i\vec k and the operator is local in k. So in order to find the function f(\vec r) at time t + \Delta t, one has to do at least 2 FFTs. In practice we have many time steps, with at least 2 FFTs for each step. Depending on the problem, I had between 50,000 and 10,000,000 time steps. One FFT might take less than a second even for large matrices, but when I do it that many times I welcome a speed-up of at least 60 to 100. Recently I did one of those calculations on a Tesla card and finished the task in 10 days. Without it, the task would have taken me 1.5 years.
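
To make the "2 FFTs per step" count concrete, here is a sketch of a single time step. It assumes a first-order semi-implicit Euler scheme (linear term implicit, nonlinear term explicit) with one linear term; the scheme, the coefficient c and the generic nonlinear term N(f) are illustrative assumptions, not necessarily what was used in the runs described above.

```latex
% Model equation: a linear term with high-order derivatives plus a nonlinear term
\frac{\partial f}{\partial t} = c\,\nabla^{2p} f(\vec r, t) + N\big(f(\vec r, t)\big)

% In k space each \nabla becomes i\vec k, so the linear operator is a local multiplication
\nabla^{2p} f(\vec r, t) \;\longrightarrow\; (i\vec k)^{2p}\,\hat f(\vec k, t)
  = (-1)^{p}\,|\vec k|^{2p}\,\hat f(\vec k, t)

% One time step: linear term implicit, nonlinear term explicit
\hat f(\vec k, t+\Delta t)
  = \frac{\hat f(\vec k, t) + \Delta t\,\widehat{N(f)}(\vec k, t)}
         {1 - \Delta t\,c\,(-1)^{p}\,|\vec k|^{2p}}
```

Each step then needs one forward FFT to get \widehat{N(f)} from N(f) evaluated in real space, and one inverse FFT to bring \hat f(\vec k, t+\Delta t) back to real space for the next nonlinear evaluation. The denominator stays at or above 1 when c(-1)^p <= 0, which is the damped case.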

Another application I did was a Monte Carlo simulation of particles with quasi-long-range interactions. I got a speed-up of 50. This helps a lot, since now I can get results in 1 week.

On some page in NVIDIA's CUDA Zone there is a list of applications and their relative speed-ups. Most of them use it for real applications, such as research or MRI. If a program took only a few days to run, nobody would have bothered to port it to CUDA.

Thanks for your reply, that was very helpful. I am a final-year B.Tech. CSE student. I’ve been looking for problems which take months to solve but couldn’t find any. The only applications I was able to find and code were a high-precision Pi generator and a prime number generator.

I have gone through the CUDA in development page but, given my limited background, I find it a bit too vague. E.g. I can see that CUDA is used in computer vision, but going through NVIDIA’s page does not give me any application idea which would take weeks to run. Could you direct me to some page or, if possible, explain the disciplines behind such applications? I’ll get started on FFT in the meantime, but will have to do some searching myself on PDEs. It would be really helpful if you could suggest some similar project ideas, as my final-year project is 3D face recognition using stereoscopic vision (with CUDA wherever needed for speed-up), which is not the kind of application that takes a long time to run.

Computer simulations (e.g. molecular dynamics simulations in HOOMD-blue, http://codeblue.umich.edu/hoomd-blue) are often run for weeks. A typical research paper in our field (http://sitemaker.umich.edu/glotzergroup/home) contains data from thousands of runs, each lasting a week or more.

Microfluidics and fluid dynamics are a big thing. They are useful in physics, chemistry and even the study of blood flow.