Can someone provide some ideas for my signal-processing project using CUDA?

Hi, everyone:
I am working on a signal-processing project and chose CUDA for its speed, but I find it extremely difficult and complex: there are over one hundred separate variables used as parameters, many arrays and counters, and, most importantly, input and output files of about 50 MB. The kernel code in the CPU version is about 600 lines. Can someone provide some ideas about:

  1. how to handle so many separate variables, arrays, and counters on the host and device, and how the parameters relate to the input streams
  2. how to control the input and output streams on the device to suit CUDA processing
  3. how to simplify a device kernel that contains many "for" loops and "if" statements

Any idea will be helpful. Thank you all.

When approaching any large task, it is usually helpful to break it down into several smaller and more manageable tasks. Some tasks may be more suited for GPU processing than others, and you may want to consider different approaches to your implementation. Since you are working with large data sets, you should also consider what makes sense in terms of balancing data transfers vs processing.
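On the data-transfer side, one common pattern is to split the input into chunks and overlap host-device copies with kernel execution using streams and pinned host memory. A minimal sketch follows; the kernel `processChunk`, the chunk size, and the doubling operation are all placeholders for illustration, not your actual algorithm:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical per-element kernel; stands in for the real processing step.
__global__ void processChunk(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // placeholder computation
}

int main(void) {
    const int N = 1 << 20;       // total elements
    const int CHUNK = 1 << 18;   // elements per chunk (tuning parameter)
    float *h_data, *d_data;

    // Pinned host memory is required for async copies to actually overlap.
    cudaMallocHost(&h_data, N * sizeof(float));
    cudaMalloc(&d_data, N * sizeof(float));
    for (int i = 0; i < N; ++i) h_data[i] = 1.0f;

    cudaStream_t streams[2];
    cudaStreamCreate(&streams[0]);
    cudaStreamCreate(&streams[1]);

    // Round-robin chunks over two streams: copy in, process, copy out.
    // While one stream computes, the other can be transferring.
    for (int off = 0, s = 0; off < N; off += CHUNK, s ^= 1) {
        int n = (off + CHUNK <= N) ? CHUNK : N - off;
        int blocks = (n + 255) / 256;
        cudaMemcpyAsync(d_data + off, h_data + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        processChunk<<<blocks, 256, 0, streams[s]>>>(d_data + off, n);
        cudaMemcpyAsync(h_data + off, d_data + off, n * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();
    printf("h_data[0] = %f\n", h_data[0]);

    cudaStreamDestroy(streams[0]);
    cudaStreamDestroy(streams[1]);
    cudaFreeHost(h_data);
    cudaFree(d_data);
    return 0;
}
```

Whether two streams actually overlap depends on the device having a copy engine; the profiler will tell you if the copies and kernels are really running concurrently.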

Thanks. The CPU version is so slow that I have to run the job on 9 servers combined, each of which has 4 CPUs. That's too EXPENSIVE!

So I am considering porting the project to a GPU version.

I am still wondering how to divide the grid, blocks, and threads for such a large input, which is my biggest problem.

Dividing the input depends on your algorithm. Is each element of your input processed independently? If so, you can assign one thread per element. If your task is spatial in nature, like a convolution, the CUDA SDK has a good example of how to use shared memory for such tasks. Read the programming guide and the other SDK examples to see how to approach different types of tasks.
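To make the "one thread per element" mapping concrete, here is a minimal sketch of how the grid and block sizes are usually derived from the input length. The kernel body is a placeholder operation, not your actual algorithm:

```cuda
// One thread handles one element of the input.
__global__ void perElement(const float *in, float *out, int n) {
    // Global index of this thread across the whole grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                    // guard: the last block may run past the end
        out[i] = in[i] * in[i];   // placeholder per-element operation
}

// Host side: pick a block size, then round the block count up
// so that blocks * threadsPerBlock >= n covers every element.
void launch(const float *d_in, float *d_out, int n) {
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    perElement<<<blocks, threadsPerBlock>>>(d_in, d_out, n);
}
```

The `i < n` guard is what lets you use any input size without worrying about the grid dimensions dividing it evenly.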