I am working on a signal-processing project and using CUDA for its speed, but I am finding it extremely difficult and complex. There are over one hundred separate variables passed as parameters, plus many arrays and counters, and, most importantly, input and output files of about 50 MB. The CPU version of the kernel code is about 600 lines. Can someone offer some ideas about:
- How to manage so many separate variables, arrays, and counters on the host and the device, and how the parameters should relate to the input data flow?
- How to organize the input and output data flow on the device to suit CUDA processing?
- How to make the device kernel easier to handle when it contains lots of “for” loops and “if” statements?
Any ideas would be helpful. Thank you all.