I’m porting an algorithm from the CPU to the GPU using CUDA, and I’m running into some problems.
The algorithm is simply:
- read buffer A, process, and place into 1 of 14 buffers
- merge all of the output buffers
I have the first part working perfectly, but the second part of the algorithm is failing. By copying memory back to the host (I cannot get any of the debuggers to run on this computer), I can see that every pointer in the second part of the algorithm points to an empty structure. Is there some limitation in CUDA with regard to pointers, or am I doing something completely wrong?
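For context, one common way to end up with "empty" structures is a table of structs whose pointer members hold host addresses, or a table that was never copied to the device. A kernel can only dereference addresses in device memory (ordinary host memory is not visible to it). The sketch below is not the attached code: `Buffer`, `mergeBuffers`, `mergeDemo` and `ITEMS_PER_BUFFER` are hypothetical names and sizes chosen for illustration, and the "merge" is a plain concatenation. It shows the pattern of filling a host-side table with device pointers, copying the table itself to the device, and checking every CUDA call, which helps when no debugger is available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_BUFFERS 14        // from the description above
#define ITEMS_PER_BUFFER 4    // hypothetical size, for illustration only

// Hypothetical stand-in for whatever structure the merge step reads.
struct Buffer {
    int *data;   // must hold a device address if a kernel dereferences it
    int  count;
};

// Print the failing call's location and error string, then bail out.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            return -1;                                             \
        }                                                          \
    } while (0)

// One thread per buffer: copy its items into the merged output
// at a fixed offset (a simple concatenation, assumed for this sketch).
__global__ void mergeBuffers(const Buffer *buffers, int *merged)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= NUM_BUFFERS) return;
    for (int i = 0; i < buffers[b].count; ++i)
        merged[b * ITEMS_PER_BUFFER + i] = buffers[b].data[i];
}

// Fills hostMerged (NUM_BUFFERS * ITEMS_PER_BUFFER ints).
// Returns 0 on success, -1 on the first CUDA error.
// Note: the early return on error leaks allocations; acceptable for a sketch.
int mergeDemo(int *hostMerged)
{
    Buffer hostTable[NUM_BUFFERS];   // table lives on the host, pointers are device addresses
    int values[ITEMS_PER_BUFFER];
    const size_t mergedBytes = NUM_BUFFERS * ITEMS_PER_BUFFER * sizeof(int);

    for (int b = 0; b < NUM_BUFFERS; ++b) {
        for (int i = 0; i < ITEMS_PER_BUFFER; ++i)
            values[i] = b * 100 + i;                 // dummy data
        CUDA_CHECK(cudaMalloc(&hostTable[b].data, sizeof(values)));
        CUDA_CHECK(cudaMemcpy(hostTable[b].data, values, sizeof(values),
                              cudaMemcpyHostToDevice));
        hostTable[b].count = ITEMS_PER_BUFFER;
    }

    // The table itself must also be copied to the device before the kernel reads it.
    Buffer *devTable = NULL;
    int *devMerged = NULL;
    CUDA_CHECK(cudaMalloc(&devTable, sizeof(hostTable)));
    CUDA_CHECK(cudaMemcpy(devTable, hostTable, sizeof(hostTable),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMalloc(&devMerged, mergedBytes));

    mergeBuffers<<<1, 32>>>(devTable, devMerged);
    CUDA_CHECK(cudaGetLastError());        // launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(hostMerged, devMerged, mergedBytes,
                          cudaMemcpyDeviceToHost));

    for (int b = 0; b < NUM_BUFFERS; ++b)
        CUDA_CHECK(cudaFree(hostTable[b].data));
    CUDA_CHECK(cudaFree(devTable));
    CUDA_CHECK(cudaFree(devMerged));
    return 0;
}
```

Calling `mergeDemo` from `main` with an array of `NUM_BUFFERS * ITEMS_PER_BUFFER` ints and checking the return value is enough to try it. If the kernel in the real code receives a table that still lives in host memory, or one whose `data` members came from `malloc` rather than `cudaMalloc`, the checks after the launch are where an error would be expected to surface.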
I have attached the second part of the algorithm. It will not run, but maybe you can see whether I am doing something wrong.
To confirm that the first part is working, I copied its output to the host and ran it through the CPU version; the final output is exactly as it should be.
I apologise if I’ve missed some details, please let me know if there’s anything you need clarifying.
Thank you in advance.
Algorithm.cu (3.24 KB)