This may be a difficult question to answer, but how does one write safe, complex parallel code with CUDA?
I am utilising cudaStreamAddCallback with multiple GPUs and streams to drive maximum efficiency on a multi-GPU setup of NVIDIA Titans, and I will expand to using dynamic parallelism. It is very hard for me to figure out everything that might cause race conditions, multiple threads overwriting values, memory leaks, etc.
So I am looking for guidelines (tools, error handling, references, etc.) on how to manage this complexity and minimize the bugs that might creep into production code. Yes, the question may be a bit too open-ended.
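For concreteness, here is a minimal sketch of the kind of setup described above, under stated assumptions: one stream and one buffer per GPU, a hypothetical `scaleKernel` and a hypothetical `runOnAllGpus` driver, and a `CUDA_CHECK` macro (a common convention, not part of the CUDA API). It illustrates a few of the error-handling and race-condition concerns: every runtime call's return code is checked, each launch is followed by `cudaGetLastError`, the stream callback makes no CUDA API calls and only touches `std::atomic` state (it may run on a separate host thread), and every `cudaMalloc`/`cudaStreamCreate` is paired with a `cudaFree`/`cudaStreamDestroy`.

```cuda
#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line info if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Shared state updated from stream callbacks. Callbacks can run on a
// CUDA-internal host thread, so the counters are atomic to avoid races.
struct CompletionTracker {
    std::atomic<int> done{0};
    std::atomic<int> failed{0};
};

// Stream callback: must not call any CUDA API function.
void CUDART_CB onStreamDone(cudaStream_t /*stream*/, cudaError_t status,
                            void* userData) {
    auto* tracker = static_cast<CompletionTracker*>(userData);
    if (status != cudaSuccess) tracker->failed.fetch_add(1);
    tracker->done.fetch_add(1);
}

// Hypothetical kernel used only for illustration.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical driver: one stream and one buffer per visible GPU.
void runOnAllGpus(int n, CompletionTracker& tracker) {
    int deviceCount = 0;
    CUDA_CHECK(cudaGetDeviceCount(&deviceCount));
    std::vector<cudaStream_t> streams(deviceCount);
    std::vector<float*> buffers(deviceCount, nullptr);

    for (int d = 0; d < deviceCount; ++d) {
        CUDA_CHECK(cudaSetDevice(d));
        CUDA_CHECK(cudaStreamCreate(&streams[d]));
        CUDA_CHECK(cudaMalloc(&buffers[d], n * sizeof(float)));
        CUDA_CHECK(cudaMemsetAsync(buffers[d], 0, n * sizeof(float), streams[d]));
        scaleKernel<<<(n + 255) / 256, 256, 0, streams[d]>>>(buffers[d], n, 2.0f);
        CUDA_CHECK(cudaGetLastError());  // catches launch/configuration errors
        CUDA_CHECK(cudaStreamAddCallback(streams[d], onStreamDone, &tracker, 0));
    }
    for (int d = 0; d < deviceCount; ++d) {
        CUDA_CHECK(cudaSetDevice(d));
        CUDA_CHECK(cudaStreamSynchronize(streams[d]));  // surfaces async errors
        CUDA_CHECK(cudaFree(buffers[d]));               // pair every allocation
        CUDA_CHECK(cudaStreamDestroy(streams[d]));
    }
}
```

After `runOnAllGpus(n, tracker)` returns, `tracker.done` should equal the number of GPUs and `tracker.failed` should be zero. Because the callback never calls into CUDA, it can also be exercised directly on the host for testing.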
Thanks for any help,