Can anyone point me to a tutorial or examples showing how to port my existing C/C++ threads to CUDA cores? Also, is there a specification for how big the program code (not the data) should be for optimal porting to CUDA?
Most of the examples I have found show how to distribute the processing of a big data set across cores, rather than how to distribute threads across cores. Sorry if I’m not making sense.
Thanks in advance.