I have two questions about how to map some CPU programming ideas to the GPU.
CUDA does not support recursive functions (at least not on devices below compute capability 2.0), which limits its use for many problems, such as those involving trees, where traversal is normally written recursively. What strategies do people commonly use to work around this?
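A common workaround is to replace the recursion with a loop that manages its own explicit stack. The sketch below is plain C++ so it can be tested on a host. In CUDA, the same body would typically go in a `__device__` function, with each thread traversing independently using its own private stack array, which avoids the locking question entirely. The index-based tree layout, the name `preorderIterative`, and the bound `kMaxStack` are illustrative assumptions, not part of any library.

```cpp
#include <cassert>

// Assumed upper bound on pending nodes; it must cover the deepest path in the tree.
constexpr int kMaxStack = 64;

// Hypothetical pointer-free layout: node i has children left[i] and right[i], -1 = none.
// Index-based arrays like these are easy to copy to GPU memory, unlike pointer trees.
// Writes node indices in preorder to out[] (sized for all nodes) and returns how many
// were written, or -1 if the explicit stack would overflow.
// In CUDA this body could be a __device__ function; stack[] would then be private to
// the calling thread (registers or local memory), so no sharing or locking is needed.
int preorderIterative(const int* left, const int* right, int root, int* out) {
    assert(out != nullptr);
    int stack[kMaxStack];
    int top = 0;
    int count = 0;
    if (root >= 0) stack[top++] = root;
    while (top > 0) {
        int node = stack[--top];  // pop: replaces returning from a recursive call
        out[count++] = node;      // visit the node
        // Push right before left so left is popped first, matching recursive preorder.
        if (right[node] >= 0) {
            if (top == kMaxStack) return -1;
            stack[top++] = right[node];
        }
        if (left[node] >= 0) {
            if (top == kMaxStack) return -1;
            stack[top++] = left[node];
        }
    }
    return count;
}
```

For a tree where node 0 has children 1 and 2, and node 1 has children 3 and 4, this visits the nodes in the order 0, 1, 3, 4, 2, the same order a recursive preorder traversal would produce. A fixed-size stack is used because dynamic allocation per thread is expensive on a GPU; the trade-off is that the maximum depth must be known in advance.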
One way to remove recursion is to use an explicit stack, but I don't know an efficient way to build a stack in CUDA. Someone suggested using shared memory, but since CUDA has no built-in thread lock, how can different threads access and update the stack concurrently?
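When threads really do need a shared stack (for example, a per-block work list in `__shared__` memory), one lock-free pattern is to reserve slots with an atomic counter (`atomicAdd` in CUDA) and to separate the push phase from the pop phase with a barrier (`__syncthreads()` within a block). The sketch below imitates this on the host with `std::atomic` and `std::thread`, with `join()` standing in for the barrier, to run a level-by-level tree traversal. `PushOnlyList` and `levelOrderParallel` are hypothetical names, and this is one possible approach rather than a standard CUDA recipe.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Shared work list: concurrent pushes reserve distinct slots with an atomic
// fetch-and-add, so no lock is needed. Pushes and pops are never mixed concurrently.
struct PushOnlyList {
    std::vector<int> items;
    std::atomic<int> count{0};
    explicit PushOnlyList(int capacity) : items(capacity) {}

    // Returns false if the list is full (the reserved slot is then not written).
    bool push(int v) {
        int slot = count.fetch_add(1);  // CUDA analogue: atomicAdd(&count, 1)
        if (slot >= static_cast<int>(items.size())) return false;
        items[slot] = v;
        return true;
    }
    // Number of valid entries; read only after all pushes have finished.
    int size() const { return std::min(count.load(), static_cast<int>(items.size())); }
};

// Hypothetical layout as before: left[i]/right[i] are child indices, -1 = none.
// Each worker handles a slice of the current level and pushes children into the
// next level's list concurrently. Order within a level is not deterministic.
std::vector<int> levelOrderParallel(const std::vector<int>& left,
                                    const std::vector<int>& right,
                                    int root, int numThreads) {
    assert(numThreads > 0);
    std::vector<int> visited;
    std::vector<int> frontier;
    if (root >= 0) frontier.push_back(root);
    while (!frontier.empty()) {
        visited.insert(visited.end(), frontier.begin(), frontier.end());
        // Each node has at most two children, so this capacity cannot overflow.
        PushOnlyList next(2 * static_cast<int>(frontier.size()));
        std::vector<std::thread> workers;
        for (int t = 0; t < numThreads; ++t) {
            workers.emplace_back([&, t] {
                for (size_t i = static_cast<size_t>(t); i < frontier.size(); i += numThreads) {
                    int node = frontier[i];
                    if (left[node] >= 0) next.push(left[node]);
                    if (right[node] >= 0) next.push(right[node]);
                }
            });
        }
        for (auto& w : workers) w.join();  // barrier: all pushes finish before reading
        frontier.assign(next.items.begin(), next.items.begin() + next.size());
    }
    return visited;
}
```

The key design choice is that no thread ever pops while another is pushing: the barrier splits each step into a push phase and a read phase. A stack that allows concurrent pushes and pops at the same time is much harder to get right, so this sketch deliberately avoids it.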
Is there any reference on how to build a stack in CUDA, especially in shared memory? Global memory may be too slow for data that is accessed this frequently.
Any ideas are appreciated. I'm a novice in this field and want to improve my understanding of CUDA.