Am I right in saying that in CUDA, to arrive at optimal code, we have to experiment with different kinds of possible optimizations? I am asking because this was much less the case when coding in C, so it was easier to settle on a design before starting to code.
The problem with this “trial and error” approach is that it is very hard to come up with a design before we begin coding, and I have struggled because of that.