I am stuck compiling a large C/C++ codebase for the GPU. It is not feasible to add __host__ __device__ qualifiers to every function. I have identified some performance-bottleneck code that could run efficiently in parallel on the GPU, but porting it requires porting a lot of helper functions along with it. Isolating that code as a standalone piece is also laborious and, most importantly, does not scale when further functionality is added.
Can anyone suggest how to deal with this situation?
Thank you for your help.