I’m at the stage of my port where I’m trying to optimise my code to get the best possible speed-up. The code is currently running slower on the GPU than it was on a single CPU.
Are there any profiling tools available that would let me pinpoint the bottlenecks in my code? I know there is one for C for CUDA, but I'm unsure what tools, if any, are available from PGI. Any insight would be much appreciated.