I’m trying to speed up my code with OpenACC,
so I want to profile it at the source level.
I’m using the ‘nvvp’ profiler from CUDA 7.0.
When I run nvvp, I can use the ‘Analysis’ tab to see which kinds of latency slow my code down (data dependencies, conditional branches, memory bandwidth, etc.).
However, I can only get kernel-level analysis, not line-by-line analysis
(e.g. “the main_300_gpu kernel took 10 s”).
Is there any way to profile my code at the source level?
I’m using:
PGI 15.7 (using pgcc)
CUDA 7.0
GTX 960
Ubuntu 14.04 LTS x86_64