It’s hard to say without an example, since this could be due to several reasons.
When using just “-ta=tesla” or “-acc”, the compiler is actually creating multiple versions of the GPU code, one for each compute capability it targets by default, which can increase the compile time. If you know the target device, it may help to use “-ta=tesla:ccXX”, where “ccXX” is the compute capability of your particular device.
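As a rough sketch of what that looks like: tools usually report the compute capability as “7.0”, while the flag wants it written as “cc70”. The snippet below only builds the flag and prints the resulting command line. It assumes a device with compute capability 7.0; “cc_to_flag” is a made-up helper name and “test.c” is a placeholder file, neither comes from the PGI tools.

```shell
# Turn a compute capability such as "7.0" into the matching -ta sub-option.
# cc_to_flag is a hypothetical helper, not part of the PGI compilers.
cc_to_flag() {
    printf -- '-ta=tesla:cc%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Assumption: the target GPU has compute capability 7.0.
TA_FLAG=$(cc_to_flag 7.0)

# Print the command line you would run; test.c is a placeholder source file.
echo "pgcc -acc $TA_FLAG test.c"
```

If you aren't sure which value to use, running “pgaccelinfo” on the machine with the GPU should list the device's compute capability.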
You can try adding the “-time” flag, which will display compilation timing stats. However, only the PGI compiler is instrumented, so if the extra time is due to the back-end CUDA compiler, it won’t show up here.
Another thing to try is using “-v RUN=/usr/bin/time” on the command line. “-v” is the verbose flag, which shows all the steps the compiler driver takes to compile your code. The “RUN” option prefixes each driver command with the time utility, so you can see which commands are taking the most time.