Sorry, I am not able to give a short working example (as I usually do) for this problem. Here is what I saw.
GPU: K40c / pgc++ 19.4 / Ubuntu 16.04
with the following header file content:
#include <cmath>
#pragma acc routine(erff) seq
#pragma acc routine(erfcf) seq
An OpenACC kernel compiled with the flags “-std=c++11 -acc verystrict -Minfo=accel -O3 -g -ta=tesla:fastmath -c -fpic” took 30 ms to finish. There was a single erfcf(x) call in the source code; after we changed it to “1.0f - erff(x)”, the kernel took only 22 ms.
Does that sound right?
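For reference, here is a host-only sketch of the two expressions being compared. It is not the actual kernel: the function names complement_direct and complement_via_erf are made up for illustration, and it uses the float overloads of std::erfc and std::erf, which I am assuming correspond to erfcf and erff. One caveat worth keeping in mind is that the two forms are not numerically interchangeable: for larger x, erff(x) rounds to 1.0f in single precision, so the subtraction can lose the whole result.

```cpp
#include <cmath>

// Hypothetical helper: the original form, a direct call to the
// complementary error function (float overload, i.e. erfcf).
float complement_direct(float x) {
    return std::erfc(x);
}

// Hypothetical helper: the replacement form, 1.0f - erff(x).
// Algebraically identical to erfc(x), but subject to cancellation
// once erf(x) gets close to 1.
float complement_via_erf(float x) {
    return 1.0f - std::erf(x);
}
```

Printing both functions for a few values of x (for example 0.5f and 5.0f) shows where they agree and where the subtraction form stops carrying any information, which matters if the kernel ever sees arguments in the tail.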