Math function performance

Sorry, I am not able to give a short working example (as I usually do) for the problem I encountered. Here’s what I saw.

GPU: K40c / pgc++ 19.4 / Ubuntu 16.4
with the following header file content

#include <cmath>
#pragma acc routine(erff) seq
#pragma acc routine(erfcf) seq

An acc kernel compiled with the “-std=c++11 -acc verystrict -Minfo=accel -O3 -g -ta=tesla:fastmath -c -fpic” flags took 30 ms to finish. There was a single erfcf(x) call in the source code; after we changed it to “1.0f - erff(x)”, the kernel took only 22 ms to finish.

Does it sound right?


Hi stw,

Does it sound right?

Sorry, I don’t know, since I’ve never looked at the performance of these intrinsics. We are just calling the CUDA versions of these routines, so you may want to check whether there is any material on their performance in CUDA.

I did a quick search but couldn’t find too much on it, apart from this post showing a user’s own CUDA implementation of erff. While he didn’t see much performance improvement over the builtin erff, he did improve the accuracy.



My question was about the performance of the float-type ERFC vs. ERF, where ERFC is the complementary error function and is mathematically equivalent to (1 - ERF). I hope this clarifies my original post.
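
One caveat on the (1 - ERF) rewrite: the two forms are equal mathematically, but not necessarily numerically in single precision. Below is a minimal host-side sketch for checking this. The helper names (erfc_direct, erfc_via_erf, rel_err) are hypothetical and not from the original code, and it uses the standard <cmath> functions on the CPU rather than the CUDA device routines, so it only indicates the kind of difference to look for.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the original form, a direct call to the
// single-precision complementary error function.
float erfc_direct(float x) { return std::erfc(x); }

// Hypothetical helper: the rewritten form, "1.0f - erff(x)".
float erfc_via_erf(float x) { return 1.0f - std::erf(x); }

// Relative difference of the rewritten form against the direct call.
// Only meaningful while the direct result has not underflowed to zero.
float rel_err(float x) {
    const float direct = erfc_direct(x);
    assert(direct > 0.0f);  // precondition: avoid dividing by zero
    return std::fabs(erfc_via_erf(x) - direct) / direct;
}
```

Calling rel_err(0.5f) should give a value near float rounding error, while rel_err(5.0f) should be large: erff(x) rounds to a value at or next to 1.0f there, so the subtraction cancels almost all significant digits. If the kernel only sees small |x|, the rewrite is probably harmless; otherwise the 22 ms version may be trading accuracy for speed.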