Math function performance

oukore · September 23, 2019, 11:16pm

Sorry, I can’t give a short working example for this problem this time (as I usually do), but here’s what I saw.

GPU: K40c / pgc++ 19.4 / Ubuntu 16.04
With the following header file contents:

#include <cmath>
#pragma acc routine(erff) seq
#pragma acc routine(erfcf) seq

An acc kernel compiled with the flags “-std=c++11 -acc verystrict -Minfo=accel -O3 -g -ta=tesla:fastmath -c -fpic” took 30 ms to finish. The source contained a single erfcf(x) call. After we changed it to “1.0f - erff(x)”, the kernel took only 22 ms.

Does it sound right?

Thanks.

MatColgrove · September 24, 2019, 3:11pm

Hi stw,

Does it sound right?

Sorry, I don’t know, since I’ve never looked at the performance of these intrinsics. We simply call the CUDA versions of these routines, so you may want to look for material on their performance in CUDA.

I did a quick search but couldn’t find much, other than this post showing a user’s own CUDA implementation of erff: Optimized version of single-precision error function, erff() - CUDA Programming and Performance - NVIDIA Developer Forums. While he didn’t see much performance improvement over the built-in erff, he did improve its accuracy.

-Mat

oukore · September 24, 2019, 5:49pm

Thanks.

My question was about the performance of single-precision ERFC vs. ERF (erfcf vs. erff), where ERFC is the complementary error function and is mathematically equivalent to 1 - ERF. I hope this clarifies my original post.
