pgc++ is slower than g++ around an empty constructor


I’m trying to optimize the performance of my program, but I’m having trouble with one part of it.
On this specific code, the binary built with pgc++ -O3 is much slower than the one built with g++ -O3.
A cut-down example that reproduces the issue is below.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

template<typename T>
class mycl{
        T v;
public:
        mycl( ) {}
};

int main()
{
        struct timeval s,t;
        gettimeofday(&s, NULL);   // start of the timed region
        mycl<int> *tmp = new mycl<int>[22566659];
        gettimeofday(&t, NULL);   // end of the timed region
        printf("  Processing time: %ld (msec)\n", (long)((t.tv_sec - s.tv_sec)*1000 + (t.tv_usec - s.tv_usec)/1000));
        return 0;
}

In my environment, execution result is below.

$ g++ -O3 sample1.cpp
$ ./a.out
  Processing time: 0 (msec)
$ pgc++ -O3 sample1.cpp
$ ./a.out
  Processing time: 38 (msec)

The program generated by pgc++ -O3 calls the empty constructor of mycl for every array element.
The program generated by g++ -O3, on the other hand, optimizes the constructor calls away.
This seems to be the reason why pgc++ -O3 is slower than g++ -O3 on this sample code.

Is there a compile option that makes pgc++ optimize this sample as well?

I’m using pgc++ 17.4 on RedHat 6.9.


Try using this timer
% more dclock_64.s

        .file   "dclock-hammer.s"
        .data
        .align    8

# .clock:  .double 0.0000000005          # 2.0 GHz
# .clock:  .double 0.000000000455        # 2.2 GHz
# .clock:  .double 0.000000000417        # 2.4 GHz
 .clock:  .double 0.000000000376        # 2.66 GHz
# .clock:  .double 0.000000000357        # 2.8 GHz
# .clock:  .double 0.0000000003333       # 3.0 GHz
# .clock:  .double 0.0000000003125       # 3.2 GHz
# .clock:  .double 0.0000000002777       # 3.6 GHz
.low:   .long 0x00000000
.high:  .long 0x00000000
        .text
        .globl   _DCLOCK, dclock, _dclock, _dclock_, dclock_
_DCLOCK:
dclock:
_dclock:
_dclock_:
dclock_:
        .byte   0x0f, 0x31              # rdtsc: cycle counter into edx:eax

        movl    %eax, .low(%RIP)
        movl    %edx, .high(%RIP)

        fildll  .low(%RIP)              # load the 64-bit counter
        fmull   .clock(%RIP)            # cycles * seconds per cycle
        fstpl   -24(%rsp)
        movsd   -24(%rsp), %xmm0        # return seconds as a double
        ret

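Here is s.cpp, which calls it:

#include <stdio.h>
#include <stdlib.h>

extern "C" double dclock(void);

template<typename T>
class mycl{
        T v;
public:
        mycl( ) {}
};

int main()
{
        double time1,time2,time;

        time1 = dclock();   // start of the timed region
        mycl<int> *tmp = new mycl<int>[22566659];
        time2 = dclock();   // end of the timed region
        time=time2 - time1;
        printf("  Processing time: %lf (sec)\n", time);
        return 0;
}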

% g++ -o s_gnu s.cpp ./dclock_64.s
% pgc++ -o s_pgi s.cpp ./dclock_64.s
% s_pgi
Processing time: 0.068134 (sec)
% s_gnu
Processing time: 0.054925 (sec)

If you run it several hundred times and determine that g++ is faster than pgc++ on average, I still do not think you can generalize about overall performance.


Thank you for your response.
This performance difference only occurs at the -O3 optimization level.
g++ -O3 can optimize away the calls to the empty constructor of mycl, but pgc++ -O3 cannot.

The result with your code and -O3 is below.

$ g++ -O3 -o s_gnu s.cpp ./dclock_64.s
$ pgc++ -O3 -o s_pgi s.cpp ./dclock_64.s
$ ./s_gnu
Processing time: 0.000097 (sec)
$ ./s_pgi
Processing time: 0.050937 (sec)

Without -O3, g++ cannot optimize away the constructor calls for mycl either.

$ g++ -o s_gnu_default s.cpp ./dclock_64.s
$ ./s_gnu_default
Processing time: 0.046150 (sec)

I’m looking for a pgc++ option that optimizes away this needless constructor call.
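
Independent of compiler flags, a source-level change might also be worth trying: declare the constructor as defaulted instead of giving it an empty body. In C++11 and later, writing `mycl() = default;` on the first declaration makes the default constructor trivial whenever every member is trivially default-constructible, so `new mycl<int>[n]` has no per-element constructor to call in the first place. This is a sketch, not a tested fix: it assumes the code can be built as C++11 or newer with a standard library that provides std::is_trivially_default_constructible, it has not been verified against pgc++ 17.4, and mycl_user and make_array are names invented for the comparison. As with the empty body, `v` is left uninitialized.

```cpp
#include <cstddef>
#include <type_traits>

template<typename T>
class mycl {
        T v;
public:
        mycl() = default;   // trivial when T is trivially default-constructible
};

// The original form with a user-provided empty body, kept for comparison.
template<typename T>
class mycl_user {
        T v;
public:
        mycl_user() {}
};

// Compile-time check of the difference between the two declarations.
static_assert(std::is_trivially_default_constructible<mycl<int> >::value,
              "defaulted constructor should be trivial");
static_assert(!std::is_trivially_default_constructible<mycl_user<int> >::value,
              "user-provided constructor is never trivial");

// Default-initializes n elements; with a trivial constructor this is
// just the allocation, with no initialization loop required.
mycl<int> *make_array(std::size_t n)
{
        return new mycl<int>[n];
}
```

If the class really needs a user-written constructor, this does not apply, and the question of a pgc++ option remains open.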