std::transform_reduce() overloads without an execution policy are missing

In PGI 19.10 Community Edition, some overloads of std::transform_reduce() are missing, even though PGI claims that C++17 is fully supported.
More specifically, overloads (1), (2), and (3) listed on the following page are not implemented:
https://en.cppreference.com/w/cpp/algorithm/transform_reduce

The program I compiled is:

#include <iostream>
#include <vector>
#include <numeric>

int main()
{
  const std::vector<int> v1 = {1, 2, 3, 4, 5};
  const std::vector<int> v2 = {2, 3, 4, 5, 6};

  // (1): reduce two lists together (inner product)
  // sum1 = 1*2 + 2*3 + 3*4 + 4*5 + 5*6
  int sum1 = std::transform_reduce(v1.begin(), v1.end(), v2.begin(), 0);
  std::cout << "sum1 : " << sum1 << std::endl;

  // (2): reduce two lists together,
  // specifying both the binary operation that reduces the results and the one that combines paired elements
  int sum2 = std::transform_reduce(v1.begin(), v1.end(), v2.begin(), 0,
                                  [](int a, int b) { return a + b; },  // reduction function
                                  [](int a, int b) { return a * b; }); // combines a pair of elements
  std::cout << "sum2 : " << sum2 << std::endl;

  // (3): reduce a single list while transforming each element
  // sum3 = 1*2 + 2*2 + 3*2 + 4*2 + 5*2
  int sum3 = std::transform_reduce(v1.begin(), v1.end(), 0,
                                   [](int acc, int i) { return acc + i; }, // reduction function
                                   [](int x) { return x * 2; });           // transform function
  std::cout << "sum3 : " << sum3 << std::endl;
}

The PGI compiler then reports the following error:

"./cpprefjp_transform_reduce.cpp", line 14: error: no instance of overloaded
          function "std::transform_reduce" matches the argument list
            argument types are: (__gnu_cxx::__normal_iterator<const int *,
                      std::vector<int, std::allocator<int>>>,
                      __gnu_cxx::__normal_iterator<const int *,
                      std::vector<int, std::allocator<int>>>,
                      __gnu_cxx::__normal_iterator<const int *,
                      std::vector<int, std::allocator<int>>>, int)
        int sum1 = std::transform_reduce(v1.begin(), v1.end(), v2.begin(), 0);

Note that when I pass an execution policy (std::execution::seq or std::execution::par) as the first argument, the functions work as expected.

Best,
Miya

Hi Miya,

This is not unexpected. To maintain object compatibility with g++, we need to use their STL. Since the STL shipped with g++ 9.2 doesn't provide these overloads, we unfortunately don't either. The easy workaround is to pass std::execution::seq as the first argument. For example:

% cat transform_reduce.cpp
#include <iostream>
#include <vector>
#include <numeric>
#include <execution>

int main()
{
  const std::vector<int> v1 = {1, 2, 3, 4, 5};
  const std::vector<int> v2 = {2, 3, 4, 5, 6};

  int sum1 = std::transform_reduce(std::execution::seq, v1.begin(), v1.end(), v2.begin(), 0);
  std::cout << "sum1 : " << sum1 << std::endl;

  int sum2 = std::transform_reduce(std::execution::seq, v1.begin(), v1.end(), v2.begin(), 0,
                                  [](int a, int b) { return a + b; },
                                  [](int a, int b) { return a * b; });
  std::cout << "sum2 : " << sum2 << std::endl;

  int sum3 = std::transform_reduce(std::execution::seq, v1.begin(), v1.end(), 0,
                                   [](int acc, int i) { return acc + i; },
                                   [](int x) { return x * 2; });
  std::cout << "sum3 : " << sum3 << std::endl;
}
% pgc++ -std=c++17 transform_reduce.cpp; a.out
sum1 : 70
sum2 : 70
sum3 : 30

Hope this helps,
Mat

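I don’t see any difference between the execution policies with the PGI compiler. Below is a slightly modified version of the code example from this thread, preceded by its output: times in microseconds for initialization, sum1, sum2, and sum3 (columns), with one row each for seq, par, and par_unseq.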

255566 513997 462758 466801
247111 513928 462873 467035
252264 518074 462592 466330

#include <iostream>
#include <vector>
#include <numeric>
#include <execution>
#include <chrono>

template<typename Policy>
float run (size_t N,Policy policy) {

    auto t0 = std::chrono::high_resolution_clock::now();
    std::vector<float> v1(N), v2(N);
    for( unsigned i=0; i<N; i++ )
        v1[i] = v2[i] = i;
    
    auto t1 = std::chrono::high_resolution_clock::now();
    // The initial value is 0.0f rather than 0 so that the result type is float;
    // an int initial value would make the reduction accumulate in int.
    auto sum1 = std::transform_reduce(policy, v1.begin(), v1.end(), v2.begin(), 0.0f);

    auto t2 = std::chrono::high_resolution_clock::now();
    auto sum2 = std::transform_reduce(policy, v1.begin(), v1.end(), v2.begin(), 0.0f,
                                  [](auto a, auto b) { return a + b; },
                                  [](auto a, auto b) { return a * b; });

    auto t3 = std::chrono::high_resolution_clock::now();
    auto sum3 = std::transform_reduce(policy, v1.begin(), v1.end(), 0.0f,
                                   [](auto acc, auto i) { return acc + i; },
                                   [](auto x) { return x * 2; });

    auto t4 = std::chrono::high_resolution_clock::now();

    auto d = [] (auto a,auto b) {return std::chrono::duration_cast<std::chrono::microseconds>(b-a).count();};
    std::cout << d(t0,t1) << " " << d(t1,t2) << " " << d(t2,t3) << " " << d(t3,t4) << "\n";
    return sum1+sum2+sum3;
}

int main(void) {
    constexpr unsigned N = 1024*1024*100;
    run(N,std::execution::seq);
    run(N,std::execution::par);
    run(N,std::execution::par_unseq);
    return 0;
}
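
One detail worth keeping in mind with float data: the result type of std::transform_reduce is the type of the initial value, so the literal passed as init decides whether the reduction accumulates in int or in float. The sketch below illustrates the same rule with std::inner_product, which is available even where the policy-free std::transform_reduce overloads are missing. The helper name dot_with_init is hypothetical and exists only for this illustration.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper (illustration only): inner product of two float
// vectors, accumulated in the type T of the initial value.
template <class T>
T dot_with_init(const std::vector<float>& a, const std::vector<float>& b, T init) {
    // Each partial sum is converted back to T, so T = int truncates at every step.
    return std::inner_product(a.begin(), a.end(), b.begin(), init);
}
```

For v = {0.5f, 1.5f}, dot_with_init(v, v, 0) returns the int 2 because every partial sum is truncated, while dot_with_init(v, v, 0.0f) returns 2.5f.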

Correct, you won’t see a difference between the policies yet: the parallel execution model is still in development and will be released as a beta feature early next year, with the production release to follow later. It will also include implicit offload to the GPU. Multicore CPU support will use TBB (like g++ does).

For a preview of the GPU enabled C++ standard language parallelism, please see David Olsen’s talk from SC19:
https://on-demand.gputechconf.com/supercomputing/2019/video/sc1936-gpu-programming-with-standard-c++17/

-Mat

Hi mkcolg,

Thank you for your kind reply and explanations :)
I also confirmed the same behavior with g++, as you mentioned.

Although the root cause lies in the g++ STL, I hope that PGI will eventually support the std::transform_reduce() overloads that take no execution policy.

Best,
Miya