# std::transform_reduce() overloads without an execution policy are missing

In PGI 19.10 Community Edition, some overloads of std::transform_reduce() are missing, although PGI claims that C++17 is fully supported.
More specifically, the implementations of overloads (1), (2), and (3) listed on the following page are missing.
https://en.cppreference.com/w/cpp/algorithm/transform_reduce

The program I compiled is:

``````
#include <iostream>
#include <vector>
#include <numeric>

int main()
{
    const std::vector<int> v1 = {1, 2, 3, 4, 5};
    const std::vector<int> v2 = {2, 3, 4, 5, 6};

    // (1): reduce two lists
    // sum1 = 1*2 + 2*3 + 3*4 + 4*5 + 5*6
    int sum1 = std::transform_reduce(v1.begin(), v1.end(), v2.begin(), 0);
    std::cout << "sum1 : " << sum1 << std::endl;

    // (2): reduce two lists, specifying both the binary operation that
    // reduces the results and the binary operation that combines the
    // paired elements of the two lists
    int sum2 = std::transform_reduce(v1.begin(), v1.end(), v2.begin(), 0,
        [](int a, int b) { return a + b; },  // reduction function
        [](int a, int b) { return a * b; }); // function combining two elements
    std::cout << "sum2 : " << sum2 << std::endl;

    // (3): reduce a list while transforming each element
    // 1*2 + 2*2 + 3*2 + 4*2 + 5*2
    int sum3 = std::transform_reduce(v1.begin(), v1.end(), 0,
        [](int acc, int i) { return acc + i; }, // reduction function
        [](int x) { return x * 2; });           // transformation function
    std::cout << "sum3 : " << sum3 << std::endl;
}
``````

The PGI compiler then reports the following error.

``````
"./cpprefjp_transform_reduce.cpp", line 14: error: no instance of overloaded
function "std::transform_reduce" matches the argument list
argument types are: (__gnu_cxx::__normal_iterator<const int *,
std::vector<int, std::allocator<int>>>,
__gnu_cxx::__normal_iterator<const int *,
std::vector<int, std::allocator<int>>>,
__gnu_cxx::__normal_iterator<const int *,
std::vector<int, std::allocator<int>>>, int)
int sum1 = std::transform_reduce(v1.begin(), v1.end(), v2.begin(), 0);
``````

Note that when I add an execution policy (std::execution::seq or std::execution::par) as the first argument, the functions work as expected.

Best,
Miya

Hi Miya,

Not unexpected. To maintain object compatibility with g++, we need to use its STL. Given that g++ 9.2 doesn’t support this in its STL, we unfortunately don’t either. The easy workaround is to pass std::execution::seq as the first argument. For example:

``````
% cat transform_reduce.cpp
#include <iostream>
#include <vector>
#include <numeric>
#include <execution>

int main()
{
    const std::vector<int> v1 = {1, 2, 3, 4, 5};
    const std::vector<int> v2 = {2, 3, 4, 5, 6};

    int sum1 = std::transform_reduce(std::execution::seq, v1.begin(), v1.end(), v2.begin(), 0);
    std::cout << "sum1 : " << sum1 << std::endl;

    int sum2 = std::transform_reduce(std::execution::seq, v1.begin(), v1.end(), v2.begin(), 0,
        [](int a, int b) { return a + b; },
        [](int a, int b) { return a * b; });
    std::cout << "sum2 : " << sum2 << std::endl;

    int sum3 = std::transform_reduce(std::execution::seq, v1.begin(), v1.end(), 0,
        [](int acc, int i) { return acc + i; },
        [](int x) { return x * 2; });
    std::cout << "sum3 : " << sum3 << std::endl;
}
% pgc++ -std=c++17 transform_reduce.cpp; a.out
sum1 : 70
sum2 : 70
sum3 : 30
``````
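
If `<execution>` is not usable at all on a given toolchain, another possible workaround is to fall back on the pre-C++17 algorithms. The following is only a minimal sketch, not something tested with PGI: the helper names `dot_fallback` and `doubled_sum_fallback` are hypothetical, and it assumes the second vector is at least as long as the first. `std::inner_product` and `std::accumulate` always evaluate left to right, so they match `std::transform_reduce` only when the operations are associative and commutative, as integer `+` and `*` are here.

```cpp
#include <numeric>   // std::inner_product, std::accumulate
#include <vector>

// Hypothetical helper names for illustration; not part of any library.

// Gives the same value as overloads (1) and (2) in the thread's example:
// the sum of a[i] * b[i], accumulated left to right.
// Assumes b has at least as many elements as a.
inline int dot_fallback(const std::vector<int>& a, const std::vector<int>& b)
{
    return std::inner_product(a.begin(), a.end(), b.begin(), 0);
}

// Gives the same value as overload (3) in the thread's example:
// the sum of x * 2 over all elements.
inline int doubled_sum_fallback(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0,
                           [](int acc, int x) { return acc + x * 2; });
}
```

With the vectors from the example above, `dot_fallback(v1, v2)` corresponds to sum1 and sum2, and `doubled_sum_fallback(v1)` corresponds to sum3.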

Hope this helps,
Mat

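I don’t see any difference between execution policies with the PGI compiler. The code is a slightly modified version of the example in this thread. Timings are in microseconds, one row per policy (seq, par, par_unseq); the columns are setup, sum1, sum2, and sum3.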

255566 513997 462758 466801
247111 513928 462873 467035
252264 518074 462592 466330

``````
#include <iostream>
#include <vector>
#include <numeric>
#include <execution>
#include <chrono>

// Times the setup and three transform_reduce calls under the given policy
// and prints the four durations in microseconds.
template<typename Policy>
float run(size_t N, Policy policy) {
    auto t0 = std::chrono::high_resolution_clock::now();
    std::vector<float> v1(N), v2(N);
    for (unsigned i = 0; i < N; i++)
        v1[i] = v2[i] = i;

    // The initial value is 0.0f so that the sums accumulate as float;
    // an int 0 would make the result an int, which these sums overflow.
    auto t1 = std::chrono::high_resolution_clock::now();
    auto sum1 = std::transform_reduce(policy, v1.begin(), v1.end(), v2.begin(), 0.0f);

    auto t2 = std::chrono::high_resolution_clock::now();
    auto sum2 = std::transform_reduce(policy, v1.begin(), v1.end(), v2.begin(), 0.0f,
        [](auto a, auto b) { return a + b; },
        [](auto a, auto b) { return a * b; });

    auto t3 = std::chrono::high_resolution_clock::now();
    auto sum3 = std::transform_reduce(policy, v1.begin(), v1.end(), 0.0f,
        [](auto acc, auto i) { return acc + i; },
        [](auto x) { return x * 2; });

    auto t4 = std::chrono::high_resolution_clock::now();

    // Elapsed time between two time points, in microseconds.
    auto d = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
    };
    std::cout << d(t0,t1) << " " << d(t1,t2) << " " << d(t2,t3) << " " << d(t3,t4) << "\n";
    return sum1 + sum2 + sum3;
}

int main() {
    constexpr unsigned N = 1024*1024*100;
    run(N, std::execution::seq);
    run(N, std::execution::par);
    run(N, std::execution::par_unseq);
    return 0;
}
``````

Correct. The parallel execution model is still in development and will be released as a beta feature early next year, with the production release later. It will also include implicit offload to the GPU. Multicore CPU support will use TBB (like g++ does).

For a preview of the GPU enabled C++ standard language parallelism, please see David Olsen’s talk from SC19:
https://on-demand.gputechconf.com/supercomputing/2019/video/sc1936-gpu-programming-with-standard-c++17/

-Mat

Hi mkcolg,