Conditional compilation of CPU/GPU code with nvc++

Yes, though I was hoping you wouldn’t ask ;-). The example I originally wrote for you used it, but it triggered a compiler bug when used inside the offloaded operator. Once that bug is fixed (it’s tracked as TPR#30946), you can do something like the example below.

“if target” is our replacement for CUDA’s __CUDA_ARCH__ macro, which can’t be used with nvc++ because nvc++ is a single-pass compiler. nvcc makes two passes and splits the code into separate device and host versions, while nvc++ generates both the device and host versions in the back end. A preprocessor macro is resolved before that point, so it can’t tell the two targets apart in a single pass. Full details are in Bryce’s April 2021 GTC talk, starting around the 15-minute mark: Inside NVC++ and NVFORTRAN - Bryce Adelstein Lelbach - GTC 2021 - YouTube

#include <algorithm>
#include <iostream>
#include <vector>
#include <execution>
#include <nv/target>
#include <cstdio>

struct A
{
    template <typename T>
    void operator()(T& x)
    {
        x++;
        if target(nv::target::is_device) {
            printf("%s\n",__PRETTY_FUNCTION__);
        } else {
            std::cout << __PRETTY_FUNCTION__ << '\n';
        }
    }
};

int main()
{
    std::vector<int> v = {5,100,3,6,6,109,64,234,656,25,7,44,6,232,2};
    const auto pol = std::execution::par_unseq;
    std::for_each(pol, v.begin(), v.end(), A{});
    std::cout << v[0] << '\n'; // 6
    return 0;
}