Error capturing a std::vector in a lambda passed to std::transform (C++17, nvc++)

Hi,

I would like to capture a std::vector named “test” in a lambda passed to std::transform. I use the following main, which I compile with nvc++:


#include <iostream>
#include <vector>
#include <algorithm>
#include <execution>

int main(int argc, char** argv){
    size_t nbElement(100000000);
    std::vector<float> tabX, tabY, tabRes;
    tabRes.resize(nbElement);
    for(size_t i(0lu); i < nbElement; ++i){
        tabX.push_back(i*19lu%11);
        tabY.push_back(i*27lu%19);
    }

    std::vector<float> test;
    test.push_back(3.0);

    std::transform(std::execution::par_unseq, std::begin(tabX), std::end(tabX), std::begin(tabY), std::begin(tabRes),
            [test](float xi, float yi){ return test[0]*xi * yi; });

    return 0;

}

But I get the following error when compiling:

nvlink error   : Undefined reference to '_ZSt28__throw_bad_array_new_lengthv' in 'CMakeFiles/test_nvcpp.dir/main.cpp.o'
pgacclnk: child process exit status 2: /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/tools/nvdd
make[2]: *** [test_nvcpp] Error 2
make[1]: *** [CMakeFiles/test_nvcpp.dir/all] Error 2
make: *** [all] Error 2

And if I capture test by reference (“[&test]”), I get this error at run time:


terminate called after throwing an instance of 'thrust::system::system_error'
  what():  transform: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

Any ideas on how to solve this are welcome!

Hi nnaoy31,

Since we need to rely on CUDA Unified Memory for GPU offload, only allocated (heap) memory can be accessed on the GPU. “test” is a stack variable, so the vector object itself can’t be accessed from device code. What can be accessed is the data that the vector points to, because that storage is heap-allocated. This is why “tabX” and “tabY” work: “std::begin” passes in iterators that refer to the data, not the vectors themselves.

Hence, to get this code working, you’ll need to pass in a pointer to the data held by “test”. Something like:

% cat test.cpp
#include <iostream>
#include <vector>
#include <algorithm>
#include <execution>

int main(int argc, char** argv){
    size_t nbElement(100000000);
    std::vector<float> tabX, tabY, tabRes;
    tabRes.resize(nbElement);
    for(size_t i(0lu); i < nbElement; ++i){
        tabX.push_back(i*19lu%11);
        tabY.push_back(i*27lu%19);
    }

    std::vector<float> test;
    test.push_back(3.0);
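    // Raw pointer to the vector's heap-allocated storage; the lambda
    // captures this pointer by value instead of the vector itself.
    float * test_ptr = test.data();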

    std::transform(std::execution::par_unseq, std::begin(tabX), std::end(tabX), std::begin(tabY), std::begin(tabRes),
            [=](float xi, float yi){ return test_ptr[0]*xi * yi; });

    return 0;

}
% nvc++ -stdpar=gpu -fast test.cpp; a.out
%
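
If only a single coefficient is needed inside the lambda, a possible variation is to copy that value into a local float and capture it by value, so that neither the vector nor a pointer to it ends up in the closure. The following is a minimal sketch under stated assumptions, not a tested nvc++ recipe: the helper name scaled_product and the DEMO_POLICY macro are hypothetical, and the guard assumes that nvc++ predefines __NVCOMPILER. With other compilers the sketch falls back to the sequential std::transform overload so that it builds without a parallel backend.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: nvc++ predefines __NVCOMPILER. Other compilers use the
// sequential overload so this sketch builds without a parallel backend.
#ifdef __NVCOMPILER
#include <execution>
#define DEMO_POLICY std::execution::par_unseq,
#else
#define DEMO_POLICY
#endif

// Hypothetical helper: computes res[i] = coeffs[0] * x[i] * y[i].
std::vector<float> scaled_product(const std::vector<float>& x,
                                  const std::vector<float>& y,
                                  const std::vector<float>& coeffs) {
    // Host-side precondition checks: one coefficient, and y covers x.
    assert(!coeffs.empty());
    assert(y.size() >= x.size());

    std::vector<float> res(x.size());

    // Copy the coefficient into a plain float. The lambda captures this
    // value, so the closure holds no vector object and no pointer.
    const float scale = coeffs[0];

    std::transform(DEMO_POLICY x.begin(), x.end(), y.begin(), res.begin(),
                   [scale](float xi, float yi) { return scale * xi * yi; });
    return res;
}
```

Calling scaled_product(tabX, tabY, test) would then take the place of the std::transform call in the program above. When the lambda needs several elements of “test”, capturing the data pointer as shown earlier remains the more general option.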

Hope this helps,
Mat
