Failing to allocate large arrays with pinned memory

Hi,

When I allocate large vectors in different threads with pinned memory, I get an error: “new: cudaHostAlloc returns error code 201
Segmentation fault”
When I compile the same code without pinned memory, it runs fine. I guess the vectors are somehow too large for pinned memory. Is there a way to work around this limitation, or to decide manually whether to use pinned memory? I would like to use pinned memory since it is faster. Below is a short example that reproduces the problem.

#include <iostream>
#include <thread>
#include <vector>

std::size_t N = 1000000000;
int Nthreads = 8;

void f(std::size_t N, int threadid)
{
    std::vector<float> v(N);

    float* p = v.data();
    
    int queue = threadid+1;

    for( auto& x : v )
        x = 0.13;
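    // Create the device copy of p on this thread's async queue
    // (pcreate is the older spelling of present_or_create).
    #pragma acc enter data pcreate(p[0:N]) async(queue)

    // Copy the host data to the device on the same queue.
    #pragma acc update device(p[0:N]) async(queue)

    // Release the device copy again.
    #pragma acc exit data delete(p) async(queue)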

    std::cout <<"thread: "<<threadid<<"\n";
}

int main()
{
    std::vector<std::thread> pool;

    for(int i = 0; i < Nthreads; ++i)
        pool.push_back(std::thread(f,N,i));

    for(auto& t : pool)
        t.join();
}

Compiled with: “21.9/compilers/bin/nvc++ -ta=tesla,pinned test.cpp”

Hi Rob,

A very odd and specific error. It appears to me that the error only occurs when using std::vector with sizes 112786497 or larger from within a std::thread. Allocating “p” with malloc works as does calling the routine directly rather than within a thread.
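
For illustration, the malloc variant could look roughly like the sketch below. This is an assumption about the shape of that test, not the exact code that was run: `f_malloc` is a hypothetical name, the `create` spelling replaces `pcreate`, and the `bool` return value and the `wait` before freeing the buffer are additions that are not in the original example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>

// Hypothetical variant of f() from the original post: the host buffer comes
// from malloc instead of std::vector. Returns false if malloc fails.
bool f_malloc(std::size_t N, int threadid)
{
    float* p = static_cast<float*>(std::malloc(N * sizeof(float)));
    if (p == nullptr)
        return false;

    int queue = threadid + 1;

    for (std::size_t i = 0; i < N; ++i)
        p[i] = 0.13f;

    #pragma acc enter data create(p[0:N]) async(queue)
    #pragma acc update device(p[0:N]) async(queue)
    #pragma acc exit data delete(p[0:N]) async(queue)

    // Added in this sketch: let the queue drain before the host buffer
    // is freed, so no asynchronous transfer still refers to it.
    #pragma acc wait(queue)

    std::free(p);
    std::cout << "thread: " << threadid << "\n";
    return true;
}
```

It can be dropped into the original example by passing `f_malloc` to `std::thread` in place of `f`. Without OpenACC enabled, the pragmas are ignored and the function just runs on the host.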

Now pinned memory isn’t guaranteed to work since it depends on the OS being able to allocate enough physical memory, but I don’t think that’s the problem here. Error 201 is an invalid context so I’d expect a different error if this was the case. Checking the runtime debug output, the thread’s context id looks ok.

I thought it might have to do with our pinned memory pool allocator, but the error still occurs with this disabled (i.e. NVCOMPILER_ACC_POOL_ALLOC=0).

Given the failure only occurs at larger sizes, it could be some type of overflow, but the values are well within the range of an int, and the memory is small enough not to worry about 64-bit indexing.
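
To make the arithmetic concrete, here is a small compile-time check. It is a sketch that assumes the threshold of 112786497 is an element count of 4-byte floats; the names `kThresholdElems`, `kThresholdBytes` and `fits_in_int` are made up for this illustration.

```cpp
#include <climits>
#include <cstddef>

// Smallest failing std::vector<float> size reported above
// (assumed here to be a number of elements, not bytes).
constexpr std::size_t kThresholdElems = 112786497;
constexpr std::size_t kThresholdBytes = kThresholdElems * sizeof(float);

// True if a byte count can be represented in a signed int.
constexpr bool fits_in_int(std::size_t bytes)
{
    return bytes <= static_cast<std::size_t>(INT_MAX);
}

static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");
static_assert(kThresholdBytes == 451145988, "112786497 * 4 bytes");
static_assert(fits_in_int(kThresholdBytes), "threshold fits in an int");
```

So the buffer at the failure threshold is about 451 MB, well short of the 2147483647 limit of a 32-bit int.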

All that to say, I have no idea what’s wrong here. Hence I’m passing this off to our compiler engineers (filed as TPR #30928) to investigate.

-Mat


Apologies for the late post but this error should be fixed as of our 23.3 release:

% nvc++ -acc -fast test.cpp -Mkeepasm -gpu=pinned -V23.3 ; a.out
thread: 1
thread: 0
