Failing to allocate large arrays with pinned memory

rob_v8 · November 12, 2021, 9:05am

Hi,

When I allocate large vectors in different threads with pinned memory, I get this error:

new: cudaHostAlloc returns error code 201
Segmentation fault

When I compile the same code without pinned memory, it runs fine. I guess the vectors are somehow too large for pinned memory. Is there a way to circumvent this limitation, or to decide manually which allocations use pinned memory? I would like to use pinned memory since host-device transfers are faster with it. Below is a short example that reproduces the problem.

#include <iostream>
#include <thread>
#include <vector>

std::size_t N = 1000000000;  // 1e9 floats = 4 GB per thread
int Nthreads = 8;

void f(std::size_t N, int threadid)
{
    std::vector<float> v(N);

    float* p = v.data();

    // Give each thread its own async queue
    int queue = threadid + 1;

    for (auto& x : v)
        x = 0.13f;

    #pragma acc enter data pcreate(p[0:N]) async(queue)

    #pragma acc update device(p[0:N]) async(queue)

    // Delete the array section, not just the pointer variable
    #pragma acc exit data delete(p[0:N]) async(queue)

    // Drain the queue before v (and its host buffer) is destroyed
    #pragma acc wait(queue)

    std::cout << "thread: " << threadid << "\n";
}

int main()
{
    std::vector<std::thread> pool;

    for (int i = 0; i < Nthreads; ++i)
        pool.emplace_back(f, N, i);

    for (auto& t : pool)
        t.join();
}

Compiled with: “21.9/compilers/bin/nvc++ -ta=tesla,pinned test.cpp”

MatColgrove · November 12, 2021, 10:34pm

Hi Rob,

A very odd and specific error. It appears that the error only occurs when using a std::vector with 112786497 or more elements from within a std::thread. Allocating “p” with malloc works, as does calling the routine directly rather than from a thread.

Now, pinned memory isn’t guaranteed to work, since it depends on the OS being able to allocate enough physical memory, but I don’t think that’s the problem here. Error 201 is an invalid-context error, so I’d expect a different error if running out of memory were the cause. Checking the runtime debug output, the thread’s context ID looks OK.

I thought it might have to do with our pinned memory pool allocator, but the error still occurs with this disabled (i.e. NVCOMPILER_ACC_POOL_ALLOC=0).

Given that the failure only occurs at larger sizes, it could be some type of overflow, but the sizes are well within the range of an int, and the memory is small enough not to worry about 64-bit indexing.

All that to say, I have no idea what’s wrong here. Hence I’m passing this off to our compiler engineers (filed as TPR #30928) to investigate.

-Mat

MatColgrove · February 5, 2024, 11:43pm

Apologies for the late post, but this error should be fixed as of our 23.3 release:

% nvc++ -acc -fast test.cpp -Mkeepasm -gpu=pinned -V23.3 ; a.out
thread: 1
thread: 0
