NvMapReserveOp 0x80000000 failed [22] when running cuFFT plans

I’m porting some CUDA code from desktop to the AGX Xavier and have just started experimenting with unified memory. The code works with small amounts of data, but once I reach about 12 GB of used RAM (as reported by tegrastats) I start getting the following error messages during the execution of a cuFFT plan:

NvMapReserveOp 0x80000001 failed [22]
NvMapReserveOp 0x80000000 failed [22]

I haven’t found much about these errors, except for this thread from 2019 mentioning a “huge allocation,” which matches my case, as I’m processing volumetric medical images that are 4 GB or 8 GB each. That thread says the problem was supposedly fixed in JetPack 4.2, but I’m on JetPack 4.4 and still getting a similar error. Any suggestions?

Hi,

May I know the total memory of your device? Is it a standard 16 GB Xavier or the 32 GB module?

Also, could you share more details about the error?
Does it trigger an assertion or produce incorrect output values?
If so, would you mind sharing a sample that reproduces the issue?

Thanks.

Hi @AastaLLL, it’s the 32 GB Xavier module. I’m attaching a minimal example that reproduces the issue. The attached file works with N_FRAMES up to around 250, but anything above 300 triggers the errors.

minimal.cu (1.4 KB)
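In short, it does something along these lines (a simplified sketch; the exact constants and error handling are in the attached file):

// Simplified sketch of minimal.cu: one big unified-memory buffer and a
// batched 1D C2C cuFFT plan executed frame by frame.
// SIGNAL_SIZE and SCAN_SIZE here are illustrative values.
#include <cufft.h>
#include <cuda_runtime.h>
#include <cstdio>

#define SIGNAL_SIZE 2048
#define SCAN_SIZE   2048
#define N_FRAMES    300   // values up to ~250 work, above ~300 the errors appear

int main() {
    const size_t n = (size_t)N_FRAMES * SCAN_SIZE * SIGNAL_SIZE;
    cufftComplex *data = nullptr;

    // One large unified-memory allocation holding the whole volume.
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(cufftComplex));
    if (err != cudaSuccess) { printf("cudaMallocManaged: %s\n", cudaGetErrorString(err)); return 1; }

    // Batched 1D plan: SCAN_SIZE transforms of length SIGNAL_SIZE.
    cufftHandle plan;
    cufftResult res = cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2C, SCAN_SIZE);
    if (res != CUFFT_SUCCESS) { printf("cufftPlan1d: %d\n", res); return 1; }

    // The NvMapReserveOp messages show up while these transforms execute.
    for (int f = 0; f < N_FRAMES; ++f) {
        cufftComplex *frame = data + (size_t)f * SCAN_SIZE * SIGNAL_SIZE;
        res = cufftExecC2C(plan, frame, frame, CUFFT_FORWARD);
        if (res != CUFFT_SUCCESS) { printf("cufftExecC2C: %d\n", res); break; }
    }
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}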

Hi @AastaLLL, were you able to reproduce the issue? Do you have any suggestions?

Hi,

Yes, we can reproduce this issue in our environment.
We are checking it with our internal team and will get back to you once we have feedback.

Thanks.

Hi,

Could you use cufftSetAutoAllocation and cufftSetWorkArea to manage the work-area memory manually?
This can provide better error information:
https://docs.nvidia.com/cuda/cufft/index.html#unique_1507062318

Thanks.

Hi @AastaLLL, I tried your suggestion, but it doesn’t seem to provide any additional error information. However, I couldn’t find any examples of how to use cufftSetAutoAllocation and cufftSetWorkArea, so I’m not sure I’m calling them in the correct order. I’m attaching my modified code. Can you double-check the relevant portion?

cufftResult fftresult;
cudaError_t cudaError;
cufftHandle plan;
void *workArea;
size_t worksize;

// Create an empty plan handle and disable automatic work-area allocation
// before the plan is actually built.
fftresult = cufftCreate(&plan);
fftresult = cufftSetAutoAllocation(plan, 0);

// cufftMakePlan1d builds the plan and reports the required work-area size.
fftresult = cufftMakePlan1d(plan, SIGNAL_SIZE, CUFFT_C2C, SCAN_SIZE, &worksize);

// Allocate the work area manually and attach it to the plan.
cudaError = cudaMallocManaged(&workArea, worksize);
fftresult = cufftSetWorkArea(plan, workArea);

minimal_debug.cu (1.8 KB)

Hi,

Thanks for testing.
We will check this internally and update you with more information later.

Hi @AastaLLL, do you have any updates on this?

Hi,

Sorry, this issue is still being checked.
We will get back to you once we receive feedback from the internal team.

Thanks.

Hi @AastaLLL, this issue is becoming quite urgent for our project. Do you have any updates?

Hi,

Thanks for your patience.

Our internal team is still working on this.
We will share more information with you as soon as we have feedback.

Hi,

Thanks for your patience.

We have root-caused this problem.
There is an issue when allocating temporary memory as one big chunk on Jetson.
Please also note that such a large allocation tends to be slower on Jetson devices.

As a result, we can offer two possible workarounds:

  1. Divide the big chunk into several smaller allocations (a simplified sketch follows below).
    minimal_divide.cu (2.1 KB)

  2. Use pinned memory instead.
    minimal_pinned.cu (1.5 KB)
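For reference, here is a simplified sketch of workaround 1 (the attached minimal_divide.cu is the reference; SIGNAL_SIZE, SCAN_SIZE, and N_FRAMES below are illustrative values, not the exact ones from the attachments):

// Simplified sketch of workaround 1: replace the single large
// cudaMallocManaged buffer with many smaller per-frame allocations.
#include <cufft.h>
#include <cuda_runtime.h>
#include <cstdio>

#define SIGNAL_SIZE 2048
#define SCAN_SIZE   2048
#define N_FRAMES    300

int main() {
    const size_t frameBytes = (size_t)SCAN_SIZE * SIGNAL_SIZE * sizeof(cufftComplex);
    cufftComplex *frames[N_FRAMES];

    // Many ~32 MB managed allocations instead of one ~10 GB chunk.
    for (int f = 0; f < N_FRAMES; ++f) {
        cudaError_t err = cudaMallocManaged(&frames[f], frameBytes);
        if (err != cudaSuccess) {
            printf("cudaMallocManaged (frame %d): %s\n", f, cudaGetErrorString(err));
            return 1;
        }
    }

    // One plan with batch = SCAN_SIZE, reused for every frame.
    cufftHandle plan;
    cufftResult res = cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2C, SCAN_SIZE);
    if (res != CUFFT_SUCCESS) { printf("cufftPlan1d: %d\n", res); return 1; }

    for (int f = 0; f < N_FRAMES; ++f) {
        // In-place forward transform of one frame.
        res = cufftExecC2C(plan, frames[f], frames[f], CUFFT_FORWARD);
        if (res != CUFFT_SUCCESS) { printf("cufftExecC2C (frame %d): %d\n", f, res); break; }
    }
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    for (int f = 0; f < N_FRAMES; ++f) cudaFree(frames[f]);
    return 0;
}

Workaround 2 keeps, in essence, a single buffer but allocates it as pinned host memory with cudaMallocHost (and frees it with cudaFreeHost) instead of cudaMallocManaged; please refer to the attached minimal_pinned.cu for the exact approach.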

Thanks.
