NvMapReserveOp 0x80000000 failed [22] when running cuFFT plans

andrea · February 18, 2021, 10:34pm

I’m porting some CUDA code from desktop to the AGX Xavier and I just started experimenting with unified memory. The code works with small amounts of data, but when I hit about 12GB of used RAM (as shown by tegrastats) I start getting the following error messages during the execution of a cuFFT plan:

NvMapReserveOp 0x80000001 failed [22]
NvMapReserveOp 0x80000000 failed [22]

I haven’t found much about these errors, except this thread from 2019 mentioning a “huge allocation,” which matches my case, as I’m processing volumetric medical images that are 4GB or 8GB in size. The thread says the problem was supposedly fixed in JetPack 4.2, but I’m using JetPack 4.4 and still getting a similar error. Any suggestions?

AastaLLL · February 19, 2021, 3:48am

Hi,

May I know the total memory of your device? Is it the standard 16GB Xavier or the 32GB version?

Also, could you share more about the error?
Does it trigger an assertion or produce incorrect output values?
If so, would you mind sharing a sample that reproduces the issue?

Thanks.

andrea · February 19, 2021, 4:52pm

Hi @AastaLLL, it’s a 32GB Xavier module. I’m attaching a minimal code sample to reproduce the issue. The attached file works with N_FRAMES up to around 250, but anything above 300 triggers the issue.

minimal.cu (1.4 KB)

andrea · February 25, 2021, 4:26pm

Hi @AastaLLL, were you able to reproduce the issue? Do you have any suggestions?

AastaLLL · March 2, 2021, 8:40am

Hi,

Yes, we can reproduce this issue in our environment.
We are checking this with our internal team and will get back to you once we have feedback.

Thanks.

AastaLLL · March 3, 2021, 9:09am

Hi,

Could you use cufftSetAutoAllocation and cufftSetWorkArea to manage the helper memory manually?
This can provide much better error information:
https://docs.nvidia.com/cuda/cufft/index.html#unique_1507062318

Thanks.

andrea · March 3, 2021, 3:55pm

Hi @AastaLLL, I tried your suggestion but it doesn’t seem to provide any additional error info. However, I couldn’t find any examples of how to use cufftSetAutoAllocation and cufftSetWorkArea, so I’m not sure I’m calling them in the correct order. I’m attaching my modified code. Can you double-check the relevant portion?

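cufftResult fftresult;
cudaError_t cudaError;
cufftHandle plan;
void *workArea;
size_t worksize;

// Create an empty plan handle, then disable automatic work-area allocation
// before the plan is generated.
fftresult = cufftCreate(&plan);
fftresult = cufftSetAutoAllocation(plan, 0);
// Generate the plan; worksize receives the required work-area size in bytes.
fftresult = cufftMakePlan1d(plan, SIGNAL_SIZE, CUFFT_C2C, SCAN_SIZE, &worksize);
// Allocate the work area manually and attach it to the plan.
cudaError = cudaMallocManaged(&workArea, worksize);
fftresult = cufftSetWorkArea(plan, workArea);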

minimal_debug.cu (1.8 KB)
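
For reference, the call order in the snippet above (create the handle, disable auto-allocation, make the plan, allocate, set the work area) is the order the cuFFT documentation describes for caller-managed work areas. A minimal sketch with every return code checked could look like the following. It is not the attached minimal_debug.cu: the sizes are placeholder values, CHECK_CUDA and CHECK_CUFFT are hypothetical helper macros, and using cudaMalloc for the work area is an assumption (the post above uses cudaMallocManaged).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Placeholder sizes (assumptions); the thread does not give the real values.
constexpr int SIGNAL_SIZE = 1024;  // points per 1D transform
constexpr int SCAN_SIZE   = 4096;  // transforms per batch

// Hypothetical helpers: abort with file/line on the first failing call.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t e_ = (call);                                          \
        if (e_ != cudaSuccess) {                                          \
            fprintf(stderr, "%s:%d CUDA error: %s\n", __FILE__, __LINE__, \
                    cudaGetErrorString(e_));                              \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

#define CHECK_CUFFT(call)                                                 \
    do {                                                                  \
        cufftResult r_ = (call);                                          \
        if (r_ != CUFFT_SUCCESS) {                                        \
            fprintf(stderr, "%s:%d cuFFT error code %d\n", __FILE__,      \
                    __LINE__, static_cast<int>(r_));                      \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

int main() {
    const size_t count = static_cast<size_t>(SIGNAL_SIZE) * SCAN_SIZE;

    // Input/output buffer in unified memory, as in the original post.
    cufftComplex *data = nullptr;
    CHECK_CUDA(cudaMallocManaged(&data, count * sizeof(cufftComplex)));
    for (size_t i = 0; i < count; ++i) {
        data[i].x = 1.0f;
        data[i].y = 0.0f;
    }

    cufftHandle plan;
    size_t workSize = 0;
    void *workArea = nullptr;

    CHECK_CUFFT(cufftCreate(&plan));
    // Disable auto-allocation before the plan is generated.
    CHECK_CUFFT(cufftSetAutoAllocation(plan, 0));
    CHECK_CUFFT(cufftMakePlan1d(plan, SIGNAL_SIZE, CUFFT_C2C, SCAN_SIZE, &workSize));
    printf("cuFFT requests a work area of %zu bytes\n", workSize);

    // Guard against a zero-size request before allocating and attaching.
    if (workSize > 0) {
        CHECK_CUDA(cudaMalloc(&workArea, workSize));
        CHECK_CUFFT(cufftSetWorkArea(plan, workArea));
    }

    // In-place forward transform; synchronize before touching data on the host.
    CHECK_CUFFT(cufftExecC2C(plan, data, data, CUFFT_FORWARD));
    CHECK_CUDA(cudaDeviceSynchronize());
    printf("first output bin: (%f, %f)\n", data[0].x, data[0].y);

    CHECK_CUFFT(cufftDestroy(plan));
    CHECK_CUDA(cudaFree(workArea));  // freeing a null pointer is a no-op
    CHECK_CUDA(cudaFree(data));
    return 0;
}
```

Checking each call separately shows which step fails first. The snippet in the post overwrites fftresult and cudaError on every line, so only the last status survives.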

AastaLLL · March 16, 2021, 11:30am

Hi,

Thanks for your testing.
We will check this internally and share more information with you later.

andrea · March 29, 2021, 2:51pm

Hi @AastaLLL do you have any updates on this?

AastaLLL · March 30, 2021, 7:47am

Hi,

Sorry, this issue is still being investigated.
We will get back to you once we have feedback from the internal team.

Thanks.

andrea · June 2, 2021, 8:41pm

Hi @AastaLLL, this issue is becoming quite urgent for our project. Do you have any updates?

AastaLLL · June 15, 2021, 6:12am

Hi,

Thanks for your patience.

Our internal team is still working on this.
Will share information with you once we get any feedback.

AastaLLL · July 2, 2021, 4:49am

Hi,

Thanks for your patience.

We have root-caused this problem.
There are some issues with allocating temporary memory for a big chunk on Jetson.
Please also note that such large allocations tend to be slower on Jetson devices.

As a result, we provide two possible workarounds for you.

1. Divide the big chunk into several smaller allocations:
minimal_divide.cu (2.1 KB)
2. Use pinned memory instead:
minimal_pinned.cu (1.5 KB)
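
The attachments are not reproduced in this thread, so the following is only a sketch of what the first workaround (several smaller allocations) might look like, not the contents of minimal_divide.cu. It assumes the batch can be split into equal chunks that are transformed independently with one reusable plan; SIGNAL_SIZE, TOTAL_BATCH, CHUNK_BATCH and the check helper are hypothetical names and values.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Placeholder sizes (assumptions), chosen so the batch splits evenly.
constexpr int SIGNAL_SIZE = 1024;  // points per 1D transform
constexpr int TOTAL_BATCH = 8192;  // transforms in the whole data set
constexpr int CHUNK_BATCH = 1024;  // transforms per allocation
static_assert(TOTAL_BATCH % CHUNK_BATCH == 0, "chunks must divide the batch");

// Hypothetical helper: abort with a message when a call fails.
static void check(bool ok, const char *what) {
    if (!ok) {
        fprintf(stderr, "%s failed\n", what);
        exit(EXIT_FAILURE);
    }
}

int main() {
    const int numChunks = TOTAL_BATCH / CHUNK_BATCH;
    const size_t chunkCount = static_cast<size_t>(SIGNAL_SIZE) * CHUNK_BATCH;
    const size_t chunkBytes = chunkCount * sizeof(cufftComplex);

    // One plan sized for a single chunk, reused for every chunk.
    cufftHandle plan;
    check(cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2C, CHUNK_BATCH) == CUFFT_SUCCESS,
          "cufftPlan1d");

    // Several smaller managed allocations instead of one huge one.
    std::vector<cufftComplex *> chunks(numChunks, nullptr);
    for (cufftComplex *&chunk : chunks) {
        check(cudaMallocManaged(&chunk, chunkBytes) == cudaSuccess,
              "cudaMallocManaged");
        for (size_t i = 0; i < chunkCount; ++i) {
            chunk[i].x = 1.0f;
            chunk[i].y = 0.0f;
        }
    }

    // Transform each chunk in place.
    for (cufftComplex *chunk : chunks) {
        check(cufftExecC2C(plan, chunk, chunk, CUFFT_FORWARD) == CUFFT_SUCCESS,
              "cufftExecC2C");
    }
    // Synchronize before the host reads managed memory again.
    check(cudaDeviceSynchronize() == cudaSuccess, "cudaDeviceSynchronize");
    printf("chunk 0, first output bin: (%f, %f)\n", chunks[0][0].x, chunks[0][0].y);

    for (cufftComplex *chunk : chunks) {
        check(cudaFree(chunk) == cudaSuccess, "cudaFree");
    }
    check(cufftDestroy(plan) == CUFFT_SUCCESS, "cufftDestroy");
    return 0;
}
```

Because the plan covers only CHUNK_BATCH transforms, the cuFFT work area is sized for one chunk rather than for the full data set.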

Thanks.
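
As with the first workaround, minimal_pinned.cu is not shown here, so this is only a hedged sketch of the pinned-memory option. It assumes mapped (zero-copy) pinned memory allocated with cudaHostAlloc; the actual attachment may use a different call such as cudaMallocHost. The sizes and the check helper are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Placeholder sizes (assumptions); the thread does not give the real values.
constexpr int SIGNAL_SIZE = 1024;  // points per 1D transform
constexpr int SCAN_SIZE   = 4096;  // transforms per batch

// Hypothetical helper: abort with a message when a call fails.
static void check(bool ok, const char *what) {
    if (!ok) {
        fprintf(stderr, "%s failed\n", what);
        exit(EXIT_FAILURE);
    }
}

int main() {
    const size_t count = static_cast<size_t>(SIGNAL_SIZE) * SCAN_SIZE;
    const size_t bytes = count * sizeof(cufftComplex);

    // Allow mapped pinned allocations; call this before other CUDA work.
    check(cudaSetDeviceFlags(cudaDeviceMapHost) == cudaSuccess,
          "cudaSetDeviceFlags");

    // Pinned host buffer that is also mapped into the device address space.
    cufftComplex *hostData = nullptr;
    check(cudaHostAlloc(reinterpret_cast<void **>(&hostData), bytes,
                        cudaHostAllocMapped) == cudaSuccess,
          "cudaHostAlloc");
    for (size_t i = 0; i < count; ++i) {
        hostData[i].x = 1.0f;
        hostData[i].y = 0.0f;
    }

    // Device-side alias of the same buffer.
    cufftComplex *devData = nullptr;
    check(cudaHostGetDevicePointer(reinterpret_cast<void **>(&devData),
                                   hostData, 0) == cudaSuccess,
          "cudaHostGetDevicePointer");

    cufftHandle plan;
    check(cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2C, SCAN_SIZE) == CUFFT_SUCCESS,
          "cufftPlan1d");
    check(cufftExecC2C(plan, devData, devData, CUFFT_FORWARD) == CUFFT_SUCCESS,
          "cufftExecC2C");
    // Synchronize before reading the results through the host pointer.
    check(cudaDeviceSynchronize() == cudaSuccess, "cudaDeviceSynchronize");
    printf("first output bin: (%f, %f)\n", hostData[0].x, hostData[0].y);

    check(cufftDestroy(plan) == CUFFT_SUCCESS, "cufftDestroy");
    check(cudaFreeHost(hostData) == cudaSuccess, "cudaFreeHost");
    return 0;
}
```

On Jetson the CPU and GPU share the same physical DRAM, so a mapped pinned buffer avoids a separate device copy. Whether it is faster than managed memory for a given workload would need to be measured.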

system · September 12, 2021, 1:19am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.