This error is typically caused by a low-level memory allocation failure on the Jetson due to resource constraints or memory fragmentation, which then triggers the crash in PyTorch's memory manager (CUDACachingAllocator). You could either reduce host memory usage or convert the PyTorch model to TensorRT, which is specifically designed to run on the Jetson with maximum efficiency and minimal memory overhead.
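A common way to do this conversion (a sketch only; the model name, input, and file paths here are placeholders, and your export call will need your model's actual example input) is to export the model to ONNX and then build a TensorRT engine with `trtexec`, which ships with JetPack:

```shell
# Step 1 (in Python, on any machine): export the model to ONNX, e.g.
#   torch.onnx.export(model, example_input, "model.onnx")
#
# Step 2 (on the Jetson): build a TensorRT engine from the ONNX file.
# trtexec is installed with JetPack at this standard location.
/usr/src/tensorrt/bin/trtexec \
    --onnx=model.onnx \
    --saveEngine=model.engine \
    --fp16   # FP16 further reduces memory footprint on Jetson GPUs
```

The resulting `model.engine` can then be loaded with the TensorRT runtime instead of PyTorch, which avoids the CUDACachingAllocator entirely.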
This issue is related to the r36.4.7 update and does not reproduce on every run.
You might see either 'cudaMalloc failed: out of memory' or 'unable to allocate CUDA0 buffer'.
The underlying error for both cases is (in the Ollama log):
NvMapMemAllocInternalTagged: 1075072515 error 12
Our internal team is working on the issue.
We will share more information with you later.
Thank you all for the testing and sharing.
We are really sorry about the inconvenience that r36.4.7 brings.
Although our internal team is still working on the issue, here are some updates about the issue that we can share with you:
The recent updates (r38.2.1->r38.2.2, r36.4.4->r36.4.7, r35.6.2->r35.6.3) contain a security fix for CVE-2025-33182 & CVE-2025-33177:
…
The security fix adds a mechanism to prevent the allocation from going into the OOM path (to prevent a denial of service attack).
This introduces some limits on the amount of memory that can be allocated.
We are discussing how to minimize the impact of this security fix.
Will keep you all updated on the latest status.