Running a large ONNX model gets killed automatically due to insufficient memory?

Hello,

I am using a Jetson Orin Nano with L4T 36.4.3 firmware, JetPack 6.2, and a 512 GB SD card.

I am trying to run a large ONNX model (an LLM), but the process is automatically killed. It appears to be a swap memory issue.

```
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.4Gi       792Mi       6.4Gi       2.0Mi       251Mi       6.5Gi
Swap:          3.7Gi       1.0Gi       2.7Gi
```

```
2025-03-21 16:18:59.239897845 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 81 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-21 16:18:59.253917028 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-21 16:18:59.253963430 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Killed
```

Hi,

Which LLM do you use?
We recommend using a model with fewer than 4B parameters, as the Orin Nano has relatively limited memory.

You can find our tests of different LLM models at the link below:

Thanks.

Hello @AastaLLL,

I am currently working on an NLP model (about 2 GB) with 546M parameters.
Total Parameters: 546135121
Total Size: 2083.34 MB
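As a quick sanity check (my own arithmetic, not from the thread), the reported size is exactly what you would expect for float32 weights:

```python
# Sanity check: does the reported model size match float32 weights?
PARAMS = 546_135_121          # total parameter count reported above
BYTES_PER_PARAM = 4           # float32 assumption

size_mb = PARAMS * BYTES_PER_PARAM / (1024 * 1024)
print(f"{size_mb:.2f} MB")    # → 2083.34 MB, matching the reported size
```

So the on-disk size is consistent with unquantized float32 weights; actual runtime memory will be higher once activations and runtime buffers are added.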

Hi,

Could you share more information about which model you are using?
And how do you run inference? Is any quantization applied?

For example, we need to use 4-bit group quantization (q4f16_1) to load and run a Gemma 2B model on the Orin Nano.
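To see why quantization matters on an 8 GB board, here is a rough back-of-the-envelope estimate (my own numbers, assuming a nominal 2B parameter count; real footprints also include activations, KV cache, and runtime overhead):

```python
# Rough weight-memory estimate for a nominal 2B-parameter model.
PARAMS = 2_000_000_000
GIB = 1024 ** 3

fp16_gib = PARAMS * 2 / GIB    # 16-bit weights: 2 bytes per parameter
q4_gib = PARAMS * 0.5 / GIB    # 4-bit weights: ~0.5 bytes per parameter

print(f"fp16: {fp16_gib:.2f} GiB, 4-bit: {q4_gib:.2f} GiB")
# → fp16: 3.73 GiB, 4-bit: 0.93 GiB
```

At fp16 the weights alone consume roughly half of the Orin Nano's usable RAM, while 4-bit quantization brings them under 1 GiB, leaving room for the rest of the runtime.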

Also, could you try running inference outside of the container to see whether this is a Docker-related issue?

Thanks.

Hi @AastaLLL, the issue was the swap file size. I tried increasing it to 4 GB, and it worked.

For anyone facing a similar issue, check out the link on how to create/update the swap file.
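For reference, the usual way to set up a 4 GB swap file on Ubuntu/L4T looks like this (a sketch only; the path `/var/swapfile` is my choice, so adjust it and the size to your setup):

```shell
# Create and enable a 4 GB swap file (requires root).
sudo fallocate -l 4G /var/swapfile
sudo chmod 600 /var/swapfile      # swap files must not be world-readable
sudo mkswap /var/swapfile
sudo swapon /var/swapfile

# Make it persistent across reboots:
echo '/var/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Verify the new swap size:
free -h
```

After this, `free -h` should show the enlarged swap total, and the OOM killer should no longer terminate the model load.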

Hi,

Good to know it works now.
Thanks for sharing the fix.

Thanks.