Early this year (2024), I was very satisfied with the performance of llama.cpp and Mistral on a Jetson AGX Xavier.
However, after I built the latest llama.cpp code last week (August 25, 2024) to run Llama 3.1 and Phi 3.5, the load time became unbearable: Llama-3.1-8B-Lexi-Uncensored_V2_Q8.gguf (8.5 GB) took 7.5 minutes to load.
Once loaded, the inference speed is fine:
llama_print_timings: load time = 450482.81 ms
llama_print_timings: sample time = 3076.00 ms / 1308 runs ( 2.35 ms per token, 425.23 tokens per second)
llama_print_timings: prompt eval time = 14287.63 ms / 196 tokens ( 72.90 ms per token, 13.72 tokens per second)
llama_print_timings: eval time = 154754.42 ms / 1300 runs ( 119.04 ms per token, 8.40 tokens per second)
llama_print_timings: total time = 770548.21 ms / 1496 tokens
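For reference, the per-token figures in the log are internally consistent with the totals llama.cpp reports. A minimal sketch to recompute them from the raw totals (numbers taken from the log above):

```python
# Sanity-check llama_print_timings: derive ms/token and tokens/s
# from each phase's total time and run count.
timings = {
    "sample": (3076.00, 1308),        # total ms, runs
    "prompt eval": (14287.63, 196),
    "eval": (154754.42, 1300),
}

for phase, (total_ms, runs) in timings.items():
    ms_per_token = total_ms / runs
    tokens_per_second = runs / (total_ms / 1000.0)
    print(f"{phase}: {ms_per_token:.2f} ms/token, {tokens_per_second:.2f} tokens/s")

# Load time in minutes: 450482.81 ms is roughly 7.5 minutes.
print(f"load: {450482.81 / 60000.0:.1f} min")
```

This confirms eval runs at about 8.4 tokens/s; only the load time is out of line.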
Has anyone had a similar experience? Not many people in the llama.cpp community use Jetson, so I think this may be the right forum to ask.