Unable to Utilize GPU for LLM on NVIDIA Jetson AGX Orin

I am trying to run an LLM using CUDA on my NVIDIA Jetson AGX Orin, but the model only utilizes the CPU, not the GPU. Below is the relevant portion of my code for loading and using the LLM:

from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model/Llama3_8B_Ins_Q4_1_gguf",
    n_ctx=8192,
    n_gpu_layers=2,
    ntcx=2048,
    use_gpu=True,
)

max_tokens = 2048
temperature = 0.4
top_p = 0.2
model_output = llm(prompt, max_tokens=max_tokens, temperature=temperature, top_p=top_p, stream=True)

Issue: Despite setting use_gpu=True, the model does not utilize the GPU and runs entirely on the CPU.
Request: Any guidance on how to ensure the LLM utilizes the GPU would be greatly appreciated.

Hi,

Have you checked if the API you used supports Orin’s GPU?

To run Llama on Orin, it’s recommended to try our tutorial below:

Thanks.

from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model/Llama3_8B_Ins_Q4_1_gguf",
    n_ctx=8192,
    n_gpu_layers=2,
    ntcx=2048,
    use_gpu=True,
)

max_tokens = 2048
temperature = 0.4
top_p = 0.2
model_output = llm(prompt, max_tokens=max_tokens, temperature=temperature, top_p=top_p, stream=True)

The API is from llama_cpp, which I guess supports Orin's GPU.
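One thing worth verifying first: the prebuilt llama-cpp-python wheel from PyPI is typically CPU-only, so importing it successfully on a Jetson does not mean CUDA is available to it. A quick check, assuming your llama-cpp-python version exposes the low-level llama_supports_gpu_offload binding:

import llama_cpp

# Returns True only if the underlying llama.cpp library was compiled
# with a GPU backend (e.g. CUDA); a CPU-only build returns False.
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())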

Hi,

llama_cpp is a third-party library.
Please check with them to see why the GPU is not working.

Thanks.
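For readers hitting the same issue: use_gpu and ntcx do not appear to be standard Llama() parameters, so they are likely ignored, and a CPU-only build of llama-cpp-python will never offload layers regardless of settings. A minimal sketch of the common fix, assuming llama-cpp-python is rebuilt against CUDA on the Jetson (the exact CMake flag depends on the release; newer versions use -DGGML_CUDA=on, older ones -DLLAMA_CUBLAS=on):

# Rebuild llama-cpp-python with CUDA enabled first (run in a shell):
#   CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model/Llama3_8B_Ins_Q4_1_gguf",  # path from the original post
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to the GPU; use a smaller value if memory is tight
    verbose=True,     # startup log should report the CUDA backend and how many layers were offloaded
)

output = llm("Hello", max_tokens=32)
print(output["choices"][0]["text"])

With verbose=True, the loader prints which backend it initialized and how many layers were offloaded, which is the quickest way to confirm the GPU is actually being used.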

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.