I am trying to run an LLM with CUDA on my NVIDIA Jetson AGX Orin, but the model only utilizes the CPU, not the GPU.
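For context, this is roughly how I confirm the GPU stays idle during inference. I am watching it with the jetson-stats package; the `"GPU"` stats key is my assumption and may differ between versions:

```python
from jtop import jtop  # pip install jetson-stats

# Rough sketch of how I watch GPU load on the Orin while the model generates.
# The "GPU" key in jetson.stats is an assumption; key names can vary by version.
with jtop() as jetson:
    while jetson.ok():
        print("GPU load:", jetson.stats.get("GPU"))  # stays near 0% for me
```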
I am loading the model with llama_cpp, passing `n_ctx=8192, n_gpu_layers=2, ntcx=2048, use_gpu=True`.
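Here is a minimal sketch of my loading call (the model path is a placeholder; I have left out `ntcx` and `use_gpu` here since, as far as I can tell, they are not documented llama-cpp-python parameters, and passing them made no difference):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,
    n_gpu_layers=2,
    verbose=True,  # to see which backend the library reports at load time
)
```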
Request: Any guidance on how to ensure the LLM utilizes the GPU would be greatly appreciated.
I am looking to utilize the GPU of my NVIDIA Jetson AGX Orin to run a Large Language Model (LLM). I am not limited to llama_cpp; any other method or library that can achieve this is fine with me.