I am trying to run an LLM with CUDA on my NVIDIA Jetson AGX Orin, but the model only uses the CPU, not the GPU. Below is the relevant portion of my code for loading and using the LLM:
Issue: Despite setting `use_gpu=True`, the model does not use the GPU and runs entirely on the CPU.

Request: Any guidance on how to make the LLM run on the GPU would be greatly appreciated.
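A quick sanity check that may help narrow this down is confirming that Python can see the GPU at all, independent of the model code. This is a minimal sketch that assumes the model runs on a PyTorch-based stack; the library that accepts `use_gpu` isn't named above, so that assumption may not hold. The helper name `cuda_status` is hypothetical. `torch.cuda.is_available()`, `torch.version.cuda`, and `torch.cuda.get_device_name()` are standard PyTorch calls.

```python
import importlib.util


def cuda_status() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA device.

    Assumption: the LLM runs on PyTorch. If it uses another backend, this check
    only tells you about PyTorch, not about that backend.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if not torch.cuda.is_available():
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        return (
            f"CUDA not available (torch {torch.__version__}, "
            f"built with CUDA {torch.version.cuda})"
        )
    return f"CUDA available: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(cuda_status())
```

If this reports `built with CUDA None`, the installed PyTorch is a CPU-only build. On Jetson, a GPU flag in the model code cannot change that; it usually means the PyTorch build needs to match the installed JetPack.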