And here’s the output I received:
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/29 layers to GPU
Despite specifying num_gpu=-1, none of the layers were offloaded to the GPU. My setup includes CUDA 12.6, and the device is a Jetson Orin with Compute Capability 8.7. Could you help me understand why GPU support is not functioning and provide guidance to resolve this issue?
If these suggestions don’t help and you want to report an issue to us, please share the model, the exact command/steps, and the customized app (if any) so we can reproduce the problem locally.
Thank you for the suggestions, but my issue seems to be unrelated to general performance settings or the installation of deep learning frameworks.
I am specifically working with the llama-cpp-python package on a Jetson Orin device with CUDA 12.6. Despite specifying num_gpu=-1 in my code, none of the layers are being offloaded to the GPU.
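In case the build configuration is the culprit, a CUDA-enabled source build of the package is typically forced like this (a sketch, not verified on this device; `GGML_CUDA` is the current llama.cpp CMake flag, while older releases used `LLAMA_CUBLAS`):

```shell
# Force a source build of llama-cpp-python with the CUDA backend enabled.
# Without CMAKE_ARGS, pip may install a prebuilt CPU-only wheel, in which
# case no layers are ever offloaded regardless of the offload setting.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```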
And this is my JetPack version:
jetson_release
Software part of jetson-stats 4.2.12 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson AGX Orin Developer Kit - Jetpack 6.1 [L4T 36.4.0]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
P-Number: p3701-0005
Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform: