JetPack 6 llama-cpp-python

https://pypi.jetson-ai-lab.dev/jp6/cu126/llama-cpp-python/0.3.1

I installed the llama-cpp-python package from this source on my Jetson device. However, it appears that GPU support is not working as expected.

Here’s the code I used:
from llama_cpp import Llama

llama = Llama("path.gguf", num_gpu=-1, verbose=True)

And here’s the output I received:
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/29 layers to GPU

Despite specifying num_gpu=-1, none of the layers were offloaded to the GPU. My setup includes CUDA 12.6, and the device is a Jetson Orin with Compute Capability 8.7. Could you help me understand why GPU support is not functioning and provide guidance to resolve this issue?
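For reference, here is a quick check to confirm whether the installed wheel itself was built with GPU offload support. This is only a minimal sketch and assumes the low-level llama_supports_gpu_offload() binding is exposed in this build, as it is in recent llama-cpp-python releases:

import llama_cpp

# True only if the underlying llama.cpp library was compiled with GPU (CUDA) offload support
print(llama_cpp.llama_supports_gpu_offload())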

Thank you in advance for your assistance!

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guide for deep learning frameworks on Jetson:

3. Tutorial

Getting-started deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and any customized app with us so we can reproduce the issue locally.

Thanks!

Thank you for the suggestions, but my issue seems to be unrelated to general performance settings or the installation of deep learning frameworks.

I am specifically working with the llama-cpp-python package on a Jetson Orin device with CUDA 12.6. Despite specifying num_gpu=-1 in my code, none of the layers are being offloaded to the GPU.

And this is my JetPack version:
jetson_release

Software part of jetson-stats 4.2.12 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson AGX Orin Developer Kit - Jetpack 6.1 [L4T 36.4.0]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:

  • P-Number: p3701-0005
  • Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
  • Distribution: Ubuntu 22.04 Jammy Jellyfish
  • Release: 5.15.148-tegra
jtop:
  • Version: 4.2.12
  • Service: Active
Libraries:
  • CUDA: 12.6.68
  • cuDNN: 9.3.0.75
  • TensorRT: 10.3.0.30
  • VPI: 3.2.4
  • Vulkan: 1.3.204
  • OpenCV: 4.8.0 - with CUDA: NO

Hello, could you please help me?

Hi,

Based on the README of llama-cpp-python, please try n_gpu_layers=-1 to enable GPU acceleration. For example:
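As a minimal sketch based on your code above (the model path is just a placeholder), the call would look like the following. Note that Llama() accepts unknown keyword arguments via **kwargs, which is likely why num_gpu=-1 was silently ignored rather than raising an error:

from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU (requires a CUDA-enabled build)
llama = Llama(
    model_path="path.gguf",  # replace with your model file
    n_gpu_layers=-1,
    verbose=True,
)
# with verbose=True the load log should now report something like "offloaded 29/29 layers to GPU"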

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.