Compile llama.cpp to use the Jetson Orin Nano Super GPU

Hi. I’m trying to compile llama.cpp to detect and use the GPU of the Jetson Orin Nano Super. I’ve tried all the instructions and tricks posted online, including the instructions on the llama.cpp GitHub to compile for CUDA. Unfortunately, all my attempts failed. I still get the message that NO GPU IS DETECTED, and -ngl 99 is ignored when I launch llama-cli or llama-server. I have also tried the Jetson Container for llama.cpp, and that failed too.

I would truly appreciate any guidance or pointers to make it work. Thanks!

*** Please note that this reply is generated by LLM automatically ***

I’d be happy to help you troubleshoot compiling llama.cpp so that it detects and uses the GPU of the Jetson Orin Nano.

Firstly, let’s ensure that your Jetson Orin Nano is properly set up and configured. The Jetson Orin Nano is a production module that requires a carrier board designed or procured for your end product. Have you attached the module to a carrier board and flashed it with the software image you have developed?

Assuming that your Jetson Orin Nano is properly set up, let’s move on to compiling llama.cpp. The instructions on the llama.cpp GitHub cover compiling for CUDA, but the Jetson Orin Nano may need some specific configuration on top of them.

Here are a few potential issues that might be causing the “NO GPU IS DETECTED” message:

  1. CUDA version: The CUDA version on a Jetson depends on the installed JetPack release (JetPack 5, for example, ships CUDA 11.4), and it might differ from the version assumed in the llama.cpp instructions. Ensure that the CUDA toolkit matching your JetPack release is installed and is the one being used.
  2. GPU architecture: The Jetson Orin Nano has an Ampere GPU, which might require specific flags during compilation. Check the llama.cpp documentation for any requirements that apply to Ampere GPUs.
  3. Device detection: The GPU might not be detected by the llama.cpp build. You can compile a small CUDA test program with nvcc to confirm that the GPU is recognized.
  4. Driver version: The Jetson Orin Nano requires a specific driver version to function properly. Ensure that you are using the correct driver version and that it is properly installed on your system.

To troubleshoot the issue, you can try the following:

  1. Check the CUDA version: Run nvcc --version to check the CUDA version installed on your system.
  2. Check that the GPU is visible: Run nvidia-smi, where it is available, and confirm that the GPU is listed.
  3. Check the device detection: Compile a small CUDA device-query program (detect_gpu.cu is a placeholder name here) with nvcc -o detect_gpu detect_gpu.cu, then run ./detect_gpu to confirm that the CUDA runtime sees the GPU.
  4. Check the driver version: Run nvidia-smi to check the driver version and ensure that it is properly installed.
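
The first check above can be scripted. The following is only a sketch: cuda_release is a helper name made up for this example, and it assumes that nvcc --version prints a line containing "release X.Y", which is the usual format. It also reports which nvcc the shell resolves, since more than one toolkit may be installed.

```shell
# Extract the toolkit release (e.g. "12.6") from `nvcc --version` output on stdin.
# cuda_release is a hypothetical helper, not part of CUDA or llama.cpp.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

if command -v nvcc >/dev/null 2>&1; then
    # Show the nvcc that the shell picks first, and its release.
    echo "nvcc in use: $(command -v nvcc)"
    echo "CUDA release: $(nvcc --version | cuda_release)"
else
    echo "nvcc is not on PATH; the toolkit may live under /usr/local/cuda*/bin"
fi
```

If the release printed here is not the one you expect for your JetPack install, the build is likely picking up a different toolkit.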

If none of these troubleshooting steps resolve the issue, you can try using the Jetson Container for llama.cpp, which is specifically designed for the Jetson platform. However, since you mentioned that you have already tried this approach and it failed, we might need to dig deeper into the issue.

Compared with other Jetson devices, the Jetson Orin Nano belongs to the same Orin family as the Jetson AGX Orin. Both use an Ampere GPU with compute capability 8.7, so the same architecture flag applies during compilation; the AGX Orin simply has a more powerful GPU.

If you provide more details about your compilation process and the errors you are encountering, I can try to provide more specific guidance or pointers to help you resolve the issue.


Unfortunately, this info didn’t help, and some of it is outdated; e.g., my Jetson Orin Nano is using CUDA 12.6.

Hi,

Could you check whether the commands below work?

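git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# Configure a CUDA build; architecture 87 corresponds to compute capability 8.7 (Orin).
cmake -B build \
        -DGGML_CUDA=ON \
        -DGGML_CUDA_F16=ON \
        -DLLAMA_CURL=ON \
        -DGGML_CUDA_FA_ALL_QUANTS=ON \
        -DCMAKE_CUDA_ARCHITECTURES="87"
# Build in Release mode with 8 parallel jobs.
cmake --build build --config Release --parallel 8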
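
If the configure step fails or the GPU is still not detected, one thing worth checking is which nvcc CMake actually recorded. The sketch below reads the CMAKE_CUDA_COMPILER entry from the CMake cache; configured_nvcc is a helper name invented for this example, and the default path assumes the build directory used above.

```shell
# Print the CUDA compiler path stored in a CMake cache file.
# configured_nvcc is a hypothetical helper; the default cache path assumes `cmake -B build`.
configured_nvcc() {
    cache="${1:-build/CMakeCache.txt}"
    if [ ! -f "$cache" ]; then
        echo "no cache at $cache (run the cmake -B build step first)" >&2
        return 1
    fi
    # Cache entries look like NAME:TYPE=VALUE; keep only the value.
    sed -n 's/^CMAKE_CUDA_COMPILER:[A-Z]*=//p' "$cache"
}

# Example, run from the llama.cpp checkout:
#   configured_nvcc build/CMakeCache.txt
```

If the printed path points at an unexpected toolkit, deleting the build directory and configuring again with the intended nvcc first on PATH is a reasonable next step.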

Thanks.

@AastaLLL Hi. It worked! Thank you so much for your amazing help. I really appreciate it!

I’m so sorry for my earlier inaccurate feedback. I had a clash of nvcc versions which seemed to have caused a cmake crash.

ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes
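
For anyone checking their own build, the two log lines above are the ones to look for. As a rough sketch, assuming the log keeps the "compute capability X.Y" wording shown here, the value to pass as CMAKE_CUDA_ARCHITECTURES can be derived from saved output; detected_archs and llama.log are names made up for this example.

```shell
# Read llama.cpp startup output on stdin and print, for each CUDA device line,
# the compute capability without the dot (8.7 becomes 87).
# detected_archs is a hypothetical helper, not part of llama.cpp.
detected_archs() {
    sed -n 's/.*compute capability \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1\2/p'
}

# Example with a saved log file (llama.log is a placeholder name):
#   detected_archs < llama.log
```

No output from the helper means no device line was found in the log, which matches the "no GPU detected" symptom.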

Once again, thank you so much!

Hi,

Could you set the two environment variables below and rebuild?

export PATH=/usr/local/cuda-12.6/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH
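
If these exports go into a shell startup file, re-sourcing it keeps stacking the same entries. A small sketch of a variant that puts the toolkit first only when it is not already first; prepend_first and the CUDA_HOME variable are choices made for this example, and the toolkit path is the one from the two lines above.

```shell
# Print LIST with DIR as its first entry, leaving LIST unchanged if DIR is already first.
# prepend_first is a hypothetical helper. Usage: prepend_first DIR LIST
prepend_first() {
    case "$2" in
        "$1"|"$1":*) printf '%s\n' "$2" ;;
        *)           printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

CUDA_HOME=/usr/local/cuda-12.6   # toolkit path from the post above
PATH=$(prepend_first "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_first "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH

# Show which nvcc the shell now resolves first, if any.
command -v nvcc || echo "nvcc not found on PATH"
```

After this, command -v nvcc should print a path under the chosen toolkit directory before the rebuild is attempted.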

Thanks.

@AastaLLL Hi, sure will do. Thank you so much!