Ollama and Jetson issue

I have an Orin NX 16GB that I want to use for edge AI applications, and I have gotten another LLM to run on it (I can’t remember which one; it was more work than I would have liked just to get a basic Llama 2 7B going), so I was tempted by ollama and the fact that they had a workaround for Jetson on their GitHub. Long story short, the workaround doesn’t work: everything just runs on the CPU… I posted the error and logs I was getting and they posted a fix, but while I can now point it at a stub for the libnvidia-ml.so it was wanting, I am getting the following errors from the ollama service. I am totally new to CUDA development, but the general gist I got from my searches on similar topics was that libnvidia-ml.so is not the correct library for querying Jetson devices. I think the ollama team wants it to work on Jetsons, but before I submit an issue I would like to give them more useful information than “nope, still not working.” I believe their project is written in Go, which is a language I have never looked at, but I am sure they are using similar driver libraries as everyone else…

bunny@bunnybot:~$ LD_LIBRARY_PATH=/usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs ollama serve
2024/01/11 14:50:16 images.go:808: total blobs: 7
2024/01/11 14:50:16 images.go:815: total unused blobs removed: 0
2024/01/11 14:50:16 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
2024/01/11 14:50:17 shim_ext_server.go:142: Dynamic LLM variants [cuda]
2024/01/11 14:50:17 gpu.go:88: Detecting GPU type
2024/01/11 14:50:17 gpu.go:203: Searching for GPU management library libnvidia-ml.so
2024/01/11 14:50:17 gpu.go:248: Discovered GPU libraries: [/usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/libnvidia-ml.so]

!!!
WARNING:

You should always run with libnvidia-ml.so that is installed with your
NVIDIA Display Driver. By default it’s installed in /usr/lib and /usr/lib64.
libnvidia-ml.so in GDK package is a stub library that is attached only for
build purposes (e.g. machine that you build your application doesn’t have
to have Display Driver installed).
!!!
lsof: WARNING: can’t stat() fuse.gvfsd-fuse file system /run/user/124/gvfs
Output information may be incomplete.
lsof: WARNING: can’t stat() tracefs file system /sys/kernel/debug/tracing
Output information may be incomplete.
Linked to libnvidia-ml library at wrong path : /usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/libnvidia-ml.so

2024/01/11 14:50:17 gpu.go:259: Unable to load CUDA management library /usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/libnvidia-ml.so: nvml vram init failure: 9
2024/01/11 14:50:17 gpu.go:203: Searching for GPU management library librocm_smi64.so
2024/01/11 14:50:17 gpu.go:248: Discovered GPU libraries:
2024/01/11 14:50:17 routes.go:953: no GPU detected

Hi,

There is a libnvidia-ml.so file under the path below.
Please check if it is what you want.

/usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/libnvidia-ml.so

Thanks.

@ashleyatheart have you happened to see these yet?

I’m not familiar with ollama (or Go), so you would probably want to contact them for further support, but my understanding is that it’s a wrapper around llama.cpp, and I do have llama.cpp working (with GPU acceleration) on Orin:

llama.cpp also has a Python API and an OpenAI-compatible server that are pretty easy to use. There are also more tutorials at https://www.jetson-ai-lab.com/, including oobabooga text-generation-webui (which can also run an OpenAI server endpoint).
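If it helps, here is a minimal Go sketch of talking to such an OpenAI-compatible endpoint. It assumes a local server listening on localhost:8080 with the /v1/chat/completions route (the URL and model name are placeholders to adapt for your setup):

package main

import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
)

func main() {
        // Build an OpenAI-style chat completion request body.
        body, _ := json.Marshal(map[string]any{
                "model": "local-model", // placeholder name; adjust for your server
                "messages": []map[string]string{
                        {"role": "user", "content": "Describe the Jetson Orin NX in one sentence."},
                },
        })

        resp, err := http.Post("http://localhost:8080/v1/chat/completions",
                "application/json", bytes.NewReader(body))
        if err != nil {
                fmt.Println("request failed:", err)
                return
        }
        defer resp.Body.Close()

        // Decode only the first choice's message content.
        var out struct {
                Choices []struct {
                        Message struct {
                                Content string `json:"content"`
                        } `json:"message"`
                } `json:"choices"`
        }
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
                fmt.Println("decode failed:", err)
                return
        }
        if len(out.Choices) > 0 {
                fmt.Println(out.Choices[0].Message.Content)
        }
}

The same request shape should work against anything that implements the OpenAI chat completions route, so swapping backends later doesn’t require client changes.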

I didn’t have much hope since it was a stub, but I tried that path anyway and got the error in the logs I gave above. In my limited understanding, that library is used to enumerate NVIDIA discrete graphics card GPUs and not the Jetson Arm SoC’s integrated GPU. Not sure why there needs to be a different enumeration library for the same vendor, but that’s life.

I had seen the Jetson “fix” on the ollama repo and tried it on both JetPack 5.1.2 and 6.0 to no avail; it just never uses the Jetson GPU… I posted an issue on their GitHub and they tweaked it to have a more robust GPU search, but all that did was make it easier to hit the error I posted above while using their Jetson fix. I posted here just to see if I could get an answer as to what the Jetson GPU library equivalent to libnvidia-ml.so might be, so that I could point them to an API doc for it and they can just add it to the GPU search.

I think I have used llama.cpp, and yes, I got really good performance from it. But I rebuilt my Jetson and ollama (I rebuilt because I went down a driver-version rabbit hole trying to get some random person’s Voice → LLM → TTS project working and made a mess of the OS), and ollama just seemed faster and kept popping up in my AI feed, so I had to try it, and I don’t like just giving up on something. I might have my gripes about Jetson, but as someone who has tried various computer vision, speech recognition, etc. projects since computers were 8-bit, these are exciting times. While I think all the huge cloud stuff is amazing, my goal is local and portable, and in that regard Jetson is hard to beat. One of my outstanding tasks right now is to see if any of the multimodal models can run on the Orin NX for visual context extraction. It would be nice to use my OAK-D to alert upon seeing or hearing an “important” thing, take a snapshot, and pass that frame along for context awareness and potential triggers, similar to how those multi-AI-agent tools work. Nothing too fancy, just enough to be amusing for the family and to inspire my kids.

Historically it was because nvidia-smi is for discrete GPUs attached via PCIe/NVLink/etc., whereas Jetson uses an integrated GPU architecture, and there are a lot of differences between the two that make compatibility of that low-level system utility difficult. However, nvidia-smi is actually available in JetPack 6.0 and will report the GPU name as ‘Orin’.

The cudaGetDeviceProperties() API can also be used to enumerate the GPU (and this is also available through cuda-python), and there are other ways on Jetson, such as cat /proc/device-tree/model
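As a rough illustration of that approach, here is a minimal cgo sketch (assuming the CUDA toolkit is installed under /usr/local/cuda and libcudart can be linked) that enumerates devices with cudaGetDeviceCount()/cudaGetDeviceProperties() instead of NVML:

package main

/*
#cgo CFLAGS: -I/usr/local/cuda/include
#cgo LDFLAGS: -L/usr/local/cuda/lib64 -lcudart
#include <cuda_runtime.h>
*/
import "C"
import "fmt"

func main() {
        var count C.int
        if ret := C.cudaGetDeviceCount(&count); ret != C.cudaSuccess {
                fmt.Println("no CUDA devices found:", C.GoString(C.cudaGetErrorString(ret)))
                return
        }
        for i := C.int(0); i < count; i++ {
                var prop C.struct_cudaDeviceProp
                C.cudaGetDeviceProperties(&prop, i)
                // prop.name is a fixed-size char array; totalGlobalMem is in bytes
                fmt.Printf("GPU %d: %s (%d MiB)\n", i,
                        C.GoString(&prop.name[0]),
                        uint64(prop.totalGlobalMem)/(1024*1024))
        }
}

If all you need is the board name, cat /proc/device-tree/model works without any CUDA linkage at all.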

Oh cool! - yea, see here:

@ashleyatheart any luck?

I am the same; I find the Jetson device really useful for running services locally with my own data. Ollama is a really user-friendly wrapper, and the addition of the Ollama webui gives it a familiar interface plus RAG and other features. It’s easy enough to get it running with Docker, but as you mentioned it uses the CPU rather than the GPU. I’ve used the API, which is based on the OpenAI API, to create a Discord bot etc., but I’m yet to understand how I can take Dusty’s great repo, break it down to the core components, and mix Ollama into it so it uses the GPU.

Hi @MrDelusionalAi, assuming that you want to use the llama.cpp backend for ollama, you could use my llama.cpp container as your base image and build ollama on top of that (while applying any patches needed/etc.).

Alternatively, if you don’t want to use containers, you could just extract my llama.cpp wheels (they reside inside that container under /opt) and install them outside the container - they were built with CUDA enabled. The reasons I use Docker for this stuff are the complex dependencies, applying patches, and keeping my build environment sane so it’s reproducible - but if you just have one project in particular that you wish to support, it may not be strictly necessary.


Amazing, thanks Dusty! Thanks for the information. I totally agree Docker is fantastic; I will try to use the llama.cpp container and run Ollama / Ollama Webui on top of it.

Looking more into it, I’m running JetPack 6, which does produce output when I run nvidia-smi. Diving into the Ollama Dockerfile, there is a bit I identified that I could alter to make it work / find the Jetson GPU?

FROM --platform=linux/arm64 nvidia/cuda:$CUDA_VERSION-devel-rockylinux8 AS cuda-build-arm64
ARG CMAKE_VERSION
COPY ./scripts/rh_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
ENV PATH /opt/rh/gcc-toolset-10/root/usr/bin:$PATH
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
ARG CGO_CFLAGS
RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh

For example

ARG CUDA_VERSION=12.2
ARG CMAKE_VERSION=3.22.1
ARG GOLANG_VERSION=1.21.3

# Copy the minimal context we need to run the generate scripts

FROM scratch AS llm-code
COPY .git .git
COPY .gitmodules .gitmodules
COPY llm llm

# Use the updated L4T base image for ARM64 architecture compatible with Jetson

FROM nvcr.io/nvidia/l4t-base:r36.2.0 AS cuda-build-arm64
ARG CMAKE_VERSION
COPY ./scripts/arm_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} sh /arm_linux_deps.sh
ENV PATH /usr/local/cuda/bin:$PATH
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
ARG CGO_CFLAGS
RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh

FROM nvcr.io/nvidia/l4t-base:r36.2.0 AS cpu-build-arm64
ARG CMAKE_VERSION
ARG GOLANG_VERSION
COPY ./scripts/arm_linux_deps.sh /
RUN CMAKE_VERSION=${CMAKE_VERSION} GOLANG_VERSION=${GOLANG_VERSION} sh /arm_linux_deps.sh
ENV PATH /usr/local/cuda/bin:$PATH
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
ARG OLLAMA_CUSTOM_CPU_DEFS
ARG CGO_CFLAGS
RUN OLLAMA_CPU_TARGET="cpu" sh gen_linux.sh

# Intermediate stage used for ./scripts/build_linux.sh for ARM64

FROM cpu-build-arm64 AS build-arm64
ENV CGO_ENABLED 1
ARG GOLANG_VERSION
WORKDIR /go/src/github.com/jmorganca/ollama
COPY . .
COPY --from=cuda-build-arm64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
ARG GOFLAGS
ARG CGO_CFLAGS
RUN go build .

# Runtime stage for ARM64

FROM nvcr.io/nvidia/l4t-base:r36.2.0 as runtime-arm64
RUN apt-get update && apt-get install -y ca-certificates
COPY --from=build-arm64 /go/src/github.com/jmorganca/ollama/ollama /bin/ollama
EXPOSE 11434
ENV OLLAMA_HOST 0.0.0.0
ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64

# nvidia-smi historically was not available on Jetson, so NVIDIA_DRIVER_CAPABILITIES might need adjustment or removal - but with JetPack 6, nvidia-smi is available

ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]


I was finally able to get the GPU to kick in on JetPack 6.0 DP on Orin AGX by using waTeim’s fix (gpu_info_cuda.[ch]) (ollama/gpu at main · waTeim/ollama · GitHub) and by kludging the GetGPUInfo function, nulling out the spurious error returned by the call to C.cuda_check_vram:

    slog.Info("MemInfo.Err\t: %s", C.GoString(memInfo.err))
                // nuke this so we can pass the test below - TODO fix this
                memInfo.err = nil;
                if memInfo.err != nil {
                        slog.Info(fmt.Sprintf("error looking up CUDA GPU memory: %s", 
                              C.GoString(memInfo.err)))
                        C.free(unsafe.Pointer(memInfo.err))
                }

In the cuda_check_vram I hard-wired the values:

  resp->total = 99999;
  resp->free = 99999;
  resp->count = 1;
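
Rather than hard-wiring those numbers, one option might be to ask the CUDA runtime for the real figures, which should also work with Jetson’s unified memory. This is just a sketch of that idea (not part of waTeim’s fix), assuming libcudart is available to link against:

package main

/*
#cgo CFLAGS: -I/usr/local/cuda/include
#cgo LDFLAGS: -L/usr/local/cuda/lib64 -lcudart
#include <cuda_runtime.h>
*/
import "C"
import "fmt"

func main() {
        var free, total C.size_t
        // cudaMemGetInfo reports free/total device-visible memory in bytes;
        // on Jetson this reflects the shared system memory.
        if ret := C.cudaMemGetInfo(&free, &total); ret != C.cudaSuccess {
                fmt.Println("cudaMemGetInfo failed:", C.GoString(C.cudaGetErrorString(ret)))
                return
        }
        fmt.Printf("free: %d MiB, total: %d MiB\n",
                uint64(free)/(1024*1024), uint64(total)/(1024*1024))
}

That would at least keep the scheduler’s idea of available VRAM honest instead of pretending there is always room.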

In gpu/gpu.go, I had to modify the list of files to find the libnvidia-ml.so.1 in the aarch64 target directory:

var CudaLinuxGlobs = []string{
        "/usr/local/cuda/lib64/libnvidia-ml.so*",
        "/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so*",
        "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so*",
        "/usr/lib/wsl/lib/libnvidia-ml.so*",
        "/usr/lib/wsl/drivers/*/libnvidia-ml.so*",
        "/opt/cuda/lib64/libnvidia-ml.so*",
        "/usr/lib*/libnvidia-ml.so*",
        "/usr/local/lib*/libnvidia-ml.so*",
        "/usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so*",
        "/usr/lib/aarch64-linux-gnu/libnvidia-ml.so*",
        // Orin AGX on Jetpack 6.0 DP
        "/usr/lib/aarch64-linux-gnu/nvidia/libnvidia-ml.so*",

This is just a short-term hack - waTeim’s fix hasn’t been merged into ollama main yet, and I need to come up with a proper fix.

Hi, did you manage to run ollama on the GPU successfully?
And how do I apply these modifications (or build them)?
Thank you!
