Hi all, I picked up a Jetson Orin Nano 8GB Dev kit about 6 months ago and it’s been sitting neglected in the box until this week when I finally decided to integrate it into my Raspberry Pi Cluster.
It wasn’t as straightforward as I thought it would be, and from my research I see there are quite a few other people who have struggled with kubernetes (or k3s, minikube etc) on the Orin Nanos. I’ve been successful in my approach, so I thought I’d share my process.
First here’s a bit of info about the cluster so you can see what I’m working with:
Hardware:
- Control Plane: Raspberry Pi 5 16GB
- Worker Nodes: 4 x Raspberry Pi 4 8GB
- NFS Node: 1 x Raspberry Pi 5 8GB with NVME hat (running 2 x 1TB SSDs)
- Router: Mikrotik Hex S
- Switch: Digitus Gigabit PoE Switch
- Jetson: Orin Nano 8GB Dev Kit - 64GB microSD, no NVME yet :(
Software:
- k8s Version: Kubernetes v1.34.1
- Gitops: ArgoCD
- CNI: Calico
- Ingress: nginx
- Load balancer: metallb
- Network Proxy: kube-proxy
- Storage: Longhorn
- Certs: cert-manager + Cloudflare
- Metrics: prometheus + grafana
Essentially… ArgoCD is my GitOps controller and lets me version all my configuration in a git repo. Calico handles inter-pod networking, while kube-proxy handles service routing (TCP, UDP etc). MetalLB hands out external IPs for LoadBalancer services, cert-manager issues certs via Cloudflare, and ingress is handled by a simple nginx config. Longhorn lets me mount persistent volumes from the NFS node on any of the other nodes. Prometheus collects metrics across the cluster and Grafana is used to visualise them.
Anyway, onto the relevant stuff: getting the Orin Nano integrated. I'm afraid this is more of a report than a really in-depth guide. I won't be giving you every exact command to run, mainly because I don't have the patience to poke about in multiple bash history files. But there's lots here, so hopefully this information will still be useful.
Jetson Initial Setup
My Jetson was BNIB, but it shipped with r36 firmware so I was able to jump straight to JetPack 6.2 without doing the 5 → 6 upgrade path. Disabling swap is a requirement for Kubernetes; on the Jetson that means disabling zram swap. From there I was able to run the join command and saw the node appear in the cluster. CPU, RAM and disk pressure were being reported, and I started seeing a few daemonsets popping up on the node! When I launched Grafana, I could already see CPU temps being accurately reported in my dashboards!
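In case it helps anyone, the swap part looked roughly like this. Treat it as a sketch rather than a copy-paste recipe - `nvzramconfig` is the JetPack service that sets up the zram devices, but double-check the unit name on your own image:

    # Stop zram swap now and keep it off across reboots
    sudo systemctl disable --now nvzramconfig
    sudo swapoff -a
    swapon --show          # should print nothing before you join the node

    # On a kubeadm-based control plane, print a fresh join command
    # and run its output on the Jetson
    kubeadm token create --print-join-command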
Too easy! Or so I thought.
CNI and Calico
Then the problem appeared - my calico pods were failing. It wasn't the usual suspects (permissions, resources, bad volume mapping etc); it was an issue with the host node, not something wrong with the kube config. After much swearing and furious research it turned out to be a problem other people had run into - the Tegra kernel was missing the modules Calico needs to manage ipsets and iptables. Urgh. I could run a pod via a NodeSelector, but without Calico the node would be severely limited.
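For context, pinning a pod to the Jetson just means a nodeSelector on the hostname label, something like this (the names and image here are placeholders, not what I actually run):

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: pinned-test                       # placeholder name
    spec:
      nodeSelector:
        kubernetes.io/hostname: jetson-orin   # whatever your node is called
      containers:
        - name: app
          image: nginx                        # any image will do for a scheduling test
    EOF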
I looked at Calico alternatives. Cilium would suffer from the same issue, and while Flannel doesn't need these modules it does far less than Calico, so I'd need lots of other services to make up the shortfall. The cluster had been running wonderfully for a long time with Calico and I didn't really want to migrate away from it.
It would probably be less hassle to recompile the linux kernel, I thought. So I did! And it was!
Enabling kernel modules in Linux_for_Tegra
I am a software engineer but I’m no kernel expert, and I’m certainly no Tegra expert. So this was relatively uncharted waters for me. I have to give it up to gpt-5.2-codex for holding my hand through some of this, particularly in identifying which parts of the kernel I needed to add. So don’t get it twisted, I am not some eldritch wizard. I know some spells, but so do you, and if I can do this you can too.
So the good news is that if you're running the latest firmware you probably don't actually have to recompile the whole kernel - you can just compile the missing kernel modules and then activate them. You don't even need to restart! The process is as follows:
1. Download the source code for your firmware. If you're running the latest firmware this is most likely Jetson Linux 36.4.4. You might be running a more recent patch release (I am actually running 36.4.7), but this is the latest source release and good enough for our purposes.

2. Unzip it and you should have a `Linux_for_Tegra` folder. I would recommend compiling on the Jetson itself, that way you don't have to account for differing architectures, so copy it over to the Jetson and make sure you have the prerequisites installed. Following the developer guide, we still need the sources synced, so make sure you do that.

3. Once the sources have synced you should be able to cd down into something like `Linux_for_Tegra/source/kernel/kernel-jammy-src/` - this is where we will make some minor config changes that tell the compiler to build the modules we need.

4. You need a good config to start. If certain values in your configuration don't match the running firmware, the modules we compile will be rejected and won't mount. The best way to handle this I found was to copy the config of the running kernel, using something like:

        zcat /proc/config.gz > Linux_for_Tegra/source/kernel/kernel-jammy-src/.config

   We can then apply a few changes on top of that to minimise any mismatches.

5. With that config in place we can toggle some modules. Updating these options will enable the ip_set modules we need for Calico:

        scripts/config --enable IP_SET
        scripts/config --module IP_SET_HASH_IP
        scripts/config --module IP_SET_HASH_NETPORTNET
        scripts/config --module IP_SET_HASH_NET
        scripts/config --module NETFILTER_XT_TARGET_CT
        scripts/config --module NETFILTER_XT_MATCH_RPFILTER
        scripts/config --module IP_NF_MATCH_RPFILTER
        scripts/config --module IP6_NF_MATCH_RPFILTER
        scripts/config --module NETFILTER_NETLINK_LOG

6. In addition I had some issues getting the firmware release tag to match up. My firmware wanted the modules to be compiled with the tag `5.15.148-tegra` but mine were coming out under `5.15.148-prod`. I updated the config like so:

        scripts/config --disable LOCALVERSION_AUTO
        scripts/config --set-str LOCALVERSION "-tegra"

7. We should now be able to apply our updated config by running `make olddefconfig` - you should get a message that the config has been applied. From there we can compile the modules using `make -j"$(nproc)" modules`.

8. With the modules compiled, we can install them into a folder and then cherry-pick the bits we need into our running firmware. Running `sudo make INSTALL_MOD_PATH=/tmp/tegra-mods modules_install` will stick them in the `/tmp` directory.

9. That should generate a load of files with the directory structure `/tmp/tegra-mods/lib/modules/5.15.148-tegra/kernel/…` - you can see the release tag we set in step 6 in these file paths. Our modules need to move into `/lib/modules/5.15.148-tegra/kernel/…` where the running kernel resides. Here's a list of all the new modules you'll need:

        /lib/modules/5.15.148-tegra/kernel/net/netfilter/ipset/ip_set.ko
        /lib/modules/5.15.148-tegra/kernel/net/netfilter/ipset/ip_set_hash_ip.ko
        /lib/modules/5.15.148-tegra/kernel/net/netfilter/ipset/ip_set_hash_netportnet.ko
        /lib/modules/5.15.148-tegra/kernel/net/netfilter/ipset/ip_set_hash_net.ko
        /lib/modules/5.15.148-tegra/kernel/net/netfilter/xt_CT.ko
        /lib/modules/5.15.148-tegra/kernel/net/netfilter/xt_NFLOG.ko
        /lib/modules/5.15.148-tegra/kernel/net/netfilter/nfnetlink_log.ko
        /lib/modules/5.15.148-tegra/kernel/net/ipv4/netfilter/ipt_rpfilter.ko
        /lib/modules/5.15.148-tegra/kernel/net/ipv6/netfilter/ip6t_rpfilter.ko

   Copy them from `/tmp/tegra-mods/` into `/lib/modules/` and run `sudo depmod -A`.

10. If everything has worked, the modules will be activated and your calico pods should no longer fail. Use `modprobe` to load them and `lsmod` to confirm they're present (see the sanity-check sketch just after this list). Pods can now be scheduled on the Jetson! If it doesn't work, try swearing a lot and asking your favourite LLM for help. It's probably a version or release-tag mismatch that is giving you grief.

11. To load these changes on boot I created `/etc/modules-load.d/calico-netfilter.conf` and it just contains a list of all the modules we need. Some of these were already in the right place, just not activated IIRC:

        ip_set_hash_ip
        ip_set_hash_netportnet
        iptable_raw
        iptable_filter
        iptable_mangle
        iptable_nat
        x_tables
        xt_set
        xt_conntrack
        xt_comment
        xt_mark
        xt_CT
        xt_NFLOG
        nfnetlink_log
        ipt_rpfilter
        ip6t_rpfilter
        ip_set_hash_net
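Since most failures here come down to a release-tag mismatch, it's worth sanity-checking the compiled modules before blaming Calico. Here's a rough check, assuming the `5.15.148-tegra` tag from step 6:

    # The running kernel release and the module vermagic need to agree
    uname -r                           # expect something like 5.15.148-tegra
    modinfo ip_set | grep vermagic     # should report the same release tag
    # Load one of the new modules by hand and confirm it shows up
    sudo modprobe ip_set_hash_net
    lsmod | grep ip_set

If the vermagic line shows `-prod` (or anything else), go back to step 6, fix the LOCALVERSION and recompile.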
NVIDIA Device Plugin
With pods schedulable on the Jetson, the next step was getting the NVIDIA device plugin running. I deployed this via Argo as a DaemonSet. It also required some changes on the Jetson: increasing inotify limits and setting the default runtime to nvidia in /etc/containerd/config.toml. You also need to make sure the nvidia-container-toolkit is installed and available.
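For reference, the host-side tweaks look roughly like this. Treat the sysctl numbers and the `99-inotify.conf` filename below as placeholders to tune for your own cluster, and note that `nvidia-ctk` (part of the nvidia-container-toolkit) can rewrite `/etc/containerd/config.toml` for you if you'd rather not edit it by hand:

    # Raise inotify limits (filename and values are placeholders)
    sudo tee /etc/sysctl.d/99-inotify.conf <<'EOF'
    fs.inotify.max_user_instances = 512
    fs.inotify.max_user_watches = 524288
    EOF
    sudo sysctl --system

    # Make nvidia the default containerd runtime, then restart containerd
    sudo nvidia-ctk runtime configure --runtime=containerd --set-as-default
    sudo systemctl restart containerd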
It works!
But it’s not without issue.
Kubernetes sees the Jetson as a single GPU resource available for scheduling, so a greedy pod will sit on it indefinitely. The jetson-copilot deploy does this, so any other containers that need a GPU can't get scheduled unless I scale it back. This is a Kubernetes thing rather than a Jetson thing, and I'll be looking to use containers that can handle GPU workloads as Jobs rather than evergreen pods.
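To illustrate the Jobs-vs-evergreen-pods point: a GPU workload run as a Job claims the single `nvidia.com/gpu` resource, does its work, and hands it back rather than squatting on it. A minimal sketch (the name, image and command are placeholders, not what I actually run):

    kubectl apply -f - <<'EOF'
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gpu-smoke-test                        # placeholder name
    spec:
      backoffLimit: 0
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cuda
              image: your-registry/cuda-test:latest   # any CUDA image built for your JetPack
              command: ["./run-benchmark"]            # placeholder; any short GPU task
              resources:
                limits:
                  nvidia.com/gpu: 1                   # the whole Orin is exposed as one GPU
    EOF

Once the Job finishes, the GPU goes back into the schedulable pool, which is exactly what an always-on deployment like jetson-copilot doesn't do.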
I also desperately need to add NVME storage. The 64GB microSD isn't cutting it, and is definitely limiting the work I can do with larger models and samples. One for when the bank balance allows.
Benchmark
Here are some results from a little test container I got ChatGPT to write. We have some TFLOPS, just not that many.
I’d be very interested in running some more suitable benchmarks created by knowledgeable humans, so if anyone has any suggestions I’d love to hear them!
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:08:11_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
device=Orin
sm=8.7 driver=12060 runtime=12020
global_mem_MB=7620
axpy: repeat 1/3
axpy: repeat 2/3
axpy: repeat 3/3
axpy: N=16777216 iters=400 warmup=10 repeats=3 time_ms=957.397 bandwidth_GBps=84.11
sgemm: repeat 1/3
sgemm: repeat 2/3
sgemm: repeat 3/3
sgemm: M=4096 N=4096 K=4096 iters=30 warmup=5 repeats=3 time_ms=3385.707 TFLOPS=1.22
sgemm2: repeat 1/3
sgemm2: repeat 2/3
sgemm2: repeat 3/3
sgemm2: M=3072 N=3072 K=3072 iters=40 warmup=5 repeats=3 time_ms=1935.897 TFLOPS=1.20
tf32: repeat 1/3
tf32: repeat 2/3
tf32: repeat 3/3
tf32: M=6144 N=6144 K=6144 iters=40 warmup=5 repeats=3 time_ms=4471.029 TFLOPS=4.15
Here’s some red hot pics for your viewing pleasure:
The lack of an NVME drive is very apparent here!
Thermals over 24 hours. The Jetson is the yellow line - the first spikes are some simple CUDA tests, the second sustained increase is jetson-copilot doing some RAG training on mxbai-embed-large. I'm happy with those CPU temps under load; the GPU temp was stable at about 61°, I just haven't added it to prom/graf yet.
Thanks for reading, let me know any questions or recommendations you might have!


