Kube proxy in iptables mode fails

We are deploying Kubernetes clusters on Clara AGX (JetPack 5.1) as part of our edge-cloud architecture federated with Karmada. On the worker nodes, kube-proxy in iptables mode fails: it only writes part of the iptables rules and cannot implement all of the rules Kubernetes needs for pod-to-pod communication. As a result, when we try to install the Karmada agent, it cannot reach the API server of the local cluster, so the local cluster cannot join the remote Karmada control plane. Installing the iptables/xtables packages did not help; it appears the kernel does not provide the required netfilter extensions.

Someone familiar with Linux kernel configuration on Clara (L4T kernels), especially the netfilter/xtables modules (xt_nfacct, nf_conntrack, nf_nat, etc.), might be able to help us.
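
For reference, a quick way to check what the running L4T kernel actually provides is shown below (a rough sketch; the module names listed are the usual kube-proxy dependencies and may not be exhaustive):

# Look for the conntrack/NAT/xtables options in the kernel config
zcat /proc/config.gz 2>/dev/null | grep -E 'NF_CONNTRACK|NF_NAT=|NETFILTER_XT_'
# If /proc/config.gz is not enabled, check the installed config instead
grep -E 'NF_CONNTRACK|NF_NAT=|NETFILTER_XT_' /boot/config-$(uname -r) 2>/dev/null
# Try to load the extensions kube-proxy typically uses; a failure here
# matches the partial-rules symptom
for m in nf_conntrack nf_nat xt_comment xt_conntrack xt_MASQUERADE xt_statistic xt_nfacct; do
  sudo modprobe "$m" || echo "missing: $m"
done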

Cluster setup:
ubuntu@clara-old:~$ uname -r
5.10.65-tegra

ubuntu@clara-old:~$ kubectl version
Client Version: v1.31.13
Kustomize Version: v5.4.2
The connection to the server localhost:8080 was refused - did you specify the right host or port?

ubuntu@clara-old:~$ kubelet --version
Kubernetes v1.31.13

ubuntu@clara-old:~$ containerd --version
containerd containerd.io 1.7.27 05044ec0a9a75232cad458027ca83437aae3f4da

ubuntu@clara-old:~$ ctr version
Client:
Version: 1.7.27
Revision: 05044ec0a9a75232cad458027ca83437aae3f4da
Go version: go1.23.7

ctr: failed to dial "/run/containerd/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: permission denied"

kube-proxy has only managed to write the iptables rules partially.
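
To show what "partially" means, this is roughly how the state on the node can be inspected (a sketch; the KUBE-* chain names are the standard ones kube-proxy programs, and the container ID is a placeholder):

# Count how many KUBE-* rules kube-proxy managed to write
sudo iptables-save | grep -c 'KUBE-'
# The Services chain in the nat table is incomplete
sudo iptables -t nat -S KUBE-SERVICES | head
# kube-proxy runs as a pod on the node; its log shows the failed
# iptables-restore calls (use the ID reported by crictl ps)
sudo crictl ps --name kube-proxy
sudo crictl logs <kube-proxy-container-id> 2>&1 | tail -n 50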

We would appreciate your support. Thanks!

Hi,

Do you need iptables-nft to enable the proxy for your use case?

Please note that iptables-nft is only supported in the rel-38 branch.
We have tried to enable it on other branches (e.g., rel-36) with some extra kernel configuration, but without luck.
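
For reference, the kernel options generally involved for the nft-based iptables backend look roughly like the following (an illustrative sketch, not the exact configuration we tried):

# Typical options for iptables-nft plus the conntrack/NAT pieces kube-proxy uses
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_COMPAT=m        # lets nft evaluate legacy xt_* matches and targets
CONFIG_NFT_COUNTER=m
CONFIG_NFT_NAT=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_NAT=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
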
Please find more details in the topic below:

A workaround to enable Kubernetes without iptables-nft is to use ServiceLB.
Please check the topic below for more details on the workaround (WAR):

Thanks.


ubuntu@clara-old:~$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
[sudo] password for ubuntu:
INFO[0000] Auto-detected mode as "nvml"
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0
INFO[0000] Selecting /dev/dri/card0 as /dev/dri/card0
WARN[0000] Could not locate /dev/dri/controlD64: pattern /dev/dri/controlD64 not found
INFO[0000] Selecting /dev/dri/renderD128 as /dev/dri/renderD128
WARN[0000] Failed to evaluate symlink /dev/dri/by-path/pci-0000:09:00.0-card; ignoring
WARN[0000] Failed to evaluate symlink /dev/dri/by-path/pci-0000:09:00.0-render; ignoring
INFO[0000] Using driver version 570.133.07
INFO[0000] Selecting /dev/nvidia-modeset as /dev/nvidia-modeset
INFO[0000] Selecting /dev/nvidia-uvm-tools as /dev/nvidia-uvm-tools
INFO[0000] Selecting /dev/nvidia-uvm as /dev/nvidia-uvm
INFO[0000] Selecting /dev/nvidiactl as /dev/nvidiactl
WARN[0000] Could not locate libnvidia-egl-gbm.so: 64-bit library libnvidia-egl-gbm.so not found
INFO[0000] Selecting /usr/share/glvnd/egl_vendor.d/10_nvidia.json as /usr/share/glvnd/egl_vendor.d/10_nvidia.json
INFO[0000] Selecting /usr/share/vulkan/icd.d/nvidia_icd.json as /usr/share/vulkan/icd.d/nvidia_icd.json
INFO[0000] Selecting /usr/share/vulkan/implicit_layer.d/nvidia_layers.json as /usr/share/vulkan/implicit_layer.d/nvidia_layers.json
INFO[0000] Selecting /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json as /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
INFO[0000] Selecting /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json as /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libEGL_nvidia.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libEGL_nvidia.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libGLESv1_CM_nvidia.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libGLESv1_CM_nvidia.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libGLESv2_nvidia.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libGLESv2_nvidia.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libcuda.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libcuda.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libcudadebugger.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libcudadebugger.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvcuvid.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvcuvid.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-allocator.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-allocator.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-cfg.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-cfg.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-eglcore.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-eglcore.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-encode.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-encode.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-fbc.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-fbc.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-glcore.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-glcore.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-glsi.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-glsi.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-glvkspirv.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-glvkspirv.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-gpucomp.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-gpucomp.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-ml.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-ml.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-ngx.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-ngx.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-nvvm.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-nvvm.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-opencl.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-opencl.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-opticalflow.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-opticalflow.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-ptxjitcompiler.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-ptxjitcompiler.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-rtcore.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-rtcore.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvidia-tls.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvidia-tls.so.570.133.07
INFO[0000] Selecting /usr/lib/aarch64-linux-gnu/libnvoptix.so.570.133.07 as /usr/lib/aarch64-linux-gnu/libnvoptix.so.570.133.07
WARN[0000] Could not locate /nvidia-persistenced/socket: pattern /nvidia-persistenced/socket not found
WARN[0000] Could not locate /nvidia-fabricmanager/socket: pattern /nvidia-fabricmanager/socket not found
WARN[0000] Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found
INFO[0000] Selecting /lib/firmware/nvidia/570.133.07/gsp_ga10x.bin as /lib/firmware/nvidia/570.133.07/gsp_ga10x.bin
INFO[0000] Selecting /lib/firmware/nvidia/570.133.07/gsp_tu10x.bin as /lib/firmware/nvidia/570.133.07/gsp_tu10x.bin
INFO[0000] Selecting /usr/bin/nvidia-smi as /usr/bin/nvidia-smi
INFO[0000] Selecting /usr/bin/nvidia-debugdump as /usr/bin/nvidia-debugdump
INFO[0000] Selecting /usr/bin/nvidia-persistenced as /usr/bin/nvidia-persistenced
INFO[0000] Selecting /usr/bin/nvidia-cuda-mps-control as /usr/bin/nvidia-cuda-mps-control
INFO[0000] Selecting /usr/bin/nvidia-cuda-mps-server as /usr/bin/nvidia-cuda-mps-server
INFO[0000] Generated CDI spec with version 0.5.0
ubuntu@clara-old:~$ sudo nvidia-ctk runtime configure --runtime=docker --cdi.enabled=true
Incorrect Usage: flag provided but not defined: -cdi.enabled

NAME:
NVIDIA Container Toolkit CLI runtime configure - Add a runtime to the specified container engine

USAGE:
NVIDIA Container Toolkit CLI runtime configure [command options] [arguments…]

OPTIONS:
--dry-run    update the runtime configuration as required but don't write changes to disk (default: false)
--runtime value    the target runtime engine. One of [crio, docker] (default: "docker")
--config value    path to the config file for the target runtime
--nvidia-runtime-name value    specify the name of the NVIDIA runtime that will be added (default: "nvidia")
--runtime-path value    specify the path to the NVIDIA runtime executable (default: "nvidia-container-runtime")
--set-as-default    set the specified runtime as the default runtime (default: false)
--help, -h    show help (default: false)

ERRO[0000] flag provided but not defined: -cdi.enabled
ubuntu@clara-old:~$ sudo systemctl restart docker^C
ubuntu@clara-old:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ubuntu@clara-old:~$ nvidia-ctk cdi list INFO[0000] Found 4 CDI devices nvidia.com/gpu=0 nvidia.com/gpu=all nvidia.com/pva=0 nvidia.com/pva=all
No help topic for 'list'
ubuntu@clara-old:~$ nvidia-ctk cdi list
No help topic for 'list'
ubuntu@clara-old:~$ nvidia-ctk cdi
NAME:
NVIDIA Container Toolkit CLI cdi - Provide tools for interacting with Container Device Interface specifications

USAGE:
NVIDIA Container Toolkit CLI cdi command [command options] [arguments…]

COMMANDS:
generate Generate CDI specifications for use with CDI-enabled runtimes
transform Apply a transform to a CDI specification
help, h Shows a list of commands or help for one command

OPTIONS:
--help, -h    show help (default: false)
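
Since the installed nvidia-ctk does not know the --cdi.enabled flag, CDI can also be enabled directly in Docker's daemon configuration on Docker Engine 25 and newer (a sketch; assumes Docker 25+, and the existing runtimes section may differ):

# /etc/docker/daemon.json
{
  "features": {
    "cdi": true
  },
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
# then: sudo systemctl restart docker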

Hi,

We tested the above command in a JetPack 6.2.1 environment, and it works as expected.

$ nvidia-ctk cdi list
INFO[0000] Found 4 CDI devices                          
nvidia.com/gpu=0
nvidia.com/gpu=all
nvidia.com/pva=0
nvidia.com/pva=all

In our environment, the CDI spec version is 0.8.0.
Could you try upgrading your environment and trying again?
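
A quick way to compare versions on the device (a sketch; package names assume the toolkit was installed from the apt repository):

nvidia-ctk --version                   # Container Toolkit CLI version
grep cdiVersion /etc/cdi/nvidia.yaml   # CDI spec version written by 'cdi generate'
# Upgrade the toolkit packages from the configured repository
sudo apt-get update
sudo apt-get install --only-upgrade nvidia-container-toolkit nvidia-container-toolkit-base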

Thanks.