We built a customized Jetson SoM cluster. However, while building a Kubernetes demo on it, we ran into a problem: we can find the GPU on both the master and the worker nodes when using Docker directly, but in K8s pods only the GPU on the master node is visible. Do you know how to fix this?
I think connecting 4 Jetson Nanos to a router and building a K8s cluster on them will reproduce the problem. All of the hardware works fine separately; we suspect the problem comes from NVIDIA's device plugin for Kubernetes.
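If the device plugin is the culprit, the affected workers will not advertise an nvidia.com/gpu resource to the scheduler, so pods requesting a GPU can only land on the master. A quick way to check (a rough sketch; the resource name and the device-plugin pod naming are assumptions about your setup):

```shell
# Show which nodes actually advertise a GPU to Kubernetes.
# Nodes where the device plugin is not running (or failed to register)
# will show no nvidia.com/gpu entry under Allocatable.
kubectl describe nodes | grep -E 'Name:|nvidia.com/gpu'

# Confirm the device-plugin DaemonSet has a pod on every node
# (namespace and name pattern are assumptions; adjust to your deployment).
kubectl get pods --all-namespaces -o wide | grep -i device-plugin
```

If a worker is missing from the second listing, check that node's kubelet and container runtime logs before blaming the plugin itself.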
jet@jetson:~$ sudo kubectl get node
NAME           STATUS   ROLES    AGE    VERSION
jetson         Ready    master   4d5h   v1.18.8+k3s1
jetson-qqq     Ready    worker   4d5h   v1.18.8+k3s1
peterjetson1   Ready    worker   4d5h   v1.18.8+k3s1
qqq-jetson     Ready    worker   4d5h   v1.18.8+k3s1
jet@jetson:~$ sudo kubectl logs devicequery
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.2 / 10.0
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3956 MBytes (4148391936 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
And I only got 1 CUDA device.
Can you share which plugin you used? I'll check whether your plugin works.
When I execute the command kubectl run -i -t nvidia --image=jitteam/devicequery on the master node to deploy this image, we can see it run successfully on the NX (the node name is xavier; it reports 384 CUDA cores, so it is running on the NX).
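A plain kubectl run lets the scheduler pick any node, so repeated runs may keep landing on the same machine. To test each worker's GPU deliberately, you can pin the deviceQuery pod with nodeName (a minimal sketch; the node name here is taken from the earlier kubectl get node listing and the pod name is just an example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: devicequery-jetson-qqq
spec:
  nodeName: jetson-qqq          # bypass the scheduler and run on this worker
  restartPolicy: Never
  containers:
  - name: devicequery
    image: jitteam/devicequery
```

If deviceQuery fails on a pinned worker while plain Docker succeeds on the same board, that points at the node's container runtime configuration (for example, the NVIDIA runtime not being the default for containerd/k3s there) rather than at the image or the hardware.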