My second new topic for the day!
I had a working deployment of Clara on an Ubuntu 18.04 machine with an older Quadro card. Today the GPU was upgraded to a newer Quadro P6000. When the system came back up with the new card installed, the Kubernetes cluster was not working properly and I was forced to do a full Clara uninstall.
After an apparently successful re-installation, I got as far as step 6b of the deployment process, i.e.
Start the Clara Deploy SDK:
clara platform start
However, after executing the above, I get this:
tho78s@rviz5-cl:~$ sudo clara platform start
rpc error: code = Unknown desc = Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: connect: no route to host
I have since done a full uninstall, reboot, and reinstall, and I get the same error each time.
I now have two broken deployments of Clara on two machines :(
Thanks for your interest in Clara Deploy, and sorry for your troubles. A quick clarification on your setup: is the IP address of the machine running the Kubernetes master node dynamically assigned, or is it static?
Hi aquraini, thanks for getting back to me so quickly.
Hmm, I didn’t think of that. I’ll need to get in touch with our network people to find out, and I’ll get back to you ASAP.
Kubernetes may leave some residual iptables rules that can cause routing issues on a reinstall. Check the state of the currently running system pods with:
kubectl get pods -n kube-system
You may see the coredns pods in CrashLoopBackOff. If you didn’t explicitly flush iptables during the reinstall, try the following:
sudo systemctl stop kubelet
sudo systemctl stop docker
sudo iptables --flush
sudo iptables -t nat --flush
sudo systemctl start kubelet
sudo systemctl start docker
Your Kubernetes cluster should come back up with all system pods running. Once you see these pods up and running, try rerunning clara platform start.
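If it helps to confirm the pods are healthy before retrying, here is a minimal sketch of a filter for the kubectl output. The function name not_ready_pods is made up for illustration, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE) with the header row suppressed by --no-headers.

```shell
# Hypothetical helper (not part of Clara or kubectl): reads the output of
#   kubectl get pods -n kube-system --no-headers
# on stdin and prints the name and status of every pod whose STATUS column
# is neither Running nor Completed.
not_ready_pods() {
    # Assumed columns: NAME READY STATUS RESTARTS AGE
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Example usage (needs kubectl access to the cluster):
#   kubectl get pods -n kube-system --no-headers | not_ready_pods
```

Empty output means no pod is reporting a failing status such as CrashLoopBackOff. This only looks at the STATUS column, so a pod that shows Running but 0/1 in the READY column would not be listed; check the full kubectl output if in doubt.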
That did the trick, I’m back in business.
Your quick response was much appreciated.