Unable to start Clara Platform

I am trying to deploy Clara Deploy SDK v0.5.0-9881 by following this documentation:
https://docs.nvidia.com/clara/deploy/ClaraInstallation.html

Architecture:

  • Linux Ubuntu 18.04
  • 2 GPU Tesla V100 PCI 32GB

I have installed the prerequisites:

  • minikube version: v1.9.2
  • kubectl :
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • helm (it does not work with v3)
Client: &version.Version{SemVer:"v2.16.4", GitCommit:"5e135cc465d4231d9bfe2c5a43fd2978ef527e83", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.4", GitCommit:"5e135cc465d4231d9bfe2c5a43fd2978ef527e83", GitTreeState:"clean"}
  • Docker
Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:25:46 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:19 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 nvidia:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

I have installed the Clara CLI:

  • clara version 0.5.0-9881

Then I ran the configuration:
clara config --key NGC_API_KEY --orgteam nvidia/Clara

using my API key. I used nvidia/Clara as the org and team because I can’t see any organization or team information in my NGC account.

clara pull platform ==> OK
clara platform start ==> ERROR

rpc error: code = Unknown desc = validation failed: [configmaps "clara-workflow-controller-configmap" not found, configmaps "clara-resultsservice-configmap" not found, persistentvolumes "clara-platformapiserver-common-volume" not found, persistentvolumes "clara-platformapiserver-model-volume" not found, persistentvolumes "clara-platformapiserver-payload-volume" not found, persistentvolumes "clara-platformapiserver-service-volume" not found, persistentvolumes "pv-results-service-volume" not found, persistentvolumeclaims "clara-platformapiserver-common-volume-claim" not found, persistentvolumeclaims "clara-platformapiserver-model-volume-claim" not found, persistentvolumeclaims "clara-platformapiserver-payload-volume-claim" not found, persistentvolumeclaims "clara-platformapiserver-service-volume-claim" not found, persistentvolumeclaims "pv-results-service-volume-claim" not found, serviceaccounts "argo-ui" not found, serviceaccounts "argo" not found, serviceaccounts "platformapiserver-service-account" not found, clusterroles.rbac.authorization.k8s.io "clara-ui-cluster-role" not found, clusterroles.rbac.authorization.k8s.io "clara-workflow-controller-cluster-role" not found, clusterroles.rbac.authorization.k8s.io "clara-platformapiserver-cluster-role" not found, clusterrolebindings.rbac.authorization.k8s.io "clara-ui-crb" not found, clusterrolebindings.rbac.authorization.k8s.io "clara-workflow-controller-binding" not found, clusterrolebindings.rbac.authorization.k8s.io "clara-platformapiserver-binding" not found, services "clara-ui" not found, services "clara" not found, services "clara-resultsservice" not found, unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1", deployments.apps "clara-resultsservice" not found]

Hello Julien,

Thanks for your interest in Clara Deploy, and sorry for the trouble! Thanks also for providing the details; that certainly helps. It looks like you have the prerequisite hardware; however, the Kubernetes and Helm versions need to match the versions explicitly listed in the System Requirements section of the page you linked. In addition, for the org/team, please use a lowercase “c”, i.e. nvidia/clara. Hope this helps you make progress with the installation.
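
For anyone reading the error above: most entries are plain "not found" messages, but one is different: no matches for kind "Deployment" in version "extensions/v1beta1". Kubernetes 1.16 and later no longer serve Deployment under extensions/v1beta1 (it lives under apps/v1), which is consistent with a v1.18.0 server being newer than what this chart expects. A minimal Python sketch for pulling that entry out of a long error message follows. It is not an official NVIDIA tool; the function names are made up for illustration, and the sample message is shortened from the one in this thread.

```python
import re
from typing import List, Optional, Tuple


def split_validation_error(message: str) -> List[str]:
    """Return the entries inside the bracketed list of a Helm validation error."""
    start = message.find("[")
    end = message.rfind("]")
    if start == -1 or end == -1:
        return []
    return [entry.strip() for entry in message[start + 1:end].split(", ")]


def api_mismatches(entries: List[str]) -> List[Tuple[str, str]]:
    """Pick out (kind, apiVersion) pairs that the API server does not serve."""
    pattern = re.compile(r'no matches for kind "([^"]+)" in version "([^"]+)"')
    found = []
    for entry in entries:
        match = pattern.search(entry)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def server_minor(kubectl_version_output: str) -> Optional[int]:
    """Extract the minor version from the 'Server Version' line of `kubectl version`."""
    match = re.search(r'Server Version:.*?Minor:"(\d+)', kubectl_version_output)
    return int(match.group(1)) if match else None


# Shortened copy of the error from this thread (illustrative input only).
error = (
    'rpc error: code = Unknown desc = validation failed: '
    '[services "clara" not found, '
    'unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1", '
    'deployments.apps "clara-resultsservice" not found]'
)
server_line = 'Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0"}'

mismatches = api_mismatches(split_validation_error(error))
minor = server_minor(server_line)
if mismatches and minor is not None and minor >= 16:
    # extensions/v1beta1 Deployments are not served from Kubernetes 1.16 onward.
    print("Server 1.%d does not serve: %s" % (minor, mismatches))
```

If this check fires, the remaining "not found" entries are likely a side effect of the same failed install, so fixing the Kubernetes version is the first thing to try.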

Thank you for your quick answer.
Unfortunately, I already used nvidia/clara in lower case.
The capital “C” was a mistake in my first post.

I had the same issue, but after panicking and running clara platform start many times, it eventually started successfully.

On different runs, though, it failed on different CRDs.

I also had the same problem some time ago. Thanks for the information.

Hi,
When I first started Clara Platform with the command “clara platform start” on EC2 p3 8x instances on AWS, it hung and did not respond. After 30 minutes I closed the console, reopened it, and ran the same command again, as shown below and in the attached image, but it failed to start with various errors.

ubuntu@ip-XXX-XX-XX-XXX:~$ clara platform start
rpc error: code = Unknown desc = a release named clara already exists.
Run: helm ls --all clara; to check the status of the release
Or run: helm del --purge clara; to delete it

ubuntu@ip-XXX-XX-XX-XXX:~$ helm ls --all clara
NAME   REVISION  UPDATED                   STATUS           CHART               APP VERSION  NAMESPACE
clara  1         Fri May 22 18:46:30 2020  PENDING_INSTALL  clara-0.5.0-2004.7  1.0          default

ubuntu@ip-XXX-XX-XX-XXX:~$ clara platform start
rpc error: code = Unknown desc = unable to get CRD: customresourcedefinitions.apiextensions.k8s.io "inferenceservers.clara.nvidia.com" not found

I kindly request you to look into it and provide me with the resolution as soon as possible.

KINDLY PLEASE HELP US AS SOON AS POSSIBLE!
I’M LOOKING FORWARD TO YOUR RESPONSE!

Also see this related thread: Clara Deploy SDK does not work on EC2 p3 8x Instance

Hello - could you try uninstalling and reinstalling? The error suggests that something wasn’t installed cleanly. Follow these steps below to uninstall:
https://docs.nvidia.com/clara/deploy/ClaraUninstallation.html#steps-to-uninstall-clara

I believe you mentioned that you are installing on AWS. Before reinstalling Clara Deploy, make sure the ports listed here have been opened:
https://docs.nvidia.com/clara/deploy/ClaraInstallation.html#aws-virtual-machine-configuration
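
To tell whether a leftover release is what blocks the start, the STATUS column of helm ls --all clara is the useful part. Below is a rough Python sketch that reads that output and suggests the next commands, using only the commands already quoted in this thread. The function names are hypothetical, the sample text is abbreviated to the first four columns, and treating every non-DEPLOYED status as "purge and retry" is an assumption; the full uninstall steps linked above may still be needed.

```python
import re
from typing import List, Optional

# Release status names printed by Helm v2 in the STATUS column.
HELM2_STATUSES = (
    "PENDING_INSTALL", "PENDING_UPGRADE", "PENDING_ROLLBACK",
    "DEPLOYED", "DELETING", "DELETED", "SUPERSEDED", "FAILED", "UNKNOWN",
)


def release_status(helm_ls_output: str, release: str) -> Optional[str]:
    """Return the status of `release` from `helm ls --all` text, or None if absent."""
    row = re.compile(rf"^{re.escape(release)}\s")
    for line in helm_ls_output.splitlines():
        if row.match(line):
            for status in HELM2_STATUSES:
                if re.search(rf"\b{status}\b", line):
                    return status
    return None


def next_commands(status: Optional[str], release: str) -> List[str]:
    """Suggest commands to run next (assumption: anything not DEPLOYED gets purged)."""
    if status == "DEPLOYED":
        return []  # release is already up, nothing to do
    if status is None:
        return ["clara platform start"]  # no leftover release found
    return [f"helm del --purge {release}", "clara platform start"]


# Abbreviated sample based on the output posted in this thread.
sample = (
    "NAME   REVISION  UPDATED                   STATUS\n"
    "clara  1         Fri May 22 18:46:30 2020  PENDING_INSTALL\n"
)
for command in next_commands(release_status(sample, "clara"), "clara"):
    print(command)
```

A release stuck in PENDING_INSTALL is what produces the "a release named clara already exists" message on the next start, so removing it first avoids chasing that error instead of the real one.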

Hello, thanks for your response! I’ve already resolved this myself, but I’m now stuck at the stage described in this thread: Clara Deploy SDK stuck at "Wait until TRTIS is Ready"