Installing Morpheus prerequisites on AWS

Hi Guys,

I’ve recently joined the Morpheus EA. My first task is to install it on an AWS EC2 instance, but I’ve run into some issues. Can you advise?

I’m following the instructions in “Morpheus_Developer_Kit_on_AWS_0.1-062121-2.pdf”.

There, we’re instructed to install the EGX Stack for AWS by following these instructions:

I followed that guide up to the section “Validate the state of the GPU Operator”, skipping only “Adding additional node to EGX Stack”.

According to the installation guide, the expected output for the terminal command:
kubectl get pods --all-namespaces | grep -v kube-system

is

NAMESPACE               NAME                                                              READY   STATUS      RESTARTS   AGE
default                 gpu-operator-1590097431-node-feature-discovery-master-76578jwwt   1/1     Running     0          5m2s
default                 gpu-operator-1590097431-node-feature-discovery-worker-pv5nf       1/1     Running     0          5m2s
default                 gpu-operator-74c97448d9-n75g8                                     1/1     Running     1          5m2s
gpu-operator-resources  nvidia-container-toolkit-daemonset-pwhfr                          1/1     Running     0          4m58s
gpu-operator-resources  nvidia-dcgm-exporter-bdzrz                                        1/1     Running     0          4m57s
gpu-operator-resources  nvidia-device-plugin-daemonset-zmjhn                              1/1     Running     0          4m57s
gpu-operator-resources  nvidia-device-plugin-validation                                   0/1     Completed   0          4m57s
gpu-operator-resources  nvidia-driver-daemonset-7b66v                                     1/1     Running     0          4m57s

… But I get a different output:

NAMESPACE               NAME                                                              READY   STATUS      RESTARTS   AGE
default                 gpu-operator-1637173708-node-feature-discovery-master-78bdv66dz   1/1     Running     0          2m53s
default                 gpu-operator-1637173708-node-feature-discovery-worker-mjq2g       1/1     Running     0          2m53s
default                 gpu-operator-76fb8d5c55-g62x9                                     1/1     Running     0          2m53s
gpu-operator-resources  nvidia-container-toolkit-daemonset-d854w                          0/1     Init:0/1    0          2m20s
gpu-operator-resources  nvidia-driver-daemonset-qrmp7

i.e. the “nvidia-container-toolkit-daemonset” pod never gets past the init stage (it stays at Init:0/1), and I don’t see the “nvidia-dcgm-exporter” pod or either of the two “nvidia-device-plugin” pods.

None of the validations listed in the install guide work past this point.
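
In case it helps anyone comparing outputs, here is a rough sketch of how the pods that are not healthy could be picked out of that listing automatically. The helper name not_ready_pods is hypothetical (it is not part of kubectl), and it assumes the default column order shown above, with STATUS in the fourth column. A row with missing columns is reported as "unknown".

```shell
# Print every pod whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods --all-namespaces` style text on stdin.
# not_ready_pods is a hypothetical helper name, not a kubectl feature.
not_ready_pods() {
  awk 'NR > 1 && NF > 0 && $4 != "Running" && $4 != "Completed" {
    # A row cut short (no STATUS column) is reported as "unknown".
    status = ($4 == "") ? "unknown" : $4
    print $1 "/" $2 ": " status
  }'
}

# Sample input: the header plus three rows copied from my output above.
not_ready_pods <<'EOF'
NAMESPACE NAME READY STATUS RESTARTS AGE
default gpu-operator-76fb8d5c55-g62x9 1/1 Running 0 2m53s
gpu-operator-resources nvidia-container-toolkit-daemonset-d854w 0/1 Init:0/1 0 2m20s
gpu-operator-resources nvidia-driver-daemonset-qrmp7
EOF
```

On a live cluster the same function can be fed directly with kubectl get pods --all-namespaces | not_ready_pods. For any pod it flags, kubectl describe pod and kubectl logs with -n gpu-operator-resources are the usual next step for seeing why an init container is waiting, although I have not confirmed what they show in this case.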

Can you advise?

Thanks

Is this a GPU-enabled instance? Which one?

I was using a g4dn.4xlarge.

BTW, I tried again on a fresh, similar instance, this time with a newer version of the install guide (AWS_Ubuntu_Server_v4.1), and there I got past that validation point. I think one difference is a containerd step in the EGX Stack installation. But the Morpheus manual points to 3.1, so I’m not sure whether the newer stack is compatible with it.