Unable to view Clara Monitoring Platform metrics in Grafana

Hey guys,

I’m trying to follow the Clara documentation to view GPU, CPU, and disk metrics by creating Grafana dashboards for the liver tumor reference pipeline with monitoring (liver-tumor-monitoring.yaml), but the dashboards are all blank.

Below is the doc I’m using. Please let me know if there are any changes to it.
https://docs.nvidia.com/clara/deploy/sdk/infrastructure/metrics/public/docs/overview.html

Thanks

Hi billy19murahmi, thanks for your interest in the Clara Deploy platform.

I apologize for the difficulty with the documentation. We are refactoring the monitoring stack, and that piece of the documentation is a bit out of date. This will be addressed in a future release.

For now, you should be able to view GPU metrics in Grafana with the Clara monitoring stack by following these steps.

First, make sure you have a fresh monitoring stack:

clara monitor stop
clara pull -y monitor
clara monitor start
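Restarting the stack can take a little while, and the IP lookup later in this post only works once the monitor server pod is up. A minimal POSIX sh sketch that polls `kubectl get pods` until that pod reports `Running`, assuming the default column layout (NAME, READY, STATUS, ...) and the pod name prefix used later in this post; both function names are hypothetical:

```shell
# Hypothetical helper: print the STATUS column (3rd field) of the first pod whose
# name starts with the given prefix, reading `kubectl get pods` output on stdin.
monitor_pod_status() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $3; exit }'
}

# Hypothetical helper: poll `kubectl get pods` every 5 seconds until the pod is
# Running, giving up after the given number of tries (default 30).
wait_for_monitor() {
  local prefix tries i status
  prefix=${1:-clara-monitor-server-monitor-server}
  tries=${2:-30}
  i=0
  status=
  while [ "$i" -lt "$tries" ]; do
    status=$(kubectl get pods | monitor_pod_status "$prefix")
    [ "$status" = "Running" ] && return 0
    sleep 5
    i=$((i + 1))
  done
  echo "pod '$prefix' not Running (last status: ${status:-none})" >&2
  return 1
}
```

Usage: `wait_for_monitor && echo "monitor server is up"`.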

Then pull and unpack the liver tumor segmentation pipeline as an example, following the setup instructions on NGC:

mkdir -p ~/.clara/pipelines
cd ~/.clara/pipelines
clara pull pipeline clara_ai_livertumor_pipeline
cd clara_ai_livertumor_pipeline
unzip -d input app_livertumor-input_v1.zip
sudo unzip -d /clara/common/models app_livertumor-model_v1.zip
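Before moving on, it can be worth confirming that both unzip steps actually produced files, since a missing input or model directory is easy to overlook. A minimal POSIX sh sketch; `dir_has_files` is a hypothetical helper, and it only checks that at least one regular file exists, not that the files are valid DICOM or model data:

```shell
# Hypothetical helper: succeed only if the directory exists and contains
# at least one regular file (searched recursively).
dir_has_files() {
  [ -d "$1" ] || return 1
  [ -n "$(find "$1" -type f | head -n 1)" ]
}
```

Usage, from inside the pipeline directory: `dir_has_files input/dcm && dir_has_files /clara/common/models || echo "unzip step incomplete" >&2`.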

Before creating the pipeline, modify the liver-tumor-monitoring.yaml pipeline to use api-version 0.3.0 and to point to your monitoring server instance.

First, the API version:

sed -i 's,0.4.0,0.3.0,g' liver-tumor-monitoring.yaml
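The sed command above replaces every literal match of 0.4.0 in the file, and the unescaped `.` in its pattern matches any character. A slightly more defensive sketch escapes the dots, keeps a `.bak` backup (`-i.bak` is accepted by both GNU and BSD sed), and checks that no 0.4.0 is left behind. `downgrade_api_version` is a hypothetical helper name; like the original command, it rewrites every occurrence, not only the api-version field:

```shell
# Hypothetical helper: rewrite every literal "0.4.0" in a pipeline file to "0.3.0"
# (dots escaped so they are not wildcards), keep a .bak backup, and fail if any
# "0.4.0" remains afterwards.
downgrade_api_version() {
  local file
  file=$1
  sed -i.bak 's/0\.4\.0/0.3.0/g' "$file" || return 1
  if grep -q '0\.4\.0' "$file"; then
    echo "still found 0.4.0 in $file" >&2
    return 1
  fi
}
```

Usage: `downgrade_api_version liver-tumor-monitoring.yaml`.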

Now find the IP address of your monitoring server:

kubectl get pods -o wide | grep clara-monitor-server-monitor-server | awk '{print $6}'
10.244.0.40

This outputs the IP address to use in liver-tumor-monitoring.yaml; in my case it is 10.244.0.40. In the command below, replace it with the IP address from your own output. This step is critical to ensure metrics are stored in the Elasticsearch database.

sed -i 's,REPLACE-IP-ADDRESS,10.244.0.40,g' liver-tumor-monitoring.yaml
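The IP lookup and the substitution can be combined into one guarded sketch, so that an empty lookup (for example, when the monitor server pod is not running yet) does not silently produce a pipeline file with a blank address. Instead of relying on a fixed column number, which can shift when kubectl prints restart counts such as `1 (5m ago)`, it takes the first IPv4-looking field on the matching line. Both function names are hypothetical:

```shell
# Hypothetical helper: read `kubectl get pods -o wide` output on stdin and print
# the first IPv4-looking field on the first clara-monitor-server-monitor-server line.
monitor_server_ip() {
  awk '/clara-monitor-server-monitor-server/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) { print $i; exit }
  }'
}

# Hypothetical helper: replace the REPLACE-IP-ADDRESS placeholder with the given IP,
# refusing an empty IP and failing if the placeholder is still present afterwards.
apply_monitor_ip() {
  local file ip
  file=$1
  ip=$2
  if [ -z "$ip" ]; then
    echo "no monitor server IP found; is the monitor stack running?" >&2
    return 1
  fi
  sed -i.bak "s/REPLACE-IP-ADDRESS/$ip/g" "$file" || return 1
  ! grep -q 'REPLACE-IP-ADDRESS' "$file"
}
```

Usage: `apply_monitor_ip liver-tumor-monitoring.yaml "$(kubectl get pods -o wide | monitor_server_ip)"`.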

Now we can create the pipeline with the updated api-version and IP:

clara create pipeline -p liver-tumor-monitoring.yaml

This will output a PIPELINE_ID. Use this ID to register the pipeline with the DICOM adapter and create a DICOM source/destination (in this case on localhost 127.0.0.1):

clara dicom create aetitle -a liver-tumor-test -n liver-tumor-test -p <PIPELINE_ID>
clara dicom list aetitle
clara dicom create source -a DCM4CHEE -i 127.0.0.1
clara dicom create dest -n MYPACS -a ORTHANC -i 127.0.0.1 -p 1004
clara dicom list dest

Now trigger the pipeline by sending the DICOM input with storescu. The called AE title (-aec) must match the AE title registered above (liver-tumor-test):

storescu -v +sd +r -xb -aet "DCM4CHEE" \
-aec "liver-tumor-test" 127.0.0.1 104 \
~/.clara/pipelines/clara_ai_livertumor_pipeline/input/dcm

After this pipeline completes, the metrics are stored in the Elasticsearch database and can be accessed from the Grafana interface.
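If the Grafana panels still come up empty after the pipeline finishes, one way to narrow it down is to ask Elasticsearch directly whether any metric documents arrived. A minimal sketch, assuming the Elasticsearch indices are named after the Grafana data sources (gpumetrics, cpumetrics) and that you can reach the Elasticsearch HTTP endpoint; the host and port in the usage line are placeholders, and `index_has_docs` is a hypothetical helper. The `_cat/indices` endpoint with the `h=` column selector is a standard Elasticsearch API:

```shell
# Hypothetical helper: read `_cat/indices?h=index,docs.count` output on stdin and
# succeed if the named index exists and holds at least one document.
index_has_docs() {
  awk -v name="$1" '$1 == name && $2 + 0 > 0 { found = 1 } END { exit !found }'
}
```

Usage: `curl -s "http://<elasticsearch-host>:<port>/_cat/indices?h=index,docs.count" | index_has_docs gpumetrics && echo "GPU metrics found"`.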

Navigate to <host IP>:32000 to access Grafana and log in with the usual credentials. In the left toolbar, click the “+” icon and choose Create Dashboard. This gives you a panel where you can select “Add Query”. After clicking “Add Query”, you will see a Query dropdown; select “gpumetrics” there.

Then, in the Query panel below, choose “Average” for the Metric. The “select field” dropdown shows all available GPU metrics, and you can use any of them to build out the panel. You can do the same for host metrics by selecting “cpumetrics” in the Query dropdown.

Let me know if you run into any issues.

-Kris