NVIDIA DGX-1: Health Tracking with SNMP or IPMI (through Zabbix)

Hello

I want to track the server’s health (especially the status of the disks/HDDs) with Zabbix, using SNMP and/or IPMI.
I walked all the available OIDs with snmpwalk, but I couldn’t find one that reports the disks’ health.
In Zabbix I used a template based on ‘ipmi sensor discovery’ and was able to get some information (on the fans, for example), but I can’t get any information on the HDDs
(error: Preprocessing failed for: [{"id":"Watchdog","name":"(6.2).Watchdog","sensor":{"type":35,"text":"watchdog_2"},"reading":{"ty…

  1. Failed: cannot extract value from json by path "$.[?(@.id=='HDD9')].value.first()": no data matches the specified path)
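
For readers unfamiliar with this error, the sketch below mimics in plain Python what the JSONPath in the message appears to do: select the array elements whose id equals the requested sensor, take their "value" field, and return the first one. It is only an illustration under assumptions, not Zabbix code. The sample data is hypothetical (only the Watchdog id, name and sensor fields come from the error text above), and extract_first_value is a made-up helper name. It shows two ways the lookup can come back empty: the sensor is not in the data at all, or it is present but carries no "value" field.

```python
import json

# Hypothetical sample of the master item's JSON. Only the Watchdog id, name and
# sensor fields are taken from the error message; the other entries are invented.
SAMPLE = json.loads("""
[
  {"id": "Watchdog", "name": "(6.2).Watchdog",
   "sensor": {"type": 35, "text": "watchdog_2"}},
  {"id": "FAN1", "name": "FAN1",
   "sensor": {"type": 4, "text": "fan"}, "value": 5000},
  {"id": "HDD9", "name": "HDD9",
   "sensor": {"type": 13, "text": "drive_slot"}}
]
""")


def extract_first_value(sensors, sensor_id):
    """Rough equivalent of $.[?(@.id=='<sensor_id>')].value.first()."""
    # Keep the "value" of every element whose id matches and that has a value.
    matches = [
        s["value"]
        for s in sensors
        if s.get("id") == sensor_id and "value" in s
    ]
    if not matches:
        # Corresponds to "no data matches the specified path".
        raise LookupError(f"no data matches the path for id {sensor_id!r}")
    return matches[0]


if __name__ == "__main__":
    for sensor_id in ("FAN1", "HDD9", "HDD10"):
        try:
            print(sensor_id, "->", extract_first_value(SAMPLE, sensor_id))
        except LookupError as exc:
            print(sensor_id, "->", exc)
```

If the real data behaves like this sample, checking the raw JSON of the master item for an element with id HDD9, and whether that element has a "value" field, would show which of the two cases applies.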

I would appreciate any help.

Ester

Hi @user108486 ,

What about using the NVSM API instead? https://docs.nvidia.com/datacenter/nvsm/latest/nvsm-user-guide/index.html#nvsm-api-calling That should give you a much richer set of information than what the BMC (aka IPMI/SNMP) has visibility into.

ScottE