Running Ganglia monitoring on a cluster of TK1s with NVIDIA's module

I found this Python module created by NVIDIA for Ganglia (cluster monitoring software).

https://developer.nvidia.com/ganglia-monitoring-system

I followed the directions, but when I load Ganglia I get this error:

“Cannot load /usr/lib/ganglia/python_modules/nvidia.py metric module: /usr/lib/ganglia/python_modules/nvidia.py: invalid ELF header”

Could it be that the Python module is compiled for x86? I am not sure where to go from here.

That is correct. The ELF header identifies the architecture a binary was built for.
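To check which architecture a given file targets, the header can be read directly. The following is a minimal sketch, not part of NVIDIA's module: the function name `elf_machine` and the small lookup table are made up for illustration, and only a handful of `e_machine` codes from the ELF specification are listed.

```python
import struct
import sys

# A few e_machine values from the ELF specification (not an exhaustive list).
ELF_MACHINES = {3: "x86 (i386)", 40: "ARM (32-bit)", 62: "x86-64", 183: "AArch64"}


def elf_machine(path):
    """Return the architecture named in an ELF header, or None if the file is not ELF."""
    with open(path, "rb") as f:
        header = f.read(20)
    # Every ELF file starts with the 4-byte magic 0x7f 'E' 'L' 'F'.
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5:6] == b"\x01" else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown (e_machine=%d)" % machine)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print("%s -> %s" % (path, elf_machine(path) or "not an ELF file"))
```

Run it against any file of interest, for example `python elf_check.py /path/to/libnvidia-ml.so` (the script name and path are placeholders). A plain-text `.py` source file has no ELF header at all, so it is reported as "not an ELF file".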

So it is not just a matter of me recompiling the Python file; NVIDIA would need to build an ARM version? I tried to compile this file, and it needs libnvidia-ml.so, which, from what I can find, only comes in i386 and AMD64 flavors. Is this correct?

nvidia-smi will not work, since it is for GPUs connected over the PCIe bus.

The GPUs on the TK1 and TX1 are considered “embedded”, so you may need to parse some of the files under /sys/kernel and /sys/devices to find the load.
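As a starting point, here is a rough sketch of a gmond Python metric module that reads the GPU load from sysfs instead of going through libnvidia-ml.so. Several things in it are assumptions to verify on your own board: the sysfs path (the one below is what some L4T releases for the TK1 expose, and it differs on other boards and releases), the 0-1000 scale of the value, and the metric name `tegra_gpu_load`, which is made up for this example. The `metric_init` / `metric_cleanup` / callback layout follows the usual gmond Python module interface.

```python
# ASSUMPTION: sysfs node that reports GPU load on a TK1. Check that this file
# exists on your board; the location varies between boards and L4T releases.
GPU_LOAD_PATH = "/sys/devices/platform/host1x/gk20a.0/load"


def read_gpu_load(path=GPU_LOAD_PATH):
    """Return GPU load in percent, assuming the node reports a value from 0 to 1000."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 10.0
    except (IOError, OSError, ValueError):
        # Node missing or unreadable: report 0 rather than crash gmond.
        return 0.0


def gpu_load_handler(name):
    """Callback invoked by gmond each time the metric is collected."""
    return read_gpu_load()


def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    return [{
        "name": "tegra_gpu_load",  # hypothetical metric name
        "call_back": gpu_load_handler,
        "time_max": 60,
        "value_type": "float",
        "units": "%",
        "slope": "both",
        "format": "%.1f",
        "description": "GPU load read from sysfs",
        "groups": "gpu",
    }]


def metric_cleanup():
    """Called once when gmond shuts down; nothing to release here."""
    pass


if __name__ == "__main__":
    # Quick check outside gmond: print the current value of each metric.
    for d in metric_init({}):
        print("%s = %s" % (d["name"], d["call_back"](d["name"])))
```

Running the file directly on the TK1 prints the current reading, which is a quick way to confirm the path before wiring the module into gmond with a matching .pyconf file. Useful keywords to search for are "gmond python module", "sysfs", and "tegrastats".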

Are there any guides or keywords I can look up to help me accomplish this? I am willing to put in the work, but I really do not know where to start.

Thanks all

I haven’t heard of anyone trying this before, and I don’t know whether ARM imposes other issues, but this link has instructions on how to get it running on an older version of Ubuntu: https://sourceforge.net/p/ganglia/mailman/message/32580315/

Never mind. I didn’t read the question well enough; this doesn’t mention anything about getting the correct ELF version of libnvidia-ml.so.