dcgm-exporter: API version mismatch

I am trying to build dcgm-exporter to run on bare metal on Ubuntu 18.04 (Bionic). The NVIDIA drivers and CUDA libraries are installed from the NVIDIA PPA. I followed the build instructions in the README to build the binary.

When I run the dcgm-exporter binary I built, I get the following output (from testing):

# /usr/bin/dcgm-exporter
INFO[0000] Starting dcgm-exporter
INFO[0000] DCGM successfully initialized!
INFO[0000] Pipeline starting
INFO[0000] Starting webserver
ERRO[0002] Failed to collect metrics with error: Failed to collect metrics with error: Error getting device information: API version mismatch
ERRO[0004] Failed to collect metrics with error: Failed to collect metrics with error: Error getting device information: API version mismatch
ERRO[0006] Failed to collect metrics with error: Failed to collect metrics with error: Error getting device information: API version mismatch

All of the NVIDIA tools and scripts we use to test the GPUs work properly; dcgm-exporter is the only tool that produces this error. We have Tesla K80 cards running the NVIDIA 440.100 driver, and I'm using the latest build of dcgm-exporter from the repo's master branch.

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

Unfortunately, asking dcgm-exporter for its version only prints a placeholder:

# ~/dcgm-exporter --version
DCGM Exporter version Filled by the build system

We successfully run dcgm-exporter 1.7.2 in Kubernetes clusters on the same hardware; the requirement here is to get the exporter working on bare metal. However, I can't find a 1.7.2 version in the repo.

I’m at a loss here. Any pointers would be greatly appreciated.

Got this figured out. The problem was that the wrong version of the datacenter-gpu-manager deb was installed: version 2.0.10 (and the version of dcgm-exporter I was trying to use was 2.0). I reinstalled datacenter-gpu-manager, downgrading it to 1.7.2, and dcgm-exporter now works.
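Since the fix came down to which datacenter-gpu-manager package was installed, a quick pre-flight check can catch this kind of mismatch before the exporter starts. The Python sketch below is hypothetical and not part of dcgm-exporter or DCGM. It assumes a Debian/Ubuntu host, reads the installed package version with `dpkg-query -W -f='${Version}'`, and compares its major version against one you supply. The helper names (`dcgm_major_version`, `installed_dcgm_version`, `check_dcgm_matches`) and the expected major version are assumptions for illustration.

```python
import re
import subprocess


def dcgm_major_version(version_string: str) -> int:
    """Return the major version from a Debian version string such as '1:2.0.10' or '1.7.2'."""
    # Drop an optional Debian epoch ("N:") prefix before parsing.
    upstream = version_string.strip().split(":", 1)[-1]
    match = re.match(r"(\d+)", upstream)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1))


def installed_dcgm_version(package: str = "datacenter-gpu-manager") -> str:
    """Ask dpkg for the installed version of the DCGM package (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}", package],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the package is not installed
    )
    return result.stdout


def check_dcgm_matches(expected_major: int) -> bool:
    """Hypothetical pre-flight check: does the installed DCGM major version match the expected one?"""
    installed = installed_dcgm_version()
    major = dcgm_major_version(installed)
    if major != expected_major:
        print(f"DCGM {installed.strip()} is installed, but DCGM {expected_major}.x was expected")
        return False
    return True
```

For example, on a host with datacenter-gpu-manager 2.0.10 installed, `check_dcgm_matches(1)` would print a warning and return `False`. Which DCGM major version a given dcgm-exporter build needs is something you would have to determine for your own build; the sketch only automates the comparison.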