I tried to report some issues with `nvidia-ml-py` to the e-mail address listed on PyPI (nvml-bindings@nvidia.com), but my message was rejected by the mail server:
> The recipient’s domain has rejected your message because there is no recipient’s e-mail address in the domain’s directory. It may be that the address is misspelled or does not exist.
So I am reposting the issues in this forum while I wait for an update. The original e-mail:
Dear maintainers of `nvidia-ml-py`:
Firstly, thank you so much for creating and maintaining such a useful package. It allows users to write monitoring tools for NVIDIA GPUs in Python. I found some issues and/or bugs while creating my top-like monitor. I couldn’t find a place (like GitHub Issues) to report them, so I decided to write an e-mail to the address I found on PyPI.
Issues and questions:
- Is there any plan to open-source the code or host it on GitHub (or similar), like NVIDIA/go-nvml? This would greatly facilitate submitting issues and improving the bindings.
- Backward compatibility between driver and binding versions.

Since CUDA 11, the definition of `nvmlProcessInfo_t` adds two new fields, `gpuInstanceId` and `computeInstanceId`:

```c
/**
 * Information about running compute processes on the GPU
 */
typedef struct nvmlProcessInfo_st
{
    unsigned int        pid;                //!< Process ID
    unsigned long long  usedGpuMemory;      //!< Amount of used GPU memory in bytes.
                                            //! Under WDDM, \ref NVML_VALUE_NOT_AVAILABLE is always reported
                                            //! because Windows KMD manages all the memory and not the NVIDIA driver
    unsigned int        gpuInstanceId;      //!< If MIG is enabled, stores a valid GPU instance ID. gpuInstanceId is set to
                                            //  0xFFFFFFFF otherwise.
    unsigned int        computeInstanceId;  //!< If MIG is enabled, stores a valid compute instance ID. computeInstanceId is set to
                                            //  0xFFFFFFFF otherwise.
} nvmlProcessInfo_t;
```
Depending on the combination of driver and binding versions, the Python bindings either return wrong results or raise a `FunctionNotFound` error, notably with pre-11 drivers (widely used on Ubuntu 16.04 LTS):

| v1 | NVIDIA Driver 430.64 | NVIDIA Driver 470.57.02 |
|---|---|---|
| `nvidia-ml-py==11.450.51` | works but without CI ID/GI ID | works but without CI ID/GI ID |
| `nvidia-ml-py>=11.450.129` | no exceptions in Python but wrong results (subscript out of range in C library) | no exceptions in Python but wrong results (subscript out of range in C library) |

| v2 | NVIDIA Driver 430.64 | NVIDIA Driver 470.57.02 |
|---|---|---|
| `nvidia-ml-py==11.450.51` | function not found | no exceptions in Python but wrong results (subscript out of range in C library) |
| `nvidia-ml-py>=11.450.129` | function not found | works with correct CI ID/GI ID |

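For illustration only, the following self-contained ctypes sketch (no GPU or NVML required) reproduces one of the mismatched combinations: the caller allocates an array with the CUDA 11 layout of `nvmlProcessInfo_t`, while an older library fills it with the pre-11 layout. The names `ProcessInfoV1`, `ProcessInfoV2` and `simulate_layout_mismatch` are hypothetical, and the assumption that the library writes entries using its own struct size is an interpretation of the "wrong results" cells, not something confirmed by the bindings' source.

```python
import ctypes


class ProcessInfoV1(ctypes.Structure):
    """Pre-CUDA 11 layout of nvmlProcessInfo_t (no MIG fields)."""
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
    ]


class ProcessInfoV2(ctypes.Structure):
    """CUDA 11 layout of nvmlProcessInfo_t with the two MIG fields."""
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
        ("gpuInstanceId", ctypes.c_uint),
        ("computeInstanceId", ctypes.c_uint),
    ]


def simulate_layout_mismatch(records):
    """Write `records` with the v1 layout, then read them back with the v2 layout.

    `records` is a list of (pid, usedGpuMemory) tuples. Returns what a caller
    expecting v2 entries would see, as (pid, usedGpuMemory, gpuInstanceId,
    computeInstanceId) tuples.
    """
    count = len(records)
    # The caller (the Python binding) allocates room for `count` v2 entries.
    buffer = (ProcessInfoV2 * count)()
    # A hypothetical older library writes `count` smaller v1 entries into
    # the same memory, starting at offset 0.
    old_view = (ProcessInfoV1 * count).from_buffer(buffer)
    for entry, (pid, used_memory) in zip(old_view, records):
        entry.pid = pid
        entry.usedGpuMemory = used_memory
    # Reading with the larger v2 stride misaligns every field after the first entry.
    return [
        (e.pid, e.usedGpuMemory, e.gpuInstanceId, e.computeInstanceId)
        for e in buffer
    ]


if __name__ == "__main__":
    print(ctypes.sizeof(ProcessInfoV1), ctypes.sizeof(ProcessInfoV2))
    for row in simulate_layout_mismatch([(1234, 1024), (5678, 2048)]):
        print(row)
```

On a typical 64-bit little-endian machine, only the first entry's `pid` and `usedGpuMemory` come out right: its `gpuInstanceId` picks up the second process's PID, and the second entry's `pid` is read from the bytes of the second process's memory usage.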
Similar issues were reported for NVIDIA/go-nvml: issue NVIDIA/go-nvml#21 and pull request NVIDIA/go-nvml#25.
NVIDIA/go-nvml states that it is designed to be backward compatible:

> These bindings are not a reimplementation of NVML in Go, but rather a set of wrappers around the C API provided by `libnvidia-ml.so`. This library is part of the standard NVIDIA driver distribution, and should be available on any Linux system that has the NVIDIA driver installed. The API is designed to be backwards compatible, so the latest bindings should work with any version of `libnvidia-ml.so` installed on your system.

On initialization, NVIDIA/go-nvml looks up the versioned APIs (suffixed with `_v1`, `_v2`, etc.) and sets the unversioned bindings to the version compatible with the driver on the system. Will there be similar handling in the Python bindings `nvidia-ml-py`?
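A minimal sketch of what go-nvml-style version resolution could look like in Python, assuming the library is loaded with `ctypes.CDLL("libnvidia-ml.so.1")`, which raises `AttributeError` for symbols the installed driver does not export. The helper `resolve_versioned` and its `max_version` parameter are hypothetical and not part of `nvidia-ml-py`; a real implementation would also have to pair each resolved symbol with the struct layout that version expects.

```python
import ctypes


def resolve_versioned(lib, name, max_version=3):
    """Return (symbol, function) for the newest exported version of `name`.

    Tries `name_v{max_version}` down to `name_v1`, then the unversioned
    symbol. `lib` is any object exposing symbols as attributes and raising
    AttributeError for missing ones, such as a ctypes.CDLL.
    """
    candidates = [f"{name}_v{v}" for v in range(max_version, 0, -1)] + [name]
    for symbol in candidates:
        try:
            return symbol, getattr(lib, symbol)
        except AttributeError:
            continue  # not exported by this driver's library; try an older one
    raise AttributeError(f"no version of {name!r} found in {lib!r}")


if __name__ == "__main__":
    try:
        nvml = ctypes.CDLL("libnvidia-ml.so.1")  # assumed library name on Linux
    except OSError:
        print("libnvidia-ml.so.1 not found (is the NVIDIA driver installed?)")
    else:
        symbol, _ = resolve_versioned(nvml, "nvmlDeviceGetComputeRunningProcesses")
        print("using", symbol)
```

Trying the newest suffix first means newer drivers get the richer API (e.g. with CI/GI IDs) while older drivers fall back to what they export instead of failing with `FunctionNotFound`.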
- Bug: the bindings should return Python types rather than ctypes types.

The function `nvmlDeviceIsMigDeviceHandle` has been in `nvidia-ml-py` since version 11.450.51. It returns a `c_uint` rather than a Python `int` or `bool`:

```python
def nvmlDeviceIsMigDeviceHandle(device):
    c_isMigDevice = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceIsMigDeviceHandle")
    ret = fn(device, byref(c_isMigDevice))
    _nvmlCheckReturn(ret)
    return c_isMigDevice
```
The return statement should be changed to return `c_isMigDevice.value`, as the other bindings do:

```diff
 def nvmlDeviceIsMigDeviceHandle(device):
     c_isMigDevice = c_uint()
     fn = _nvmlGetFunctionPointer("nvmlDeviceIsMigDeviceHandle")
     ret = fn(device, byref(c_isMigDevice))
     _nvmlCheckReturn(ret)
-    return c_isMigDevice
+    return c_isMigDevice.value
```
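To show why returning the raw `c_uint` is inconvenient, here is a self-contained sketch that needs no GPU. `is_mig_device_current`, `is_mig_device_fixed` and `json_ok` are hypothetical stand-ins that only mimic the two return statements; they do not call NVML.

```python
import ctypes
import json


def is_mig_device_current(raw_flag):
    """Mimics the current binding: returns the ctypes object itself."""
    c_is_mig_device = ctypes.c_uint(raw_flag)  # stands in for the value NVML writes via byref()
    return c_is_mig_device


def is_mig_device_fixed(raw_flag):
    """Mimics the proposed fix: unwraps the ctypes object to a plain int."""
    c_is_mig_device = ctypes.c_uint(raw_flag)
    return c_is_mig_device.value


def json_ok(obj):
    """Return True if `obj` can be serialized by the standard json module."""
    try:
        json.dumps(obj)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    current, fixed = is_mig_device_current(1), is_mig_device_fixed(1)
    # ctypes scalars define no value-based __eq__, so == falls back to identity.
    print(current == 1, fixed == 1)
    print(json_ok(current), json_ok(fixed))
```

A ctypes scalar does not compare equal to the integer it holds and cannot be passed to `json.dumps`, so callers of the current binding must remember to unwrap it with `.value` themselves.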
Looking forward to your reply!

Sincerely,
Xuehai Pan