Hello NVIDIA Community,
I am reaching out for assistance with an issue I’m experiencing in activating the Heterogeneous Memory Management (HMM) feature on my system. Below are the specifics of my setup:
- Operating System: Debian Bookworm, running Kernel version 6.6.13+bpo-amd64 from the backports.
- GPU and Driver Details (from nvidia-smi -q):
- Driver Version: 550.54.14
- CUDA Version: 12.4
- GPU Model: NVIDIA RTX A4500 (Ampere architecture)
Despite meeting what I believe are the necessary conditions for HMM activation (recent Linux Kernel, updated NVIDIA driver, and the latest CUDA Toolkit), the HMM feature does not seem to be enabled. Here’s a snippet from my NVSMI log for clarity:
==============NVSMI LOG==============
Timestamp : Wed Mar 6 11:55:48 2024
Driver Version : 550.54.14
CUDA Version : 12.4
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA RTX A4500
Product Brand : NVIDIA RTX
Product Architecture : Ampere
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
Given this information, I would like to know if there are any reasons why my GPU might not support HMM, or if there are additional system checks I could perform. Additionally, is there a way to force the activation of this feature, or any specific configurations I might be missing?
Any guidance or insights from the community would be greatly appreciated. Thank you for your time and help!
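One quick system check is to pull the Addressing Mode field out of the nvidia-smi -q output and to look at which kernel module flavour is loaded. The following is only a minimal sketch: the helper name addressing_mode is hypothetical, and it assumes that /proc/driver/nvidia/version identifies the open kernel module when that flavour is in use.

```shell
#!/bin/sh
# Print the value of the "Addressing Mode" field from `nvidia-smi -q`
# output read on stdin. (Hypothetical helper, not part of any NVIDIA tool.)
addressing_mode() {
    awk -F':' '/Addressing Mode/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim whitespace around the value
        print $2
        exit
    }'
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "Addressing mode: $(nvidia-smi -q | addressing_mode)"
fi

# Assumption: this file names the module flavour (open vs. proprietary).
if [ -r /proc/driver/nvidia/version ]; then
    cat /proc/driver/nvidia/version
fi
```

On the system described above this would be expected to report None, matching the log; a value of HMM would indicate the feature is active.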
I’m also on Debian Bookworm with kernel 6.9.7+bpo-amd64 from bookworm-backports and driver 555.42.06 (with nvidia-kernel-open-dkms) from the NVIDIA repo, running an RTX 4080, and I also get Addressing Mode : None. Could this be related to features of the motherboard, chipset, or CPU?
GeForce and Quadro GPUs may need a particular enablement step; see the note at the end.
Here’s what I did, starting with an Asus Z87-PRO motherboard and GeForce GTX 1660 Super GPU (Turing).
- Fresh install of Ubuntu 24.04 desktop, selecting the default/basic installation and opting to add the proprietary drivers (NVIDIA GPU, codecs). When this was done, it had an R535 driver installed. The installed kernel was 6.8.0-39-generic.
- Set the “unsupported” GeForce/Quadro option:
echo "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" | sudo tee /etc/modprobe.d/nvidia-gsp.conf
- Install the CUDA toolkit 12.5.1
- Install the open kernel module driver from the above link. I ended up with 555.42.06
- Reboot
- After that, nvidia-smi -a indicated HMM addressing mode:
$ nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Sat Jul 27 12:13:32 2024
Driver Version : 555.42.06
CUDA Version : 12.5
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce GTX 1660 SUPER
Product Brand : GeForce
Product Architecture : Turing
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Addressing Mode : HMM
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-cddf7e4c-74b1-30a2-5445-ecf715827977
Minor Number : 0
VBIOS Version : 90.16.48.00.45
MultiGPU Board : No
Board ID : 0x100
Board Part Number : N/A
GPU Part Number : 21C4-300-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G001.0000.02.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU C2C Mode : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
vGPU Heterogeneous Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
GSP Firmware Version : 555.42.06
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Base Classcode : 0x3
Sub Classcode : 0x0
Device Id : 0x21C410DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x40201458
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Device Current : 1
Device Max : 3
Host Max : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : 0 %
Performance State : P8
Clocks Event Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
Sparse Operation Mode : N/A
FB Memory Usage
Total : 6144 MiB
Reserved : 394 MiB
Used : 109 MiB
Free : 5643 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 4 MiB
Free : 252 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 2 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 40 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 91 C
GPU Target Temperature : 83 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
GPU Power Readings
Power Draw : 12.68 W
Current Power Limit : 140.00 W
Requested Power Limit : 140.00 W
Default Power Limit : 140.00 W
Min Power Limit : 70.00 W
Max Power Limit : 176.00 W
GPU Memory Power Readings
Power Draw : N/A
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
CliqueId : N/A
ClusterUUID : N/A
Health
Bandwidth : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 1923
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 37 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2156
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 60 MiB
Capabilities
EGM : disabled
$
After all this, I deleted the /etc/modprobe.d/nvidia-gsp.conf file that got created and rebooted, and nvidia-smi still showed the addressing mode as HMM. So I'm not sure what the important difference is between my setup and the others.
This blog may be of interest.
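To confirm whether the enablement option from the steps above is actually in place, something along these lines could be used. It is a rough sketch: the helper name has_unsupported_gpu_option is hypothetical, and the assumption that the loaded module lists its registry parameters in /proc/driver/nvidia/params should be verified on your own system.

```shell
#!/bin/sh
# Succeed if stdin contains an uncommented modprobe "options nvidia" line
# that sets NVreg_OpenRmEnableUnsupportedGpus=1. (Hypothetical helper.)
has_unsupported_gpu_option() {
    grep -Eq '^[[:space:]]*options[[:space:]]+nvidia[[:space:]].*NVreg_OpenRmEnableUnsupportedGpus=1([[:space:]]|$)'
}

# Scan every modprobe.d fragment that exists.
if cat /etc/modprobe.d/*.conf 2>/dev/null | has_unsupported_gpu_option; then
    echo "option is set in /etc/modprobe.d"
else
    echo "option is not set in /etc/modprobe.d"
fi

# Assumption: the loaded module reports its registry parameters here.
if [ -r /proc/driver/nvidia/params ]; then
    grep -i 'OpenRmEnableUnsupportedGpus' /proc/driver/nvidia/params \
        || echo "parameter not listed by the loaded module"
fi
```

Comparing the two results before and after deleting the .conf file and rebooting would show whether the option has any effect on a given driver version.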
I was looking in the wrong places for something like echo "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" | sudo tee /etc/modprobe.d/nvidia-gsp.conf, mainly that blog post and the CUDA 12.2 release notes. So thank you for pointing out where to find this detail.
Sadly it still doesn’t work for me. I switched back and forth between legacy and open as described here and even did a complete re-installation of Nvidia-related packages and many reboots without any luck. Ideally I would like to test it with a completely clean slate/fresh Debian installation, but that has to wait. So if someone else has either positive or negative results on Debian Bookworm I would love to hear about them.
@Robert_Crovella The 12.6 release notes say that the modprobe.d file has not been needed since driver version 545, so no wonder it didn’t help me and didn’t affect HMM working for you.
I still don’t get HMM addressing with driver version 560.28.03 installed via the new meta package nvidia-open-560.
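One further thing that could be compared between the working Ubuntu setup and the Debian ones is the kernel configuration. The sketch below rests on two assumptions that are not confirmed anywhere in this thread: that HMM support in the NVIDIA UVM module depends on kernel options such as CONFIG_HMM_MIRROR and CONFIG_DEVICE_PRIVATE, and that the distribution installs its kernel config at /boot/config-$(uname -r). The helper name kernel_option_enabled is hypothetical.

```shell
#!/bin/sh
# Succeed if the kernel config read on stdin has the given option built in (=y).
# (Hypothetical helper.)
kernel_option_enabled() {
    grep -q "^$1=y\$"
}

config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    # Assumption: these are the options relevant to HMM.
    for opt in CONFIG_HMM_MIRROR CONFIG_DEVICE_PRIVATE; do
        if kernel_option_enabled "$opt" < "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
        fi
    done
else
    echo "no kernel config found at $config"
fi
```

Running this on both an Ubuntu and a Debian backports kernel would show whether the two differ in these options, which might help narrow down the cause.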