Fan speed and noise of DL380 Gen9 with Tesla M60 card

Hello,

I’m in the process of installing 2 new HP DL380 Gen9 servers, each equipped with a Tesla M60 card.
I have installed XenServer 7.2 (with all the current patches).
From the moment I switched these M60 cards from COMPUTE to GRAPHICS mode, the fans go to 100% of their speed and make a lot of noise.

I have already installed the NVIDIA GRID Manager Version 5.1.
I have also installed the HP SNMP Agents on the XenServer platform, but after each reboot, at a certain moment during the startup phase of the XenServer hypervisor, the fans go to 100% speed again.
In the BIOS the ‘optimal cooling’ feature is set to the default.

I have updated the BIOS firmware to the latest version, P89 v2.52 (10/25/2017), but still no luck: after each reboot, during the startup of XS, the fans blow at their maximum speed and make a lot of noise.

Any suggestions or experiences to improve this?
Is this an HP, XenServer or Nvidia issue? In other words, whom should I contact to create a support ticket?

Thanks,
Chris Marreel

Hi… Are you sure the cables are such that they are not obstructing the airflow? What is the wattage of your power supplies? I don’t see anything like this on Dell R730 servers with 1100 W power supplies.
Do you have a lot of other peripherals in there that may be contributing to a lot of extra heat?

Hello,

First of all, I would like to know where you bought the M60 boards. Are these directly from HP or are these generic boards? I would assume these are generic boards, and HP has a specific vbios on their boards, so please contact HP. As long as the board is recognized by nvidia-smi and the GPU is not running at 100% GPU load when idle, I don’t see why this should be an Nvidia issue.
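
The idle-load check mentioned above can be sketched as follows. This is only a minimal sketch, assuming `nvidia-smi` is on the PATH of the host and that a Python interpreter is available there. The helper names (`parse_gpu_csv`, `busy_gpus`, `read_gpus`) and the 10% idle threshold are hypothetical choices for illustration, not part of any NVIDIA tool.

```python
import subprocess

# Fields queried from nvidia-smi; `nvidia-smi --help-query-gpu` lists them.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def _to_int(value):
    """Convert a CSV field to int; return None for values like '[N/A]'."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse 'name, temperature, utilization' lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        name, temp, util = [field.strip() for field in line.split(",")]
        gpus.append({
            "name": name,
            "temp_c": _to_int(temp),
            "util_pct": _to_int(util),
        })
    return gpus


def busy_gpus(gpus, idle_limit_pct=10):
    """Return GPUs whose load is above idle_limit_pct (arbitrary threshold)."""
    return [
        g for g in gpus
        if g["util_pct"] is not None and g["util_pct"] > idle_limit_pct
    ]


def read_gpus():
    """Run nvidia-smi on this host and parse its output."""
    return parse_gpu_csv(subprocess.check_output(QUERY).decode())
```

On a host with no VMs running, `busy_gpus(read_gpus())` returning an empty list would suggest the GPUs are idle, so a high fan speed would not be explained by GPU load. An M60 board should show up as two entries, since it carries two GPUs.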

Regards

Simon

Hello Tobias and Simon,
This M60 card is the only extra card in this DL380 Gen9 server. The fans spin up to 100% during the startup phase of XenServer 7.2, each time at exactly the same moment when we test a few reboots. So my conclusion: it has nothing to do with temperature, only with some ‘logic’ that decides the fans should go to 100% speed (with the noise as an annoying side effect).

In that server there are 2x 1400W power supplies, both running ‘redundant’ and at the moment only delivering 409W. So the power supplies are adequate for these M60 boards.

The M60 cards were delivered by HP, and nvidia-smi recognizes the board. I already have my first Win10 station using the M60 card, so everything is running fine; only the fan speed and the resulting noise are an issue.

If there are no other thoughts, I will log a ticket with HPE Support for this.

Thanks and greetings,
Chris

Hi

I’ve seen this before a few times on various hardware. If you haven’t done so already, can you make a note of the current BIOS configuration (in case there is anything special configured) and then reset the BIOS back to factory default? Once reset, check all the associated "Power / Performance" and "Cooling" policies for all components. They should all be set to something like "Balanced".

Give that a try and see if it helps, let us know how you get on …

Regards

Ben

Hello,

I have investigated this further.
My setup was working fine, but I have temporarily stopped this environment and placed the XenServer host in Maintenance mode.
I removed the NVIDIA GRID Manager (command used: rpm -e NVIDIA-vGPU-xenserver-7.2-384.99.x86_64) and rebooted the XenServer host to finalize the uninstallation.
During this reboot, I noticed that the fan speed stays at a normal level.

I checked, and this is at the moment still the latest version of the NVIDIA GRID Manager (Version 5.1 of 10 November 2017).
So I reinstalled this NVIDIA GRID Manager on the XenServer host.

After the reboot that followed the reinstallation of the NVIDIA GRID Manager, I noticed that the fan speed went to 100% during the startup phase of XenServer. It happens somewhere halfway through the startup of the XenServer 7.2 OS, so probably at the moment the GRID Manager is initialising you hear the fan speed increase from 35-39% to 100%.

So at the moment I think I will need to contact NVIDIA Support for assistance with this issue.
Any input is welcome.

P.S. How can I contact NVIDIA Support?

Greetings,
Chris MARREEL

Hi cmarreel,

Did you contact HP support? I still don’t see how Nvidia could help here. If this were a general issue with our vGPU manager I would agree to contact Nvidia, but as this seems to be specific to your hardware, the OEM should have a closer look. I can guarantee that the behavior you describe doesn’t occur on other hardware with the same rpm and XS7.2. For example, I’m running the same configuration on other hardware without issues.

Regards

Simon

Same here

HP DL 380G9, Tesla M60
With the vGPU Manager (NVIDIA-vGPU-xenserver-7.2-384.99.x86_64) installed, the fans run at 100% speed after a reboot.

After uninstalling the vGPU Manager, the fan speed is normal again.

It is a bug in the vGPU Manager.

NVIDIA, please help!

Why should this be a bug in the vGPU manager? This is HP specific, so please file a ticket with HP. We don’t control the hardware fans in any way.
Of course, you can also file a ticket with ESP if you think this is Nvidia related…