Tesla P4 - HP DL385 Gen10 - ESXi 6.5 showing no RAM available on Tesla P4

Hi there. We have 3 x Tesla P4 cards we want to use on ESXi 6.5 for vSGA. I installed the first two in HP DL380 Gen9 servers with Intel CPUs, installed the NVIDIA-VMware-418.196-1OEM.650.0.0.4598673.x86_64.vib file, and everything is working fine on those two servers. We are running ESXi 6.5.0 build 17477841 on all 3 hosts.
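For anyone following along, the install on each host was essentially the standard VIB install, something like this (the datastore path is just an example, and the host was rebooted afterwards):

esxcli system maintenanceMode set --enable true
esxcli software vib install -v /vmfs/volumes/datastore1/NVIDIA-VMware-418.196-1OEM.650.0.0.4598673.x86_64.vib
reboot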

The issue I have is with the 3rd host, which is an HP DL385 Gen10 with AMD CPUs. I've tried installing the same driver and it shows up in vCenter. It shows the card, it shows the active type as 'shared' and the configured type as 'shared', but it shows the memory as 0.00 MB.

Also, if I run nvidia-smi it says no devices were found, even though the BIOS and ESXi show the card is there. So it's obviously something to do with the drivers not loading, but I don't know why.
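In case it helps, these are the sort of checks I've been running on the host to see whether the card and the driver module are actually visible (the grep patterns are just what I'd expect to match, so adjust as needed):

lspci | grep -i nvidia                        # is the P4 visible on the PCI bus?
esxcli system module list | grep -i nvidia    # has the NVIDIA vmkernel module been loaded?
nvidia-smi                                    # should list the P4 once the driver initialises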

I’m clearly missing something here.

I've also tried updating the driver to the latest version I could download, NVIDIA-VMware-460.73.02-1OEM.650.0.0.4598673.x86_64.vib, but it hasn't changed anything.
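The update itself was just a VIB update and a reboot, something like this (again, the datastore path is only an example):

esxcli software vib update -v /vmfs/volumes/datastore1/NVIDIA-VMware-460.73.02-1OEM.650.0.0.4598673.x86_64.vib
reboot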

I’m losing my mind with this one…any help is very much appreciated.

Can you run nvidia-smi on the ESX host after installing the VIB? What is the output? It seems the GPU is not recognized properly. Maybe you need to check the BIOS settings first.

Yes, I have run it, as mentioned in my original post. It just says 'no devices were found', but the card is clearly detected by the hardware, as it's shown in iLO, the BIOS and ESXi…

That’s why I’m confused.

Check with dmesg on the host whether there are errors. I still assume a wrong BIOS setting.
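For example, something like this should surface the driver messages (on ESXi the same messages also end up in the persistent vmkernel log):

dmesg | grep -i nvrm                    # NVIDIA kernel driver (NVRM) messages
grep -i NVRM /var/log/vmkernel.log      # same messages in the persistent log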

There's a whole load of stuff in there when I run dmesg… not sure what I should be looking for, but this bit seems GPU-related:

2021-07-06T09:54:40.203Z cpu44:72073)NVRM: GPU at 0000:23:00.0 has software scheduler DISABLED with policy BEST_EFFORT.
2021-07-06T09:54:40.217Z cpu44:72073)NVRM: GPU 0000:23:00.0: RmInitAdapter failed! (0x26:0xffff:1290)

Any idea what sort of settings in the BIOS I should be looking for? I checked SR-IOV in the virtualisation section and that's enabled. Well, it's greyed out so I can't select it, but it says enabled in grey.

As you can see, the board is not being initialised properly. It might be BIOS related or a hardware defect. Please check with HPE first for the right BIOS settings; the MMIO settings in particular are relevant. The P4 doesn't require SR-IOV to be enabled. In addition, I'm not sure the P4 is qualified for that server at all; as far as I can see, only the T4 was validated.
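As a quick sanity check you can also list the PCI devices the host has enumerated and confirm the entry at 0000:23:00.0 (the P4 from your log) shows up with the expected NVIDIA vendor and device IDs:

esxcli hardware pci list        # look for the device at 0000:23:00.0 and check its details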

Yeah, I figured something like that was going on, but I've been through all of the settings in the BIOS and there's nothing at all that I can see that's related. I just assumed that if they worked fine with the Gen9 of the same server, the Gen10 would be fine… I know that's not always the case, but I wouldn't have expected it not to work like this. I wasn't sure whether I needed to get advice from HP, NVIDIA or VMware, so I'll see what I can get from HP.

Thanks for the advice.

OK, so annoyingly HP says it's not supported on this server, which is really dumb… and frustrating. But at least if someone else is looking for this information, it's here now.

Thanks for your help sschaber

OK, so I have a bit of a wrinkle in this story. Today I removed the card from the HP DL385 Gen10 and put it into a Cisco UCS C210 M2 server we had, to see if it worked there, and got exactly the same issue… everything looks fine in the sense that the BIOS reports it and VMware sees it, but it says the card has 0 MB of RAM.

nvidia-smi reports no devices available.

It seems a bit too coincidental that both servers are doing the same thing… I know I have read that some of the Tesla cards can be switched between modes, but I'm not sure if the Tesla P4s are like this and whether maybe the card is in the wrong mode? I haven't been able to find any information about that for the P4, but I thought I'd ask in case I'm missing something there and fighting a losing battle if the card isn't going to respond properly.

Any advice is appreciated, as always. I've got some other, later-model Cisco servers I'm going to try the card in to see if I get similar results.

Have you tried ESXi 6.7 instead? Why are you still using 6.5? A mode switch is not possible on the P4, as it handles graphics and compute in parallel.
Did you open a support ticket with our NVES? They could analyze the nvidia-bug-report to see if it points to the issue.
And please keep the "old" 418.x driver for your testing, as I doubt the latest one works with 6.5 due to the extended VIB size. You even need a current patch for 6.7 to extend the accepted VIB size, and VMware didn't release such a patch for 6.5 as far as I know.
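To double-check which driver is actually installed, and to collect data for a support ticket, something along these lines should work on the host (assuming the bug-report script is included in the VIB you installed):

esxcli software vib list | grep -i nvidia    # confirm which NVIDIA VIB and version is installed
nvidia-bug-report.sh                         # generates an nvidia-bug-report archive that support can analyze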

regards
Simon

Hi Simon,

I used ESXi 6.5 because I understood it was the only version that officially supported the vSGA function without some kind of licensing. As I said, we have two other servers with the same card running the same driver version, but they are DL380 Gen9, not DL385 Gen10, so it's hard to know if it's a card issue or something else, as I'm not familiar enough with these cards and ESXi to diagnose it.

I haven't done anything else other than post here, as I wasn't aware of any other options, sorry.

Thanks for the info on the mode switch. I did assume it couldn't be switched, but thought maybe that was a reason the card might not seem to be working properly. At least if we know it's not possible, then I know it's not that!

I am going to use another server to do some more trialling with different drivers, ESXi versions, etc. to see if I get any different results. I couldn't really do too much on our production servers, but now I have some other servers to prove whether the card is or isn't working properly. That's my first step, I guess.

Even when I installed the latest driver I had, ESXi didn't complain and I thought it said it was installed successfully, but I'd better read the output again just to be super sure.

Unfortunately you are wrong. The ESX version is not relevant for licensing. You always need a vPC license for vSGA as soon as you use a GPU like the P4.

Thanks for letting me know. I found the licensing model super complicated when I tried to look up what was needed. I found plenty of posts elsewhere from very confused IT staff too, so it was obvious it wasn't just me struggling to understand it.

I found a driver for ESXi 6.5 from before NVIDIA had their vPC licensing, so that's why I assumed it didn't need any kind of licensing for that version. I know I needed some kind of licensing for the later versions (which I ended up needing to install to get things going on the other servers), so once we get all three going we will get whatever we need to make us legal.

So I haven't got anywhere with running the card in separate servers or on separate OSes.

I'm just wondering if it's worth me attempting a BIOS/firmware flash to make sure it's not something like that? Or is there some process we can use to diagnose further?

I can find a BIOS update, but it's packaged for SUSE or some other Linux, so I wondered if there's an easier method than setting all that up just to flash the card?

It’s really frustrating…

Do you mean flashing the GPU? That doesn't make any sense, as it's never necessary.

regards
Simon

Yes. I know it's not normally needed, but obviously in this situation it's not working as expected, so I just thought maybe it could be something like that, since it shows up in the BIOS and in ESXi but the drivers won't initialise in ESXi or Windows… so I guess I was clutching at straws a bit.

How can I diagnose further what might be happening?

Just wanted to let everyone who might be having similar issues know that this was down to a faulty card.

I went through everything over and over, and have had working cards in other servers with no dramas, so I knew what it should be doing.

Finally, after exhausting all options, I contacted the supplier and got a return/replacement, and the replacement worked straight away, so clearly it was some kind of fault. The odd thing is that the faulty card still showed up in the BIOS etc. and partially in VMware, as mentioned… but now I know it was a faulty card.

Thanks to everyone who replied.