Hello guys,
I have two Tesla M60 cards (same VBIOS, VRAM…), which I will call “card ok” and “card faulty”, and two identical servers: Supermicro SuperChassis 118G-1400B.
I installed the server driver, version 450, on Ubuntu 20.04 from the official NVIDIA repository on one of these servers.
The “card ok” works as expected, but the “card faulty” crashes the server as soon as I use the driver (even launching nvidia-smi makes the server crash). When the server crashes, it freezes with no error message and nothing in the logs.
I tried the “card faulty” in another server running the same OS, and the result was the same.
However, when I downgrade the driver to version 390, it works fine, and on Windows with driver version 452.39 it works fine too.
I have no idea what else I can do to investigate further. Please help.
“card ok”:
*-pci:0
description: PCI bridge
product: PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
vendor: PLX Technology, Inc.
physical id: 8
bus info: pci@0000:03:08.0
version: ca
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:26 ioport:c000(size=4096) memory:f9000000-f9ffffff ioport:383fe0000000(size=301989888)
*-display
description: VGA compatible controller
product: GM204GL [Tesla M60]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:04:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list
configuration: driver=nvidia latency=0
resources: iomemory:383f0-383ef iomemory:383f0-383ef irq:26 memory:f9000000-f9ffffff memory:383fe0000000-383fefffffff memory:383ff0000000-383ff1ffffff ioport:c000(size=128)
*-pci:1
description: PCI bridge
product: PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
vendor: PLX Technology, Inc.
physical id: 10
bus info: pci@0000:03:10.0
version: ca
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:26 ioport:b000(size=4096) memory:f8000000-f8ffffff ioport:383fc0000000(size=301989888)
*-display
description: VGA compatible controller
product: GM204GL [Tesla M60]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:05:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list
configuration: driver=nvidia latency=0
resources: iomemory:383f0-383ef iomemory:383f0-383ef irq:26 memory:f8000000-f8ffffff memory:383fc0000000-383fcfffffff memory:383fd0000000-383fd1ffffff ioport:b000(size=128)
“card faulty”:
*-pci:0
description: PCI bridge
product: PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
vendor: PLX Technology, Inc.
physical id: 8
bus info: pci@0000:03:08.0
version: ca
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:26 ioport:c000(size=4096) memory:f9000000-f9ffffff ioport:383fe0000000(size=301989888)
*-display
description: VGA compatible controller
product: GM204GL [Tesla M60]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:04:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list
configuration: driver=nvidia latency=0
resources: iomemory:383f0-383ef iomemory:383f0-383ef irq:26 memory:f9000000-f9ffffff memory:383fe0000000-383fefffffff memory:383ff0000000-383ff1ffffff ioport:c000(size=128)
*-pci:1
description: PCI bridge
product: PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
vendor: PLX Technology, Inc.
physical id: 10
bus info: pci@0000:03:10.0
version: ca
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:26 ioport:b000(size=4096) memory:f8000000-f8ffffff ioport:383fc0000000(size=301989888)
*-display
description: VGA compatible controller
product: GM204GL [Tesla M60]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:05:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list
configuration: driver=nvidia latency=0
resources: iomemory:383f0-383ef iomemory:383f0-383ef irq:26 memory:f8000000-f8ffffff memory:383fc0000000-383fcfffffff memory:383fd0000000-383fd1ffffff ioport:b000(size=128)
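
In case it helps anyone reproduce the comparison: the two listings above look identical to me by eye. A minimal sketch for checking that mechanically, assuming each listing has been saved as text; the function name `diff_listings` and the labels `card_ok` / `card_faulty` are only illustrative, not part of any tool:

```python
import difflib


def diff_listings(ok_text: str, faulty_text: str) -> list[str]:
    """Return unified-diff lines between two hardware listings.

    Blank lines and leading/trailing whitespace are ignored, so lost
    indentation does not show up as a difference. An empty list means
    the two listings match line for line.
    """
    ok_lines = [ln.strip() for ln in ok_text.splitlines() if ln.strip()]
    faulty_lines = [ln.strip() for ln in faulty_text.splitlines() if ln.strip()]
    return list(
        difflib.unified_diff(
            ok_lines,
            faulty_lines,
            fromfile="card_ok",      # illustrative label
            tofile="card_faulty",    # illustrative label
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Two short excerpts copied from the listings above, used as sample input.
    sample_ok = "bus info: pci@0000:04:00.0\nconfiguration: driver=nvidia latency=0"
    sample_faulty = "bus info: pci@0000:04:00.0\nconfiguration: driver=nvidia latency=0"
    differences = diff_listings(sample_ok, sample_faulty)
    print("identical" if not differences else "\n".join(differences))
```

An empty result only says that the PCI topology and resource assignment are reported the same way for both cards; it does not rule out a difference that this kind of listing does not show.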