New NVIDIA Driver Solaris11.3 Install Problems

Successfully installed latest 384.59 NVIDIA driver on a new installation of Solaris 11.3 X86.
Unsuccessfully tried to install latest 384.59 NVIDIA driver over an existing updated driver from NVIDIA.
The original update from NVIDIA seems to leave the system in a state that cannot easily be updated.

The standard install script fails, so I stepped through it and solved problems as I went.
a) had to create a new /var/sadm/pkg/NVDAgraphicsr/install directory.
b) had to install some legacy packages - SUNWxwrtl, SUNWxwplt, SUNWgnome-base-libs, SUNWxorg-clientlibs
c) had to modify the pkginfo files in both /var/sadm/pkg/NVDAgraphicsr and /var/sadm/pkg/NVDAgraphics to add a line “CLASSES=none” (the pkgrm does not work without this)
d) did “pkgrm NVDAgraphicsr NVDAgraphics” and all looked ok (no errors, MUST do NVDAgraphicsr first)
e) did ./install from the top level of the unpacked NVIDIA-Solaris-x86-384.59
(alternatively, did the individual steps from the ./install script by hand, as follows)
unpacked the package in /root/Downloads/NVIDIA_driver/unpacked
cd /root/Downloads/NVIDIA_driver/unpacked/NVIDIA-Solaris-x86-384.59
pkgadd -d . NVDAgraphicsr
pkgadd -d . NVDAgraphics
cp -fp gfx_private/SunOS-5.11/gfx_private /kernel/misc
cp -fp gfx_private/SunOS-5.11/amd64/gfx_private /kernel/misc/amd64
nvidia-xconfig

f) no errors on the install (either way!)
g) when I did “reboot -- -r” the system would not boot… it crashes
panic[cpu…

h) no entries in the Xorg.0.log file, some entries in the /var/adm/messages
How can I get this to install properly???
Surely someone in the NVIDIA corp has hit this problem?
Going back to the original Solaris 11.3 install is not a good option for our testing plans.

Steps a through c seem strange to me. I think you can just uninstall the IPS packages with “pkg remove” and then start with step d. The warnings about the missing legacy packages can be ignored since those are provided by IPS packages now.
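
A minimal sketch of the IPS removal suggested above, for anyone following along. It assumes the IPS package is named driver/graphics/nvidia (the name that appears in the FMRI quoted later in this thread). Note that the IPS subcommand is “pkg uninstall”, whereas “pkgrm” belongs to the older SVR4 tools. The -n flag makes this a dry run, so nothing is changed until it is dropped.

```shell
#!/bin/sh
# Sketch only: preview removal of the IPS-delivered NVIDIA driver.
# Assumption: the package name below matches what is installed on your system.
PKG=driver/graphics/nvidia

if command -v pkg >/dev/null 2>&1; then
    # Show whether the IPS package is installed at all.
    pkg list "$PKG"
    # -n = dry run (report only), -v = verbose plan of what would be removed.
    pkg uninstall -nv "$PKG"
else
    echo "pkg(1) not found; this sketch only applies to an IPS-based Solaris 11 system"
fi
```

If the dry run looks sane, running the same “pkg uninstall” line without -n would perform the removal.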

IIRC, Solaris provides its own gfx_private module. Does the crash go away if you revert back to the stock ones rather than the ones from the .run package?

John Martin provided some instructions in a bug report when these IPS packages first started being integrated: https://web.archive.org/web/20130227001337/http://defect.opensolaris.org/bz/show_bug.cgi?id=12196#c5

Mr Plattner!

 Thank you for your comments!
 Unfortunately, the install problems remain after trying all of your suggestions (and those on the referenced web page, resulting in several days of attempts).

A) Maybe on Linux steps A through C are not of any use, but the install script (i.e. NVIDIA-Solaris-x86-384.59.run) exits as “unsuccessful” when it cannot find the directories:
/var/sadm/pkg/NVDAgraphicsr/install
/var/sadm/pkg/NVDAgraphics/install
So I created them and the script continues to the next problem…
B) The same script exits when it encounters the next problem of finding “CLASSES” in the pkginfo file, e.g.
/var/sadm/pkg/NVDAgraphicsr/pkginfo
/var/sadm/pkg/NVDAgraphics/pkginfo
I edited both files and inserted a line “CLASSES=none” into both files…
C) I followed your suggestion NOT to install the four SUNW* prerequisite packages, and it did not affect the results, so that seems to have been a good suggestion.
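
For reference, workaround steps A and B can be sketched as a small shell function. This is only an illustration of what I did by hand, not an NVIDIA or Oracle procedure. The function name and its directory argument are made up for this example, so it can be tried on a scratch copy of the package database before touching /var/sadm/pkg.

```shell
#!/bin/sh
# Sketch of workaround steps A and B; the function name and argument are hypothetical.
# $1 = package database directory (on a live system this would be /var/sadm/pkg).
prepare_nvda_pkgdb() {
    if [ -z "$1" ]; then
        echo "usage: prepare_nvda_pkgdb <pkgdb-dir>" >&2
        return 1
    fi
    pkgdb=$1
    for p in NVDAgraphicsr NVDAgraphics; do
        # Step A: the installer stops if the install directory is missing.
        mkdir -p "$pkgdb/$p/install" || return 1
        info="$pkgdb/$p/pkginfo"
        if [ ! -f "$info" ]; then
            echo "missing $info" >&2
            return 1
        fi
        # Step B: add CLASSES=none only when no CLASSES line exists yet.
        if ! grep '^CLASSES=' "$info" >/dev/null; then
            cp -p "$info" "$info.orig"   # keep a backup of the untouched file
            echo 'CLASSES=none' >> "$info"
        fi
    done
}
```

Usage on a copy would be something like “prepare_nvda_pkgdb /tmp/pkgdb-copy”. The function is safe to run twice, since it skips files that already have a CLASSES line.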

The script appears to go to completion, but has various problems when run on different X86 computers…
HPZ440 (NVS 315) - after “reboot -- -r”, the system panics while coming up from the reboot…
HPZ230 (NVS 315) - after “reboot -- -r”, the system panics while coming up from the reboot…
HPZ420 (Quadro 2000) - after “reboot -- -r”, the system starts up ok, but for some odd reason no further driver updates can be put in place (i.e. it panics after subsequent driver updates).
ASUS/AMD-CPU/NVS315 - after “reboot -- -r”, the system boots fine and subsequent NVIDIA driver updates work well.

I don't think the driver updates for the NVS315 should behave differently for different motherboards unless Solaris has a "Hardware Abstraction Layer" problem...  This points away from NVIDIA, but I am not sure on this esoteric point (i.e. why difference between HPZ440-NVS315 and ASUS-315 ??? same firmware on the NVS315s in the different motherboards).

I took a look at the gfx_private file and tried your suggestions about copying using the Solaris version, but no progress (there IS a new gfx_private file in the NVIDIA driver package distributed).
I suspect the “nvidia” file is not being put in place correctly, but I did not try copying it around since you did not suggest it.
I even put new port addresses (e.g. f0 ) from the website references into place, no luck.
Has this NVIDIA driver been tested with Solaris 11.3? If so, against which “SRU” (Support Repository Update) versions, so I can match details for another attempt…

One more quandary… Even if NVDAgraphicsr and NVDAgraphics are put in place by the script (as seems to be “successful” on the ASUS system), do the gfx_private and nvidia files need to be updated as well? On the ASUS system, they don’t appear to have been updated by the driver install script in the system locations…

The “pkg” commands seem at odds with expectations on the Solaris system… You cite “pkg remove”, but it must be “pkgrm”… Other semantic issues arise here and there… Installing the package from the NVIDIA driver package does not put the package in the repository, and “pkginfo” commands list out the installed NVDAgraphicsr and NVDAgraphics details.

Now I am looking at “Solaris SRU” versions, motherboard BIOS/ME versions, and other issues, but I suspect the NVIDIA driver has not been tested on Solaris 11.3 with different motherboards and needs some “touch-up”… The Oracle support folks cited this possibility… I am trying to get NVIDIA folks together with the Oracle support folks to resolve this…

Any more suggestions and thoughts and similar experiences would be of great help!

Ok, I got an update from Oracle that indicates the nature of the problems with the NVIDIA drivers and Solaris.
If the motherboard boots with UEFI, then the NVIDIA 384.59 driver causes Solaris 11.3 (no SRU) to Panic crash on boot.
I had one computer (ASUS AMD with the NVS 315) that updated fine with solaris 11.3 and driver 384.59, and it booted with legacy bios boot.
Oracle indicates other issues as well, so NVIDIA should coordinate with Oracle to get the latest techniques to avoid these panic problems… the UEFI boot is not the only issue.

I don’t consider this issue resolved until I see the NVIDIA drivers posted with the UEFI and other fixes in place.

best regards,
roberds

Note -
I asked some questions about the “nvidia” file, gfx_private file, and other details (e.g. address f0) that have not been addressed by anyone yet.
It would be nice if a proper install of the NVIDIA driver could be verified by looking at the following files and their updated location and versions:
NVDAgraphicsr
NVDAgraphics
nvidia
gfx_private
updates to /devices directory
any details to check in /etc/devlink.tab
any details to check for device addresses in other tables/files
Just because the install activity cites “success”, it would be nice to verify the above details…
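
As a starting point, here is a rough, read-only sketch of the kind of check I have in mind. The gfx_private paths come from the manual install steps earlier in this thread. The other locations (/etc/driver_aliases, /etc/name_to_major, /dev/fbs) are my assumptions about where a Solaris graphics driver normally leaves traces, and nobody from NVIDIA or Oracle has confirmed this list.

```shell
#!/bin/sh
# Read-only sketch: collect evidence that the NVIDIA driver pieces are in place.
# Assumption: the file locations below are the usual Solaris ones, not a verified list.
check_nvidia_install() {
    echo "== SVR4 package records =="
    pkginfo -l NVDAgraphicsr NVDAgraphics

    echo "== gfx_private copies (paths from the manual install steps) =="
    ls -l /kernel/misc/gfx_private /kernel/misc/amd64/gfx_private

    echo "== loaded kernel modules =="
    modinfo | grep -i -e nvidia -e gfx_private

    echo "== driver bindings and device links =="
    grep -i nvidia /etc/driver_aliases /etc/name_to_major /etc/devlink.tab
    ls -l /dev/fbs
}

# Only run the checks on Solaris; the commands mean different things elsewhere.
if [ "$(uname -s)" = "SunOS" ]; then
    check_nvidia_install
else
    echo "not a Solaris system; nothing checked"
fi
```

This only shows what is present. It cannot say which file versions are the right ones for a given driver release, which is the part I would still like NVIDIA to document.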

  thank you

The fact is that NVIDIA’s Solaris driver is unusable if you have a UEFI BIOS. You might as well stop supporting Solaris, because no one can upgrade the driver on Solaris 11.3.

All modern motherboards today (such as the Supermicro boards we have) use UEFI. Legacy BIOS is… legacy. We have paid for NVIDIA GPUs, and we don’t think it is unreasonable to expect NVIDIA to support the latest Solaris 11.3 distro. If NVIDIA could please look into this we would be happy. Steps to reproduce the problem:

  1. Get a UEFI motherboard, such as a Supermicro, and an NVIDIA GPU, such as a GTX 770.
  2. Install latest Solaris 11.3 onto a harddisk/SSD.
  3. Boot into Solaris 11.3 and try to upgrade the NVIDIA driver.
    This will fail.

If you are not going to fix this problem, could you at least post some pointers here, how to fix this problem? We hope NVIDIA can provide pointers to the problem so the community can take over this issue if NVIDIA does not want to provide a solution.

The latest Solaris 11.3 SRU has a new NVIDIA driver with bug fixes.

I have not checked the NVIDIA drivers for Solaris yet. I hope they do change their release to where they support UEFI.

The latest Solaris 11.3 SRU (for X86) has the driver 384.90 adapted to UEFI use in Solaris, per the following SR response from Oracle support (as of 17Nov2017):
“according to the developer (regardless of what the README says), the following package is delivered in the latest released Solaris 11.3 SRU:
driver/graphics/nvidia@400.384.90.0,5.11-0.175.3.26.0.2.0:20171016T234052Z
So it does have 384.90 in it and based on your testing, it should resolve the original issue. Let me know if you have any further questions.”
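
To make the FMRI above easier to read, here is a small sketch that pulls the NVIDIA release number out of it. It assumes the leading “400.” is an Oracle packaging prefix and that the next two fields are the NVIDIA version, which matches what the support engineer says. The function name is made up for this example.

```shell
#!/bin/sh
# Sketch: extract the NVIDIA release (e.g. 384.90) from an IPS FMRI string.
# Assumption: the version part looks like <prefix>.<major>.<minor>.<n>,...
nvidia_version_from_fmri() {
    printf '%s\n' "$1" |
        sed -n 's/^.*@[0-9]*\.\([0-9]*\.[0-9]*\)\..*$/\1/p'
}

# Example using the package version quoted by Oracle support.
nvidia_version_from_fmri "driver/graphics/nvidia@400.384.90.0,5.11-0.175.3.26.0.2.0"

# On a live system the installed FMRI could be fed in like this:
# nvidia_version_from_fmri "$(pkg list -Hv driver/graphics/nvidia)"
```

For the quoted package this prints 384.90.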

Well, I only have the vanilla Solaris 11.3 that can be downloaded from the Oracle website, which does not support UEFI. I am thinking about upgrading my GPU and buying a more powerful one for some ML research (I work as a mathematician in the finance industry), but if I buy a new GPU, then I cannot upgrade the driver, right? I must reinstall Solaris from scratch, right? I am not really interested in reinstalling Solaris from scratch. This means this issue stops customers from buying new NVIDIA GPUs, right?

On 15Nov2017, the SRU (Support Repository Update) that has a new NVIDIA driver for Solaris 11.3 (resolving the UEFI issue) was released.
Per the Oracle Support folks, the latest driver in the SRU is supposed to be:
driver/graphics/nvidia@400.384.90.0,5.11-0.175.3.26.0.2.0:20171016T234052Z
I don’t know if you need a subscription to the support or not, to get this SRU… We have a subscription.
The SRU is designated 11.3.26.5.0, with release date of 15Nov2017. It is about 7 Gigabytes in four zipped files.
This is a fairly recent driver with many bug fixes, so it might cover the latest NVIDIA cards for Solaris 11.3.
Note that the SRU has some fairly substantial changes to Solaris 11.3, so installing it wholesale may not be a good idea.
The wholesale SRU install caused us to have some problems with our “home-brew” device drivers for custom hardware.
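
If you have the SRU repository configured and want to see whether the driver package can be updated on its own, something like the following dry run may help. This is a sketch I have not verified against this SRU. The package name is the one from the FMRI above, and the system’s incorporation packages may refuse to move the driver without the rest of the SRU, in which case the dry run should say so.

```shell
#!/bin/sh
# Sketch: preview an update of only the NVIDIA driver package from the SRU.
# Assumption: the SRU repository is already set up as a publisher origin.
PKG=driver/graphics/nvidia

if command -v pkg >/dev/null 2>&1; then
    # Show the currently installed version (full FMRI, no header line).
    pkg list -Hv "$PKG"
    # -n = dry run, -v = verbose plan; nothing is changed on the system.
    pkg update -nv "$PKG"
else
    echo "pkg(1) not found; this sketch only applies to an IPS-based Solaris 11 system"
fi
```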

In addition, I have a fairly complex procedure that allows NVIDIA 384.59 to be installed and that may work for all recent NVIDIA drivers (the procedure reserves some memory that the driver expects to use in non-UEFI mode; it also has an Oracle-provided custom 59Gigabyte addition that may not be necessary for your work).
The procedure may work for all NVIDIA drivers for Solaris, but I am not sure (never tested on anything except 384.59).

If you want the procedure, I may be able to post it here, but I don’t know whether it will fit, given its size of approx 2000+ characters.