Also, as per the Amazon EC2 docs, the cg1.4xlarge GPU cluster instance is based on Tesla M2050 GPUs, but lspci / lshw seems to report them as Tesla T20 GPUs. From what I understand, Tesla M-class GPUs are based on the T20 chip, so hopefully I have selected the right drivers.
The driver versions I have tried are NVIDIA-Linux-x86_64-319.23.run and NVIDIA-Linux-x86_64-319.17.run, both of which support Tesla M-class GPUs, and both report the same problem.
I think you don't have the drm kernel modules installed on that system. The last part of the log file indicates that the nvidia module can't find the drm_* symbols. Maybe you have to install them first, or load them at runtime via modprobe.
In addition, the missing "drm_gem_prime_export" symbol seems to be present only in the 3.9 kernel. I don't know whether this symbol is a hard requirement, but if installing the drm modules doesn't work, maybe you should try an older NVIDIA driver or a newer kernel.
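For example, something along these lines might be enough (just a sketch, I haven't tried it on that AMI):
sudo modprobe drm   # load the DRM core module, if it is installed for the running kernel
lsmod | grep drm    # confirm it is loaded, then re-run the NVIDIA .run installer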
Hi, thanks for the reply. Hmm, I see ... that means I would need to build and install the kernel modules for the kernel installed on the Amazon AMI, right? I then tried an older AMI (Ubuntu 12.04 instead of Ubuntu 13.04) and the driver installed just fine. Nevertheless, when I get hold of the Ubuntu 13.04 AMI again, I will try building and installing the drm kernel modules.
You could look into /lib/modules and search there. A file named "modules.symbols" should list all the symbols exported by the installed modules. You could also try to modprobe the drm module(s). Or you could look at the kernel configuration in /proc/config(.gz) and check whether the kernel is configured with DRM.
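For example (a sketch of those checks; paths assume the running kernel and may differ on the EC2 AMI):
grep drm_gem_prime_export /lib/modules/$(uname -r)/modules.symbols   # is the symbol exported by any installed module?
zgrep CONFIG_DRM /proc/config.gz                                     # if the kernel exposes its config here ...
grep CONFIG_DRM /boot/config-$(uname -r)                             # ... otherwise Ubuntu keeps it under /boot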
→ Unable to determine if Secure Boot is enabled: No such file or directory
ERROR: Unable to load the kernel module ‘nvidia.ko’. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.
Please see the log entries ‘Kernel module load error’ and ‘Kernel messages’ at the end of the file ‘/var/log/nvidia-installer.log’ for more information.
→ Kernel module load error: No such file or directory
→ Kernel messages:
[ 1117.323913] nvidia: Unknown symbol drm_gem_mmap (err 0)
[ 1117.323918] nvidia: Unknown symbol drm_ioctl (err 0)
[ 1117.323928] nvidia: Unknown symbol drm_gem_object_free (err 0)
[ 1117.323942] nvidia: Unknown symbol drm_read (err 0)
[ 1117.323957] nvidia: Unknown symbol drm_gem_handle_create (err 0)
[ 1117.323962] nvidia: Unknown symbol drm_prime_pages_to_sg (err 0)
[ 1117.324002] nvidia: Unknown symbol drm_pci_exit (err 0)
[ 1117.324079] nvidia: Unknown symbol drm_release (err 0)
[ 1117.324084] nvidia: Unknown symbol drm_gem_prime_export (err 0)
[ 1863.597421] mtrr: no MTRR for d0000000,100000 found
[ 3341.270419] nvidia: Unknown symbol drm_open (err 0)
[ 3341.270426] nvidia: Unknown symbol drm_fasync (err 0)
[ 3341.270436] nvidia: Unknown symbol drm_poll (err 0)
[ 3341.270449] nvidia: Unknown symbol drm_pci_init (err 0)
[ 3341.270499] nvidia: Unknown symbol drm_gem_prime_handle_to_fd (err 0)
[ 3341.270517] nvidia: Unknown symbol drm_gem_private_object_init (err 0)
[ 3341.270532] nvidia: Unknown symbol drm_gem_mmap (err 0)
[ 3341.270537] nvidia: Unknown symbol drm_ioctl (err 0)
[ 3341.270546] nvidia: Unknown symbol drm_gem_object_free (err 0)
[ 3341.270559] nvidia: Unknown symbol drm_read (err 0)
[ 3341.270575] nvidia: Unknown symbol drm_gem_handle_create (err 0)
[ 3341.270580] nvidia: Unknown symbol drm_prime_pages_to_sg (err 0)
[ 3341.270619] nvidia: Unknown symbol drm_pci_exit (err 0)
[ 3341.270636] nvidia: Unknown symbol drm_release (err 0)
[ 3341.270639] nvidia: Unknown symbol drm_gem_prime_export (err 0)
ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
This issue is resolved now. In case it is helpful to someone else, I'm putting my resolution here.
The issue is seen on Ubuntu 13.04 with kernel 3.8.0-19-generic. The problem was that the installer was unable to find and load the drm.ko module; somehow it was not even present in /lib/modules/3.8.0-19-generic. So I installed the kernel sources from the Ubuntu repository, built the kernel and modules, inserted drm.ko, tried the NVIDIA driver installation again, and it succeeded.
sudo apt-get source linux-image-3.8.0-19-generic
cd linux-3.8.0
sudo cp /boot/config-3.8.0-19-generic .config
sudo make menuconfig
Select
Device drivers —>
Graphics support —>
Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) —>
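Roughly, the build and load steps after that were along these lines (from memory, so treat the exact commands as approximate; DRM needs to be selected as a module, <M>, for drm.ko to be produced):
make -j16                            # build the kernel tree so drivers/gpu/drm/drm.ko is generated
sudo insmod drivers/gpu/drm/drm.ko   # insert the freshly built DRM core module
lsmod | grep drm                     # verify it is loaded, then re-run the NVIDIA installer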
That's all. BTW, I tried installing the modules, but somehow I still don't see them in /lib/modules/3.8.0-19-generic/, so I'm not sure whether the nvidia kernel driver will load on reboot or not.
I have found the issue with the modules building but not getting loaded on reboot. Apparently the kernel that gets built reports its version as 3.8.13 instead of 3.8.0-19, so the modules get placed in /lib/modules/3.8.13.2/. So you need to change the kernel version at the top of the Makefile, or by some other mechanism; I don't know of any way to do it apart from this.
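For example, something like this might work (an untested guess on my part; adjust the values to your tree):
# edit the version fields at the top of the kernel source Makefile so the release string matches the running kernel, e.g.
#   VERSION = 3
#   PATCHLEVEL = 8
#   SUBLEVEL = 0
#   EXTRAVERSION = -19-generic
make kernelrelease          # should now print something close to 3.8.0-19-generic; compare against uname -r
sudo make modules_install   # the modules should then land under /lib/modules/3.8.0-19-generic/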
I had the same problem today on Ubuntu Server 14.04. I tried your method of compiling the drm module and inserting it, but to no avail. This was on kernel version 3.13, so it looks like the bug is still there.
Oh, and instead of using make -j16 as you suggest in your post, I used make drivers/gpu/drm/ so as not to compile the full kernel but just the module. However, a drm.ko file was never generated.
So I switched back to 12.04 and installing was not a problem. Works for now!
I had a similar problem trying to install CUDA on an EC2 g2.2xlarge GPU instance with the Ubuntu Server 14.04 LTS (HVM), SSD Volume Type AMI (ami-d05e75b8).
Some characteristics of the AMI are below, taken from the first login:
$ lsb_release -a
$ lspci
$ nvidia-smi
AWS Support gave me a quick answer on how to resolve the issue.