drm.ko missing on Ubuntu 14.04.1 LTS, AWS EC2 g2.2xlarge instance

Now that CUDA 6.5 is officially released, I started a fresh AWS EC2 g2.2xlarge instance and installed Ubuntu 14.04.1 LTS.

I fully upgraded the OS. I also installed the 4 kernel packages Ubuntu wanted to hold back from the upgrade (linux-virtual, linux-image-virtual, etc.).

Then I installed CUDA using the official .deb. Now, whatever I run (nvidia-modprobe, nvidia-smi), I get the error message

modprobe: ERROR: could not insert 'nvidia_340': Unknown symbol in module, or unknown parameter (see dmesg)

So I checked dmesg and found the cause: drm.ko is missing. I searched the web but couldn't find any solution. CUDA 6.0 works fine with Ubuntu 12.04 on AWS EC2 because that OS is able to load both the nvidia and drm kernel modules.
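For anyone hitting the same error, a few hedged diagnostic commands can confirm that drm.ko really is the missing piece (paths assume a stock Ubuntu 14.04 kernel layout):

```shell
# Is drm.ko present for the running kernel? (it is absent from the lean -virtual kernels)
find /lib/modules/$(uname -r) -name 'drm.ko'

# Or is DRM compiled into the kernel itself? (=y built in, =m module, no line at all: missing)
grep '^CONFIG_DRM=' /boot/config-$(uname -r)

# Why did the nvidia module refuse to insert?
dmesg | grep -iE 'drm|nvidia' | tail -n 20
```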

I’m not even sure whether I should ask Ubuntu, AWS, or Nvidia for help.

There was no previous nvidia-* package (e.g. nvidia-331) on the system; I checked before installing CUDA 6.5.

My suggestion:

Don’t install using the .deb

Do a clean OS load again, and install using the runfile installer.

@txbob I see your point now.

It seems the runfile wants to access the kernel source; it gave me

The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.

I ran sudo apt-get source linux-image-$(uname -r) and it downloaded and unpacked the source into /home/ubuntu/Downloads/linux-3.13.0.

I ran again sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/home/ubuntu/Downloads/linux-3.13.0

It gave me the error above again. Could you help me, @txbob?

Select kernel source, or kernel development, as one of the things you want to do when installing Ubuntu. The kernel source packages have to be "installed and set up correctly," not just unpacked into a folder.

Alternatively, follow a proper method to install the kernel sources on ubuntu, like this:

http://www.cyberciti.biz/faq/installing-full-kernel-source-ubuntu-linux/

http://askubuntu.com/questions/466590/where-is-the-installed-kernel-source-located
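The advice in those links boils down to something like the following (package names are an assumption for Ubuntu 14.04 with a 3.13 kernel; check apt-cache search linux-source on your box):

```shell
# Install the kernel source the Ubuntu way; it lands under /usr/src
sudo apt-get install linux-source-3.13.0

# ...or install just the matching headers, which is usually all an
# out-of-tree driver build actually needs
sudo apt-get install linux-headers-$(uname -r)
```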

@txbob, I checked out the ubuntu-trusty repository and compiled all the flavours, but the kernel.h under the generated linux-headers-xxxx-generic was symlinked to a missing kernel.h, so the .run file would not accept it.

Then I found that if you apt-get install linux-headers, the files under /usr/src are acceptable to the .run file. I used that to run it.
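In concrete terms, the headers route might look like this (the --kernel-source-path value is an assumption based on where the headers package installs its files):

```shell
# Headers matching the running kernel end up under /usr/src
sudo apt-get install linux-headers-$(uname -r)

# Point the CUDA runfile at that tree
sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/usr/src/linux-headers-$(uname -r)
```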

The compilation seemed to go OK. However, during installation it apparently first shuts down AppArmor and then tries to invoke drm.ko again. I have now posted the question on Stack Overflow:

http://stackoverflow.com/questions/25463952/drm-ko-missing-for-cuda-6-5-ubuntu-14-04-aws-ec2-gpu-instance-g2-2xlarge

Actually, it seems it doesn't matter whether you use the .deb or the .run; the drm invocation always fails either way. If you find a way to install it successfully, could you let me know? Thanks, @txbob.

Yes, the correct method to get the kernel sources for driver compilation is

apt-get install linux-source

which I pointed out already (or apt-get install linux-headers also works).

You may have a conflict with nouveau driver, which has its own drm.ko kernel module.

Have you explicitly removed nouveau from the system?

If not, the instructions provided by “floppy” here:

http://askubuntu.com/questions/451221/ubuntu-14-04-install-nvidia-driver

look pretty good to me. You may also want to add:

sudo apt-get --purge remove xserver-xorg-video-nouveau

as discussed here:

https://help.ubuntu.com/community/BinaryDriverHowto/Nvidia
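Putting those suggestions together, a sketch of the full nouveau removal (the blacklist file name is a common convention, not a requirement):

```shell
# Remove the nouveau X driver package
sudo apt-get --purge remove xserver-xorg-video-nouveau

# Blacklist the nouveau kernel module so it cannot claim the GPU at boot
cat <<'EOF' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF

# Regenerate the initramfs so the blacklist takes effect early, then reboot
sudo update-initramfs -u
sudo reboot
```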

@txbob, it was a hell of an experience, but I solved it.

Right after the fresh launch of the instance, apt-get upgrade wanted to hold back 4 kernel packages such as linux-image-virtual. I installed them anyway so that strictly nothing was left to upgrade.

The problem is that linux-image-virtual is a lean build without drm.ko. I did apt-get install linux-image-extra-virtual and then installed CUDA with the .deb. (I reckoned .deb and .run behave the same here, so I tested with the .deb.)

Everything works like a charm now. :)

A fresh launch of a GPU instance on AWS EC2 with linux-virtual/linux-image-virtual has no nouveau or previous nvidia drivers.
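For later readers, the whole working recipe condenses to roughly this (the CUDA repo .deb filename matches NVIDIA's 6.5 download at the time, but treat it as an example):

```shell
# 1. Fully update the OS, including the held-back kernel packages
sudo apt-get update && sudo apt-get dist-upgrade -y

# 2. The key step: linux-image-extra-virtual ships drm.ko, which the
#    lean linux-image-virtual kernel omits
sudo apt-get install -y linux-image-extra-virtual
sudo reboot

# 3. After reboot, install CUDA from the official repo .deb
sudo dpkg -i cuda-repo-ubuntu1404_6.5-14_amd64.deb
sudo apt-get update && sudo apt-get install -y cuda
```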

I also managed it in a similar way:
Spin up a fresh g2.2xlarge spot instance with Ubuntu 14.04 64-bit.

Then download the CUDA .deb repo for Ubuntu 14.04 with wget;

At this point the issue remains.
Let's compile a fresh kernel with drm built in:

  • sudo apt-get build-dep linux-image-$(uname -r)
  • apt-get source linux-image-$(uname -r)
  • cd linux-3.13.0
  • chmod a+x debian/scripts/*
  • chmod a+x debian/scripts/misc/*
  • fakeroot debian/rules clean
  • fakeroot debian/rules editconfigs

Edit the right config for your architecture (the default amd64 flavor).
Build drm into the kernel rather than as a module, under:
Devices > Graphics Support > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)
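Before kicking off the long build, it may be worth a sanity check that the option was saved the way you expect (the exact file CONFIG_DRM lands in depends on how editconfigs splits the common and flavour configs, so grep the whole directory):

```shell
# DRM should now be built in (=y), not a module (=m)
grep -rn '^CONFIG_DRM=' debian.master/config/
```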

Then build kernel:
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic

If the build is successful, a set of three .deb binary package files will be produced in the directory above the build root directory:

cd ..
ls *.deb
linux-headers-…_all.deb
linux-headers-…_amd64.deb
linux-image-…_amd64.deb

sudo dpkg -i linux*.deb
sudo reboot
sudo apt-get -f install   # to deal with the missing linux-cloud-tools dependency

Verify it works:
sudo nvidia-smi
It should display the card and no running processes.
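A few additional hedged checks, beyond nvidia-smi, to confirm the new kernel and driver actually took (the module name assumes the 340-series driver bundled with CUDA 6.5):

```shell
# Are we running the freshly built kernel?
uname -r

# Is the nvidia module loaded? (no separate drm module will appear when drm is built in)
lsmod | grep nvidia

# Were the device nodes created?
ls -l /dev/nvidia*

# Can the driver talk to the GPU?
sudo nvidia-smi
```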