drm.ko missing on Ubuntu 14.04.1 LTS, AWS EC2 g2.2xlarge instance

Now that CUDA 6.5 is officially released, I started a fresh AWS EC2 g2.2xlarge instance and installed Ubuntu 14.04.1 LTS.

I fully upgraded the OS. I also installed the 4 kernel packages Ubuntu wanted to hold back from the upgrade (linux-virtual, linux-image-virtual, etc.).

Then I installed CUDA using the official .deb. Now, whatever I run (nvidia-modprobe, nvidia-smi), I get the error message

modprobe: ERROR: could not insert 'nvidia_340': Unknown symbol in module, or unknown parameter (see dmesg)

So I checked dmesg and found the cause: drm.ko is missing. I searched the web but couldn't find any solution. CUDA 6.0 works fine with Ubuntu 12.04 on AWS EC2 because that OS is able to load both the nvidia and drm kernel modules.
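For anyone hitting the same error, a few hedged diagnostic commands can confirm that drm.ko really is the missing piece (paths assume a stock Ubuntu 14.04 kernel layout):

```shell
# Is drm.ko present for the running kernel? (it is absent from the lean -virtual kernels)
find /lib/modules/$(uname -r) -name 'drm.ko'

# Or is DRM compiled into the kernel itself? (=y built in, =m module, no line at all: missing)
grep '^CONFIG_DRM=' /boot/config-$(uname -r)

# Why did the nvidia module refuse to insert?
dmesg | grep -iE 'drm|nvidia' | tail -n 20
```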

I’m not even sure whether I should ask Ubuntu, AWS, or Nvidia for help.

There was no previous nvidia-* package (e.g. nvidia-331) on the system; I checked before installing CUDA 6.5.

My suggestion:

Don’t install using the .deb

Do a clean OS load again, and install using the runfile installer.

@txbob I see your point now.

It seems the runfile wants to access the kernel source; it gave me

The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.

I ran sudo apt-get source linux-image-$(uname -r) and it downloaded and unpacked the source into /home/ubuntu/Downloads/linux-3.13.0.

I ran again sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/home/ubuntu/Downloads/linux-3.13.0

It gave me the error above again. Could you help me, @txbob?

Select kernel source, or kernel development, as one of the things you want to do when installing Ubuntu. The kernel source packages have to be "installed and set up correctly," not just unpacked into a folder.

Alternatively, follow a proper method to install the kernel sources on ubuntu, like this:

http://www.cyberciti.biz/faq/installing-full-kernel-source-ubuntu-linux/

http://askubuntu.com/questions/466590/where-is-the-installed-kernel-source-located
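The advice in those links boils down to something like the following (package names are an assumption for Ubuntu 14.04 with a 3.13 kernel; check apt-cache search linux-source on your box):

```shell
# Install the kernel source the Ubuntu way; it lands under /usr/src
sudo apt-get install linux-source-3.13.0

# ...or install just the matching headers, which is usually all an
# out-of-tree driver build actually needs
sudo apt-get install linux-headers-$(uname -r)
```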

@txbob, I checked out the ubuntu-trusty repository and compiled all the flavours, but the kernel.h under the generated linux-headers-xxxx-generic was symlinked to a missing kernel.h, so the .run file would not accept it.

Then I found that if you apt-get install linux-headers, the files under /usr/src are acceptable to the .run file. I used that to run it.
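In concrete terms, the headers route might look like this (the --kernel-source-path value is an assumption based on where the headers package installs its files):

```shell
# Headers matching the running kernel end up under /usr/src
sudo apt-get install linux-headers-$(uname -r)

# Point the CUDA runfile at that tree
sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/usr/src/linux-headers-$(uname -r)
```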

The compilation seemed to go OK. However, during installation it apparently first shuts down AppArmor and then tries to invoke drm.ko again. I have now posted the question on Stack Overflow:

http://stackoverflow.com/questions/25463952/drm-ko-missing-for-cuda-6-5-ubuntu-14-04-aws-ec2-gpu-instance-g2-2xlarge

Actually, it seems it doesn't matter whether you use the .deb or the .run; the drm invocation always fails either way. If you find a way to install it successfully, could you let me know? Thanks, @txbob.

Yes, the correct method to get the kernel sources for driver compilation is

apt-get install linux-source

which I pointed out already (or apt-get install linux-headers also works).

You may have a conflict with nouveau driver, which has its own drm.ko kernel module.

Have you explicitly removed nouveau from the system?

If not, the instructions provided by “floppy” here:

http://askubuntu.com/questions/451221/ubuntu-14-04-install-nvidia-driver

look pretty good to me. You may also want to add:

sudo apt-get --purge remove xserver-xorg-video-nouveau

as discussed here:

https://help.ubuntu.com/community/BinaryDriverHowto/Nvidia
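Putting those suggestions together, a sketch of the full nouveau removal (the blacklist file name is a common convention, not a requirement):

```shell
# Remove the nouveau X driver package
sudo apt-get --purge remove xserver-xorg-video-nouveau

# Blacklist the nouveau kernel module so it cannot claim the GPU at boot
cat <<'EOF' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF

# Regenerate the initramfs so the blacklist takes effect early, then reboot
sudo update-initramfs -u
sudo reboot
```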

@txbob, it was a hell of an experience, but I solved it.

Right after the fresh launch of the instance, apt-get upgrade wanted to hold back 4 kernel packages such as linux-image-virtual. I installed them anyway so that strictly nothing was left to upgrade.

The problem is that linux-image-virtual is a lean build without drm.ko. I did apt-get install linux-image-extra-virtual and then installed CUDA with the .deb. (I reckoned .deb and .run behave the same here, so I tested with the .deb.)

Everything works like a charm now. :)

A fresh launch of a GPU instance on AWS EC2 with linux-virtual/linux-image-virtual has no nouveau or previous nvidia drivers.
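For later readers, the whole working recipe condenses to roughly this (the CUDA repo .deb filename matches NVIDIA's 6.5 download at the time, but treat it as an example):

```shell
# 1. Fully update the OS, including the held-back kernel packages
sudo apt-get update && sudo apt-get dist-upgrade -y

# 2. The key step: linux-image-extra-virtual ships drm.ko, which the
#    lean linux-image-virtual kernel omits
sudo apt-get install -y linux-image-extra-virtual
sudo reboot

# 3. After reboot, install CUDA from the official repo .deb
sudo dpkg -i cuda-repo-ubuntu1404_6.5-14_amd64.deb
sudo apt-get update && sudo apt-get install -y cuda
```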

I also managed it in a similar way:
Spin up a fresh g2.2xlarge spot instance with Ubuntu 14.04 64-bit.

Then download the CUDA .deb repo for Ubuntu 14.04 with wget;

At this point the issue remains.
Let's compile a fresh kernel with drm built in:

  • sudo apt-get build-dep linux-image-$(uname -r)
  • apt-get source linux-image-$(uname -r)
  • cd linux-3.13.0
  • chmod a+x debian/scripts/*
  • chmod a+x debian/scripts/misc/*
  • fakeroot debian/rules clean
  • fakeroot debian/rules editconfigs

Edit the right config for your architecture (the default amd64 flavor).
Build drm into the kernel rather than as a module, under:
Devices > Graphics Support > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)
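Before kicking off the long build, it may be worth a sanity check that the option was saved the way you expect (the exact file CONFIG_DRM lands in depends on how editconfigs splits the common and flavour configs, so grep the whole directory):

```shell
# DRM should now be built in (=y), not a module (=m)
grep -rn '^CONFIG_DRM=' debian.master/config/
```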

Then build kernel:
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic

If the build is successful, a set of three .deb binary package files will be produced in the directory above the build root directory:

cd ..
ls *.deb
linux-headers-…_all.deb
linux-headers-…_amd64.deb
linux-image-…_amd64.deb

sudo dpkg -i linux*.deb
sudo reboot
sudo apt-get -f install   # to deal with the missing linux-cloud-tools dependency

Verify it works:
sudo nvidia-smi
It should display the card and no running processes.
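A few additional hedged checks, beyond nvidia-smi, to confirm the new kernel and driver actually took (the module name assumes the 340-series driver bundled with CUDA 6.5):

```shell
# Are we running the freshly built kernel?
uname -r

# Is the nvidia module loaded? (no separate drm module will appear when drm is built in)
lsmod | grep nvidia

# Were the device nodes created?
ls -l /dev/nvidia*

# Can the driver talk to the GPU?
sudo nvidia-smi
```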