Description
We’re trying to apply an Amazon Deep Learning AMI to an EMR cluster. When the NVIDIA driver install step runs, we get this error:
“ERROR: An NVIDIA kernel module ‘nvidia’ appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module’s usage count, for which the simplest remedy is to reboot your computer.”
We need a clean procedure to fix this error or work around it.
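In case it helps frame the ask, here is a rough diagnostic sketch of what we could run over SSH on an affected node before retrying the installer. This is not a verified procedure: the module names, the nvidia-persistenced service name, and the assumption that no other process needs the GPU at that point are all guesses on our side.

```bash
# Rough diagnostics on a cluster node before re-running the NVIDIA installer.
# Assumes SSH access to the node; module/service names may differ per driver version.

# 1. See which NVIDIA kernel modules are currently loaded.
lsmod | grep nvidia

# 2. See whether any process is still holding the GPU device files
#    (e.g. nvidia-persistenced, a leftover CUDA job, an X server).
sudo lsof /dev/nvidia* 2>/dev/null
nvidia-smi   # lists PIDs of GPU processes, if the currently installed driver still works

# 3. Stop the persistence daemon if it is running (service name is an assumption;
#    Amazon Linux AMI 2018.03 uses SysV init rather than systemd).
sudo service nvidia-persistenced stop 2>/dev/null || sudo killall nvidia-persistenced 2>/dev/null

# 4. Try to unload the modules in dependency order; if any rmmod fails with
#    "in use", something still references the driver and a node reboot may be
#    the only clean option, as the error message itself suggests.
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia 2>/dev/null

# 5. Retry the driver installer / bootstrap step only once lsmod shows
#    no nvidia modules loaded.
lsmod | grep nvidia || echo "no nvidia modules loaded; safe to retry installer"
```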
Environment
TensorRT Version: Not sure
GPU Type: Whatever is in the AWS GPU instance types we use (P3, P4, G3, G4 — i.e., V100, A100, M60, T4 respectively)
Nvidia Driver Version: Not sure (Amazon docs don’t include this info; see the version-check commands after this list)
CUDA Version: Not sure (Amazon docs don’t include this info)
CUDNN Version: Not sure (Amazon docs don’t include this info)
Operating System + Version: Amazon Linux AMI 2018.03
Python Version (if applicable): Python 3.7.3
TensorFlow Version (if applicable): 2.4.1
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): Not sure
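For the “Not sure” fields above, these are the commands we’d plan to run on a cluster node to fill them in. The paths are assumptions on our part (they vary with how CUDA and cuDNN are laid out on the AMI), so treat this as a sketch rather than a definitive check.

```bash
# Commands to fill in the unknown versions above.
# Paths are assumptions; they depend on how CUDA/cuDNN were packaged on the AMI.

# NVIDIA driver version (and the CUDA version the driver supports, in the header).
nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-smi

# CUDA toolkit version (older toolkits ship version.txt, newer ones version.json).
nvcc --version 2>/dev/null
cat /usr/local/cuda/version.txt 2>/dev/null
cat /usr/local/cuda/version.json 2>/dev/null

# cuDNN version (header name differs: cudnn_version.h for cuDNN 8+, cudnn.h for 7.x).
grep -m1 -A2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn_version.h 2>/dev/null
grep -m1 -A2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h 2>/dev/null
```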
Relevant Files
A minimal bug report scenario is described here: GitHub - dgoldenberg-audiomack/nvidia-issue-1
Steps To Reproduce
The same minimal bug report scenario covers the reproduction: GitHub - dgoldenberg-audiomack/nvidia-issue-1
It includes the setup steps, the run steps, and a log file snippet with the error.