Accelerator not found: EC2 p2.xlarge, PGI Community Edition

Hi,
I am using the PGI 18.10 Community Edition AMI on AWS EC2 p2.xlarge instances.

When I run pgaccelinfo -v, I get:

CUDA Driver Version:           10000
could not initialize CUDA runtime, error code=100
No accelerators found.
Check the permissions on your CUDA device

I have tried to follow the instructions listed in this thread
https://www.pgroup.com/userforum/viewtopic.php?p=7680&sid=3326ee9998a6dff2e706a7190f348106
modprobe can’t find the nvidia module when I run

modprobe -v nvidia

but I assume that thread is outdated anyway.
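
For reference, the usual checks at this point (standard Linux commands, nothing PGI-specific; the module is assumed to be named nvidia):

modinfo nvidia          # is the module installed for the running kernel at all?
lsmod | grep nvidia     # is it currently loaded?
dmesg | grep -i nvidia  # any driver messages or load errors in the kernel log?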

What I did:
To save some money while installing other dependencies and doing other work, I’ve started the instance several times as t2.micro, which has no accelerator. That worked fine, and after changing the type back to p2.xlarge it always ‘just worked’ - until it didn’t. I’m not sure whether this caused the error, but my guess is that it has nothing to do with it.

lspci shows the K80 to be connected.
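
That is, something like (the exact lspci invocation and filter may differ):

lspci | grep -i nvidia    # the Tesla K80 shows up here if the device is visible on the PCI bus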

Any ideas on how to solve this?

Thanks,
Daniel

Hi Daniel,

I’m thinking this is more of a system issue than a PGI issue, but let’s see if we can diagnose the problem.

could not initialize CUDA runtime, error code=100

Error 100 indicates that there’s no device. Can you try running “nvidia-smi” to see if it recognizes the device?

Perhaps you got a bad node, or Amazon changed the device configuration so that the permissions are incorrect. Either way, you’ll want to contact Amazon.
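
If you want to rule out the permission angle yourself, the device nodes are worth a quick look (a generic check, not specific to PGI or AWS):

ls -l /dev/nvidia*    # the nodes should exist and are typically world read/write (crw-rw-rw-)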

Note that I just logged into a p2.xlarge system and it worked fine, so I suspect the issue is with a particular node.

-Mat

Hi Mat,
thanks for your response.

So, I’ve run nvidia-smi and it just tells me to check the driver:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I’ve now created a new p2.xlarge instance and attached the volume from the original instance. I’m guessing this should put me on a different node, but it didn’t help.

I’ll contact AWS about this. In the meantime it’s probably best to just start over with a completely fresh instance and image. Still, do you have any idea what might have caused this (assuming it’s not a problem with AWS)? It would be pretty annoying if it happened again.

Daniel

Still, do you have any idea what might have caused this?

Since nvidia-smi failed, my best guess is a bad device, a device that needs to be reset, a permission issue with the device, or a problem with the CUDA driver.
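
If you want to narrow it down while waiting on AWS, a few generic checks plus a reload attempt (module and package names can differ depending on how the driver was installed, so treat this as a sketch):

uname -r                             # kernel the instance is currently running
dkms status                          # was the nvidia module rebuilt for that kernel? (only if DKMS is used)
cat /proc/driver/nvidia/version      # this file only exists while the kernel module is loaded
sudo modprobe nvidia && nvidia-smi   # try loading the module and querying the device again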

I’ll contact AWS about this.

Sounds good. Unfortunately, there’s not much we can do here if there’s a hardware or system issue.

-Mat