CUDA error at bandwidthTest

antonio.puertasgallardo · March 8, 2019, 10:54am

Dear All,

I installed two NVIDIA Tesla M60 cards and deviceQuery passes (see logs below),
but when I launch bandwidthTest I get the following error:

[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Tesla M60
Quick Mode

CUDA error at bandwidthTest.cu:730 code=46(cudaErrorDevicesUnavailable) “cudaEventCreate(&start)”

Any idea how to solve this?

Thanks,

Antonio

Logs:
Linux s-jrciprhpc103p 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.39 Sat Feb 9 19:19:37 CST 2019
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)

./deviceQuery
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)

Device 0: “Tesla M60”
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 8129 MBytes (8524136448 bytes)
(16) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1178 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 131 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: “Tesla M60”
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 8129 MBytes (8524136448 bytes)
(16) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1178 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 132 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 2: “Tesla M60”
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 8129 MBytes (8524136448 bytes)
(16) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1178 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 195 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 3: “Tesla M60”
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 8129 MBytes (8524136448 bytes)
(16) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1178 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 196 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Peer access from Tesla M60 (GPU0) → Tesla M60 (GPU1) : Yes
Peer access from Tesla M60 (GPU0) → Tesla M60 (GPU2) : No
Peer access from Tesla M60 (GPU0) → Tesla M60 (GPU3) : No
Peer access from Tesla M60 (GPU1) → Tesla M60 (GPU0) : Yes
Peer access from Tesla M60 (GPU1) → Tesla M60 (GPU2) : No
Peer access from Tesla M60 (GPU1) → Tesla M60 (GPU3) : No
Peer access from Tesla M60 (GPU2) → Tesla M60 (GPU0) : No
Peer access from Tesla M60 (GPU2) → Tesla M60 (GPU1) : No
Peer access from Tesla M60 (GPU2) → Tesla M60 (GPU3) : Yes
Peer access from Tesla M60 (GPU3) → Tesla M60 (GPU0) : No
Peer access from Tesla M60 (GPU3) → Tesla M60 (GPU1) : No
Peer access from Tesla M60 (GPU3) → Tesla M60 (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 4
Result = PASS

generix · March 8, 2019, 11:56am

Is nvidia-persistenced running? If not, does starting it help?
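One way to check this is sketched below. It assumes a systemd-based distribution where the daemon is installed as the `nvidia-persistenced` unit and that `nvidia-smi` is on the PATH; the helper name `unit_state` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes systemd manages nvidia-persistenced and that
# nvidia-smi is on PATH. unit_state is a hypothetical helper name.

# Print the systemd state of a unit ("active", "inactive", ...),
# or "unknown" when systemctl is unavailable or prints nothing.
unit_state() {
    if command -v systemctl >/dev/null 2>&1; then
        # is-active exits non-zero for anything but "active"; keep only the text.
        state=$(systemctl is-active "$1" 2>/dev/null) || true
        echo "${state:-unknown}"
    else
        echo "unknown"
    fi
}

echo "nvidia-persistenced: $(unit_state nvidia-persistenced)"

# Per-GPU persistence mode as reported by the driver.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,persistence_mode --format=csv
else
    echo "nvidia-smi not found on PATH"
fi
```

Note that `systemctl enable` only arranges for the unit to start at boot; `sudo systemctl start nvidia-persistenced` starts it immediately. A daemon launched by hand rather than through systemd will not be reported as an active unit.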

antonio.puertasgallardo · March 8, 2019, 1:14pm

Many thanks for the help!

It was not running.
I started it with the following commands:
sudo systemctl enable nvidia-persistenced
sudo /usr/bin/nvidia-persistenced --verbose

Then I launched bandwidthTest.

I got the same error:

./bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Tesla M60
Quick Mode

CUDA error at bandwidthTest.cu:730 code=46(cudaErrorDevicesUnavailable) “cudaEventCreate(&start)”

Is there anything else I should do, or something I missed?

Antonio

nvidia-bug-report.log.gz (1.91 MB)

generix · March 8, 2019, 1:29pm

Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

antonio.puertasgallardo · March 8, 2019, 1:46pm

Dear All,

Many thanks again. File posted as requested.

Antonio

generix · March 8, 2019, 2:03pm

The Teslas are in graphics mode, please switch them to compute mode:
https://docs.nvidia.com/grid/latest/grid-gpumodeswitch-user-guide/index.html

antonio.puertasgallardo · March 8, 2019, 2:45pm

I cannot download the software from the Enterprise Portal.
Could you send me just the gpumodeswitch package?

generix · March 8, 2019, 3:46pm

No, sorry, looks like it’s only included in the vGPU software package. It might also be a red herring. Does bandwidthTest work if you run it on just one GPU, e.g. using the options --device=0 --dtoh?
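A minimal sketch of that single-GPU check, repeated for each card, is shown below. It assumes the `bandwidthTest` sample binary sits in the current directory and that there are four devices, as in the deviceQuery output above; `BIN`, `NUM_DEVICES` and `run_one` are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch only: BIN and NUM_DEVICES are assumptions, override them as needed.
BIN=${BIN:-./bandwidthTest}
NUM_DEVICES=${NUM_DEVICES:-4}

# Run the device-to-host test on one GPU and summarise the result.
# $1: device index
run_one() {
    if out=$("$BIN" --device="$1" --dtoh 2>&1); then
        echo "device $1: ok"
    else
        echo "device $1: failed"
        # Show the error line reported by the sample, if there is one.
        echo "$out" | grep 'CUDA error' || true
    fi
}

if [ -x "$BIN" ]; then
    i=0
    while [ "$i" -lt "$NUM_DEVICES" ]; do
        run_one "$i"
        i=$((i + 1))
    done
else
    echo "$BIN not found; build the CUDA samples first"
fi
```

If every device fails in the same way, the cause is more likely a system-wide setting or driver state than a single faulty card.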

antonio.puertasgallardo · March 8, 2019, 3:52pm

Thanks for helping me. Much appreciated!
Nope, bandwidthTest doesn’t work.

But I purchased the cards for computing, so in principle I should be able to change the mode. Is there any way to reset the cards to their default parameters? I only need these two cards working for computing and I don’t need the vGPU stuff.

Logs:

./bandwidthTest --device=0 --dtoh
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Tesla M60
Quick Mode

CUDA error at bandwidthTest.cu:618 code=46(cudaErrorDevicesUnavailable) “cudaEventCreate(&start)”

antonio.puertasgallardo · March 8, 2019, 3:59pm

Just to give you two more hints about the issue…

I compiled and ran a simple matrixMul and it doesn’t work either:

./matrixMul
[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: “Tesla M60” with compute capability 5.2

MatrixA(320,320), MatrixB(640,320)
CUDA error at matrixMul.cu:152 code=46(cudaErrorDevicesUnavailable) “cudaMalloc(reinterpret_cast<void **>(&d_A), mem_size_A)”

and vectorAdd fails as well:

./vectorAdd
[Vector addition of 50000 elements]
Failed to allocate device vector A (error code all CUDA-capable devices are busy or unavailable)!

generix · March 8, 2019, 4:09pm

The bandwidthTest sample alone is not complex enough to fail this easily. Maybe this is just a regression in CUDA 10.1 or the driver; did you try an earlier toolkit, 10.0 or 9.2? In any case you should mail the bug-report.log together with a description of the problem to linux-bugs[at]nvidia.com. Maybe that brings up something new.

antonio.puertasgallardo · March 8, 2019, 4:13pm

Again many thanks for your help. I sent the email to linux-bugs@nvidia.com.

yangjiahua0809 · May 10, 2019, 3:37am

Hi Antonio, is the problem solved now? I ran into the same error.

zhuhong_1115 · August 13, 2019, 1:14pm

I got the same error with a Tesla P4.
I’ve tried with CUDA toolkit 7.5, 8.0 and 9.0.
Driver version: 410.107
OS: RHEL 7.4
Kernel: 3.10
The server was created on a virtualization platform like XenServer. Does it need additional configuration compared to one running on a physical machine?
Any help would be appreciated.
