AWS K80 Docker

Hi everyone… I installed DIGITS on my AWS cloud server with a K80 GPU, but when I try to train the model, this message appears:

Started BatchTransformer thread 87
Loading mean file from: /workspace/jobs/20190818-080628-28c8/train_db/mean.binaryproto
Loading mean file from: /workspace/jobs/20190818-080628-28c8/train_db/mean.binaryproto
Loading mean file from: /workspace/jobs/20190818-080628-28c8/train_db/mean.binaryproto
Data Reader threads: 3, out queues: 12, depth: 10
{0} Starting 3 internal thread(s) on device 0
Started internal thread 91 on device 0, rank 0
Opened lmdb /workspace/jobs/20190818-080628-28c8/train_db/features
Started internal thread 92 on device 0, rank 0
Opened lmdb /workspace/jobs/20190818-080628-28c8/train_db/features
Started internal thread 93 on device 0, rank 0
Opened lmdb /workspace/jobs/20190818-080628-28c8/train_db/features
Output data size: 10, 3, 384, 1248
Parser threads: 3 (auto)
Transformer threads: 4 (auto)
Started internal thread 78 on device 0, rank 0
Started internal thread 79 on device 0, rank 0
Started internal thread 82 on device 0, rank 0
Started internal thread 80 on device 0, rank 0
Check failed: error == cudaSuccess (209 vs. 0) no kernel image is available for execution on the device

I have CUDA 10.1 installed with NVIDIA driver 418:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   64C    P0    58W / 149W |     92MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1849      C   python                                         81MiB |
+-----------------------------------------------------------------------------+

Any ideas?

The K80 has CUDA compute capability 3.7. Error 209 indicates that the underlying framework was compiled without 3.7 support; for example, nvcaffe was built for compute capability 5.2 and above.
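You can check which GPU architectures a given caffe build actually contains with the CUDA toolkit's cuobjdump tool. The library path below is only a guess at where a DIGITS container keeps it; locate the real file first.

```shell
# Find the nvcaffe shared library inside the container (path varies by image).
find / -name 'libcaffe*' 2>/dev/null

# List the sm_XX architectures embedded in the binary.
# If sm_37 is absent, the build cannot launch kernels on a K80.
cuobjdump --list-elf /usr/local/lib/libcaffe-nv.so | grep -o 'sm_[0-9]*' | sort -u
```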

Please consider launching DIGITS on a different AWS GPU instance type (for example, a P3 instance). If that is not an option for you, you will need to build nvcaffe with Kepler support and run DIGITS on top of it.
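A minimal sketch of such a rebuild is below. It assumes the NVCaffe CMake build exposes the usual Caffe architecture variables (`CUDA_ARCH_NAME`, `CUDA_ARCH_BIN`); the exact variable names can differ between NVCaffe releases, so check cmake/Cuda.cmake in the tree you clone.

```shell
# Sketch: rebuild NVCaffe with Kepler (compute capability 3.7) kernels.
git clone https://github.com/NVIDIA/caffe.git nvcaffe
cd nvcaffe
mkdir build && cd build

# Select the target architectures manually instead of auto-detection,
# including 3.7 for the K80.
cmake -DCUDA_ARCH_NAME=Manual \
      -DCUDA_ARCH_BIN="37" \
      -DCUDA_ARCH_PTX="37" ..
make -j"$(nproc)"
```

Dependencies (protobuf, boost, cuDNN, and so on) must already be installed for the configure step to succeed.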

Hi!
I am having the same problem using the DIGITS Docker image (nvcr.io/nvidia/digits:19.02-caffe).
When I try to train, I get the following error:
Copying source layer inception_5b/relu_5x5_reduce Type:ReLU #blobs=0
Copying source layer inception_5b/5x5 Type:Convolution #blobs=2
Copying source layer inception_5b/relu_5x5 Type:ReLU #blobs=0
Copying source layer inception_5b/pool Type:Pooling #blobs=0
Copying source layer inception_5b/pool_proj Type:Convolution #blobs=2
Copying source layer inception_5b/relu_pool_proj Type:ReLU #blobs=0
Copying source layer inception_5b/output Type:Concat #blobs=0
Ignoring source layer pool5/7x7_s1
Ignoring source layer pool5/drop_7x7_s1
Ignoring source layer loss3/classifier
Ignoring source layer loss3/loss3
Starting Optimization
Solving Learning Rate Policy: exp
Reserving 23918336 bytes of shared learnable space for type FLOAT
Initial Test started…
Iteration 0, Testing net (#0)
Ignoring source layer train_data
Ignoring source layer train_label
Ignoring source layer train_transform
Check failed: error == cudaSuccess (209 vs. 0) no kernel image is available for execution on the device

My CUDA installation is as follows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 840M        Off  | 00000000:03:00.0 Off |                  N/A |
| N/A   43C    P0    N/A /  N/A |    446MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I don’t know whether I should also build nvcaffe with Kepler support and run DIGITS on top of it. If so, how should I do it?