Has anyone(!) managed recently to get AWS and NVIDIA Digits easily working?

Basically, I have been on an NVIDIA Deep Learning course and want to experiment with it.
I tried setting up a cloud-based instance but ran into various problems.

So…

I was recommended to go the pre-setup route - using AWS and Digits.
But can I get it to work? No.

There seems to be no definitive and up-to-date documentation.
I set up an AWS NVIDIA Digits instance (p2.xlarge) and did all the usual key pair setup etc.

I tried running the recommended docker pull … 18.06, but received a "not enough space" message.
I tried requesting a ‘bigger’ instance type, but was politely declined.

Has anyone(!) managed recently to get AWS and NVIDIA Digits easily working?
I’d love to hear more…

https://ngc.nvidia.com/registry/nvidia-digits
https://ngc.nvidia.com/docs/aws

Can you share more about the error message? Was out of space referring to the storage volume? If so, how large was your storage volume that you attached to the p2.xlarge?
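One quick way to answer the storage question is to look at how much free space is left on the filesystem Docker writes to. Below is a minimal sketch, assuming Docker is using its default data directory /var/lib/docker; the helper names free_gib and has_room and the 20 GiB threshold are only illustrative, not figures from NVIDIA or AWS.

```python
import os
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_gib):
    """True if the filesystem holding `path` has at least `needed_gib` GiB free."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    # Assumption: Docker's data directory has not been moved from its default.
    docker_root = "/var/lib/docker"
    path = docker_root if os.path.isdir(docker_root) else "/"
    print(f"{path}: {free_gib(path):.1f} GiB free")
    # 20 GiB is a hypothetical threshold chosen for illustration only.
    if not has_room(path, 20):
        print("Probably too little space to pull and unpack a large image.")
```

If the number printed is only a few GiB, the root volume was most likely created at a small default size, and a larger volume would be needed before docker pull can unpack the image layers.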

There’s also an AMI, which uses a slightly older version of DIGITS 6 (vs 6.1), but doesn’t require the user to sort storage: https://aws.amazon.com/marketplace/pp/B076DHKCZJ

Thanks for the feedback and look forward to your response!

Hello!

I spin up my AWS p2.xlarge instance and connect to it.

( p2.xlarge appears as the ‘recommended’ in documentation. )

Then I run the Docker Pull command: docker pull nvcr.io/nvidia/digits:18.06

( And have tried other versions as well, not just 18.06 - with the same result.)

It runs through, and then fails with this message:

748b7b6e777b: Download complete
846867716614: Download complete
53d93d4d8588: Download complete
e75dea597b8c: Download complete

failed to register layer: Error processing tar file(exit status 1): write /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcuinj64.so.9.0.176: no space left on device

What it refers to is not clear to me.

Regards,

Piers.

I tried the same command on a g3.4xlarge.

docker pull nvcr.io/nvidia/digits:18.07

Same error :(

7673e4ca8e30: Waiting
bc56d889abbe: Waiting
cccf041af82d: Waiting
write /var/lib/docker/tmp/GetImageBlob146374276: no space left on device

Any thoughts still appreciated!
I cannot get ‘it’ to install…
:(

I’m trying to use the Nvidia AMI on AWS:

NVIDIA-DIGITS-6.0-1506983594-552c2f03-2105-4838-8723-06ece2494c16-ami-ae0cf4d4.4 (ami-e7988a83)

I’ve got console access to the server through the assigned IP address, but can’t install CUDA 9.0:

/home/ubuntu$ sudo apt-get install cuda-toolkit-9-0
Reading package lists…
Building dependency tree…
Reading state information…
The following additional packages will be installed:
ca-certificates-java cuda-command-line-tools-9-0 cuda-core-9-0
cuda-cublas-9-0 cuda-cublas-dev-9-0 cuda-cudart-9-0 cuda-cudart-dev-9-0
cuda-cufft-9-0 cuda-cufft-dev-9-0 cuda-curand-9-0 cuda-curand-dev-9-0
cuda-cusolver-9-0 cuda-cusolver-dev-9-0 cuda-cusparse-9-0
cuda-cusparse-dev-9-0 cuda-documentation-9-0 cuda-driver-dev-9-0
cuda-libraries-dev-9-0 cuda-license-9-0 cuda-misc-headers-9-0 cuda-npp-9-0
cuda-npp-dev-9-0 cuda-nvgraph-9-0 cuda-nvgraph-dev-9-0 cuda-nvml-dev-9-0
cuda-nvrtc-9-0 cuda-nvrtc-dev-9-0 cuda-samples-9-0 cuda-visual-tools-9-0
default-jre default-jre-headless fontconfig fontconfig-config
fonts-dejavu-core fonts-dejavu-extra freeglut3 freeglut3-dev
hicolor-icon-theme java-common libasound2 libasound2-data libasyncns0
libatk1.0-0 libatk1.0-data libavahi-client3 libavahi-common-data
libavahi-common3 libcairo2 libcups2 libdatrie1 libdrm-amdgpu1 libdrm-common
libdrm-dev libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libflac8
libfontconfig1 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgif7
libgl1-mesa-dev libgl1-mesa-dri libgl1-mesa-glx libglapi-mesa libglu1-mesa
libglu1-mesa-dev libgraphite2-3 libgtk2.0-0 libgtk2.0-bin libgtk2.0-common
libharfbuzz0b libice-dev libice6 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2
libllvm6.0 libnspr4 libnss3 libnss3-nssdb libogg0 libpango-1.0-0
libpangocairo-1.0-0 libpangoft2-1.0-0 libpciaccess0 libpcsclite1
libpixman-1-0 libpthread-stubs0-dev libpulse0 libsensors4 libsm-dev libsm6
libsndfile1 libthai-data libthai0 libtiff5 libtxc-dxtn-s2tc0 libvorbis0a
libvorbisenc2 libx11-dev libx11-doc libx11-xcb-dev libx11-xcb1 libxau-dev
libxcb-dri2-0 libxcb-dri2-0-dev libxcb-dri3-0 libxcb-dri3-dev libxcb-glx0
libxcb-glx0-dev libxcb-present-dev libxcb-present0 libxcb-randr0
libxcb-randr0-dev libxcb-render0 libxcb-render0-dev libxcb-shape0
libxcb-shape0-dev libxcb-shm0 libxcb-sync-dev libxcb-sync1 libxcb-xfixes0
libxcb-xfixes0-dev libxcb1-dev libxcomposite1 libxcursor1 libxdamage-dev
libxdamage1 libxdmcp-dev libxext-dev libxfixes-dev libxfixes3 libxi-dev
libxi6 libxinerama1 libxmu-dev libxmu-headers libxmu6 libxrandr2 libxrender1
libxshmfence-dev libxshmfence1 libxt-dev libxt6 libxtst6 libxxf86vm-dev
libxxf86vm1 mesa-common-dev openjdk-8-jre openjdk-8-jre-headless x11-common
x11proto-core-dev x11proto-damage-dev x11proto-dri2-dev x11proto-fixes-dev
x11proto-gl-dev x11proto-input-dev x11proto-kb-dev x11proto-xext-dev
x11proto-xf86vidmode-dev xorg-sgml-doctools xtrans-dev
Suggested packages:
default-java-plugin libasound2-plugins alsa-utils cups-common
librsvg2-common gvfs libice-doc liblcms2-utils pcscd pulseaudio lm-sensors
libsm-doc libxcb-doc libxext-doc libxt-doc icedtea-8-plugin libnss-mdns
fonts-ipafont-gothic fonts-ipafont-mincho fonts-wqy-microhei
fonts-wqy-zenhei fonts-indic
The following NEW packages will be installed:
ca-certificates-java cuda-command-line-tools-9-0 cuda-core-9-0
cuda-cublas-9-0 cuda-cublas-dev-9-0 cuda-cudart-9-0 cuda-cudart-dev-9-0
cuda-cufft-9-0 cuda-cufft-dev-9-0 cuda-curand-9-0 cuda-curand-dev-9-0
cuda-cusolver-9-0 cuda-cusolver-dev-9-0 cuda-cusparse-9-0
cuda-cusparse-dev-9-0 cuda-documentation-9-0 cuda-driver-dev-9-0
cuda-libraries-dev-9-0 cuda-license-9-0 cuda-misc-headers-9-0 cuda-npp-9-0
cuda-npp-dev-9-0 cuda-nvgraph-9-0 cuda-nvgraph-dev-9-0 cuda-nvml-dev-9-0
cuda-nvrtc-9-0 cuda-nvrtc-dev-9-0 cuda-samples-9-0 cuda-toolkit-9-0
cuda-visual-tools-9-0 default-jre default-jre-headless fontconfig
fontconfig-config fonts-dejavu-core fonts-dejavu-extra freeglut3
freeglut3-dev hicolor-icon-theme java-common libasound2 libasound2-data
libasyncns0 libatk1.0-0 libatk1.0-data libavahi-client3 libavahi-common-data
libavahi-common3 libcairo2 libcups2 libdatrie1 libdrm-amdgpu1 libdrm-common
libdrm-dev libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libflac8
libfontconfig1 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgif7
libgl1-mesa-dev libgl1-mesa-dri libgl1-mesa-glx libglapi-mesa libglu1-mesa
libglu1-mesa-dev libgraphite2-3 libgtk2.0-0 libgtk2.0-bin libgtk2.0-common
libharfbuzz0b libice-dev libice6 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2
libllvm6.0 libnspr4 libnss3 libnss3-nssdb libogg0 libpango-1.0-0
libpangocairo-1.0-0 libpangoft2-1.0-0 libpciaccess0 libpcsclite1
libpixman-1-0 libpthread-stubs0-dev libpulse0 libsensors4 libsm-dev libsm6
libsndfile1 libthai-data libthai0 libtiff5 libtxc-dxtn-s2tc0 libvorbis0a
libvorbisenc2 libx11-dev libx11-doc libx11-xcb-dev libx11-xcb1 libxau-dev
libxcb-dri2-0 libxcb-dri2-0-dev libxcb-dri3-0 libxcb-dri3-dev libxcb-glx0
libxcb-glx0-dev libxcb-present-dev libxcb-present0 libxcb-randr0
libxcb-randr0-dev libxcb-render0 libxcb-render0-dev libxcb-shape0
libxcb-shape0-dev libxcb-shm0 libxcb-sync-dev libxcb-sync1 libxcb-xfixes0
libxcb-xfixes0-dev libxcb1-dev libxcomposite1 libxcursor1 libxdamage-dev
libxdamage1 libxdmcp-dev libxext-dev libxfixes-dev libxfixes3 libxi-dev
libxi6 libxinerama1 libxmu-dev libxmu-headers libxmu6 libxrandr2 libxrender1
libxshmfence-dev libxshmfence1 libxt-dev libxt6 libxtst6 libxxf86vm-dev
libxxf86vm1 mesa-common-dev openjdk-8-jre openjdk-8-jre-headless x11-common
x11proto-core-dev x11proto-damage-dev x11proto-dri2-dev x11proto-fixes-dev
x11proto-gl-dev x11proto-input-dev x11proto-kb-dev x11proto-xext-dev
x11proto-xf86vidmode-dev xorg-sgml-doctools xtrans-dev
The following packages will be upgraded:
libdrm2
1 upgraded, 165 newly installed, 0 to remove and 91 not upgraded.
Need to get 1,186 MB of archives.
After this operation, 2,747 MB of additional disk space will be used.
………… Nothing more happens and connection times out.

Instance type: t3.xlarge

Also, I can’t launch DIGITS using http://10.1.2.3:5000/ with my public IPv4 address in place of 10.1.2.3.
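To narrow down the port 5000 problem, it can help to test from your own machine whether the port accepts a TCP connection at all. This is a rough sketch using only the Python standard library; port_open is a made-up helper name, and 10.1.2.3 is the placeholder address from the post, to be replaced with the instance's public IPv4 address.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "10.1.2.3"  # placeholder from the post; substitute the public IPv4 address
    print("port 5000 reachable:", port_open(host, 5000))
```

If this prints False, one thing worth checking is whether the instance's security group allows inbound TCP traffic on port 5000, and whether DIGITS is actually running on the instance. If it prints True, the problem is more likely in the browser or in DIGITS itself.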

Anybody got any ideas on this?

Finally managed to successfully train a set of images on AWS - it only took me 3 days to work it out! Give me a shout if you need help.

Well done!

Maybe write a blog post about it?

Yes, I will put it up on Hackaday this morning before I forget how I did it. I’ll report back with the link in an hour or so.

https://hackaday.io/project/161581-wasp-and-asian-hornets-sentry-gun/log/156348-training-networks-on-the-amazon-cloud-gpus