docker: invalid reference format and "docker run" requires at least 1 argument errors

TegwynTwmffat · November 22, 2018, 11:37am

Hello!

I’m using Docker on AWS with an NVIDIA Volta AMI on a p3.2xlarge instance with a 32 GB volume attached. I have signed in OK with my API key and have run the PyTorch example successfully (nvidia-docker run --rm -it nvcr.io/nvidia/pytorch:17.10):

docker pull nvcr.io/nvidia/pytorch:17.10
nvidia-docker run --rm -it nvcr.io/nvidia/pytorch:17.10  ....................... Works ok.
Run the MNIST example:
cd /opt/pytorch/examples/mnist     ............... in root@fflklsk:/workspace#
python main.py  ....................... Works ok.

I can run docker pull nvcr.io/nvidia/digits:18.11-tensorflow, and it downloads and extracts successfully.

I try to run: nvidia-docker run --name digits -d -p 8888:5000nvcr.io/nvidia/digits

ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000nvcr.io/nvidia/digits
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage:  docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container
ubuntu@ip-172-31-27-120:~$

I then tried all kinds of permutations, shown below, with no success.
Can anybody see what mistake I am making? Thanks!

Logging into the NGC Registry at nvcr.io....Login Succeeded
ubuntu@ip-172-31-27-120:~$ docker pull nvcr.io/nvidia/digits:18.11-tensorflow
18.11-tensorflow: Pulling from nvidia/digits
Digest: sha256:c42b32e5c0ca3d428a72df0683f0c7cfd7ef022d544bc0257370d8ad468893c0
Status: Image is up to date for nvcr.io/nvidia/digits:18.11-tensorflow
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000 -v /home/ubuntu/data:/data:ro -v /home/ubuntu/digits- jobs:/workspace/jobs nvcr.io/nvidia/digits
docker: invalid reference format.
See 'docker run --help'.
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000 -v /home/ubuntu/data:/data:ro -v /home/ubuntu/digits-jobs:/jobs  nvcr.io/nvidia/digits
Unable to find image 'nvcr.io/nvidia/digits:latest' locally
docker: Error response from daemon: manifest for nvcr.io/nvidia/digits:latest not found.
See 'docker run --help'.
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000nvcr.io/nvidia/digits
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage:  docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000nvcr.io/nvidia/digits:18.11
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage:  docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000nvcr.io/nvidia/digits:18.11-tensorflow
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage:  docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --rm -it  -d -p 8888:5000nvcr.io/nvidia/digits:18.11-tensorflow
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage:  docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --rm -it  -d -p 8888:5000nvcr.io/nvidia/digits:18.11
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage:  docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --rm -it  -d -p 8888:5000nvcr.io/nvidia/digits
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage:  docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container
ubuntu@ip-172-31-27-120:~$

ubuntu@ip-172-31-27-120:~$ nvidia-docker run -d --name digits-18:11 -p 8888:5000  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/digits:18:11
docker: invalid reference format.
See 'docker run --help'.
ubuntu@ip-172-31-27-120:~$ nvidia-docker run -d --name digits-18:11-tensorflow -p 8888:5000  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/digits:18:11-tensorflow
docker: invalid reference format.
See 'docker run --help'.
ubuntu@ip-172-31-27-120:~$

ubuntu@ip-172-31-27-120:~$ nvidia-docker run -d --name digits-18:11 -p 8888:5000 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/digits
Unable to find image 'nvcr.io/nvidia/digits:latest' locally
docker: Error response from daemon: manifest for nvcr.io/nvidia/digits:latest not found.
See 'docker run --help'.
ubuntu@ip-172-31-27-120:~$
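
The two failures above look related to how the image reference is written: a tag such as 18:11 contains a second colon, and a reference with no tag at all falls back to :latest, which this repository apparently does not publish. A minimal sketch of that reasoning follows. The function name normalize_ref is hypothetical, and the check is deliberately simplified (it only inspects the tag after the last path component and ignores digests), so it is not Docker's actual parser.

```shell
#!/bin/sh
# Simplified check of an image reference (tag part only, digests ignored).
# Prints the reference Docker would effectively use, or returns non-zero.
normalize_ref() {
    ref=$1
    last=${ref##*/}            # last path component, e.g. "digits:18.10"
    case $last in
        *:*:*) echo "invalid reference format: $ref" >&2; return 1 ;;
        *:*)   tag=${last#*:} ;;
        *)     echo "$ref:latest"; return 0 ;;   # no tag given: :latest is assumed
    esac
    # A tag may contain letters, digits, '_', '.' and '-',
    # and must not start with '.' or '-'.
    if printf '%s\n' "$tag" | grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'; then
        echo "$ref"
    else
        echo "invalid reference format: $ref" >&2
        return 1
    fi
}

normalize_ref nvcr.io/nvidia/digits:18.10
normalize_ref nvcr.io/nvidia/digits
normalize_ref nvcr.io/nvidia/digits:18:11 || echo "rejected"
```

With a dot in the tag (18.10, 18.11-tensorflow) the reference passes this check, which matches the tags that docker pull accepted earlier in the thread.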

TegwynTwmffat · November 22, 2018, 1:41pm

I tried to pull 18.10, but I’m now getting the error: no space left on device

Is this something to do with these options:
--shm-size=1g --ulimit memlock=-1
on the command line for:
nvidia-docker run

How do I delete 18.11-tensorflow?

Thanks!

f7d4e939e2f3: Pull complete
69fd65e359c1: Pull complete
b3073cb6c892: Pull complete
cd3cfb41069b: Extracting  475.4MB/475.4MB
60741008fcc3: Download complete
6bf9ad071f73: Download complete
c3cfc72a0228: Download complete
e5f682d39d70: Download complete
c6dc998f001d: Download complete
0789191dcf7e: Download complete
f502b222f40f: Download complete
2eb32606ef18: Download complete
78768428f5ae: Download complete
4ab52b2195eb: Download complete
2c27729d0333: Download complete
failed to register layer: Error processing tar file(exit status 1): write /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so: no space left on device
ubuntu@ip-172-31-27-120:~$
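
On the question of deleting the 18.11-tensorflow download: a pulled image is normally removed with docker rmi followed by the full image reference. The sketch below only prints the command as a dry run, so nothing is deleted by accident. The helper name remove_image_cmd is hypothetical.

```shell
#!/bin/sh
# Dry run: print the docker command instead of executing it.
remove_image_cmd() {
    echo "docker rmi $1"
}

remove_image_cmd nvcr.io/nvidia/digits:18.11-tensorflow
# To actually remove the image, run the printed command.
# "docker images" lists what is stored locally, with sizes.
```

Removal is refused while a container still uses the image, so any such container would need to be removed first.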

TegwynTwmffat · November 22, 2018, 1:49pm

ubuntu@ip-172-31-27-120:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             30G     0   30G   0% /dev
tmpfs           6.0G  8.9M  6.0G   1% /run
/dev/xvda1       31G   27G  4.4G  86% /
tmpfs            30G     0   30G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            30G     0   30G   0% /sys/fs/cgroup
/dev/loop0       18M   18M     0 100% /snap/amazon-ssm-agent/930
/dev/loop1       89M   89M     0 100% /snap/core/5897
/dev/loop2       13M   13M     0 100% /snap/amazon-ssm-agent/295
/dev/loop3       87M   87M     0 100% /snap/core/5145
tmpfs           6.0G     0  6.0G   0% /run/user/1000
ubuntu@ip-172-31-27-120:~$

And …

ubuntu@ip-172-31-27-120:~$ df -i
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
udev           7858000    327 7857673    1% /dev
tmpfs          7859998    460 7859538    1% /run
/dev/xvda1     4096000 545266 3550734   14% /
tmpfs          7859998      1 7859997    1% /dev/shm
tmpfs          7859998      3 7859995    1% /run/lock
tmpfs          7859998     16 7859982    1% /sys/fs/cgroup
/dev/loop0          15     15       0  100% /snap/amazon-ssm-agent/930
/dev/loop1       12808  12808       0  100% /snap/core/5897
/dev/loop2          13     13       0  100% /snap/amazon-ssm-agent/295
/dev/loop3       12847  12847       0  100% /snap/core/5145
tmpfs          7859998      4 7859994    1% /run/user/1000
ubuntu@ip-172-31-27-120:~$
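
The df -h output above shows / at 86% with 4.4G available, which is most likely why the layer extraction failed: image layers are written to disk during the pull, whereas --shm-size and --ulimit only affect docker run. A minimal sketch of a free-space check before pulling is below. It assumes Docker's data directory lives on the root filesystem, and the 10 GiB threshold is a made-up figure, not a documented requirement of any image.

```shell
#!/bin/sh
# Available space, in whole GiB, on the filesystem holding a given path.
# "df -P" gives one line per filesystem; column 4 is available 1K blocks.
avail_gib() {
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1048576 }'
}

# Hypothetical threshold; the real need depends on the image being pulled.
need_gib=10
have_gib=$(avail_gib /)
if [ "$have_gib" -lt "$need_gib" ]; then
    echo "only ${have_gib} GiB free on /, a large pull may fail"
else
    echo "${have_gib} GiB free on /"
fi
```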

TegwynTwmffat · November 22, 2018, 3:02pm

I increased the volume capacity to 64 GB and no partition adjustment was needed :)

Pulled 18.10 … worked OK.

$ ifconfig … Not required.

nvidia-docker run -d --name digits-18.10 -p 8888:5000  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/digits:18.10           ............... worked ok.

Still need to delete the 18.11-tensorflow image.

Used my current IPv4 address from AWS instance panel with :8888 … Worked !!! Now have DIGITS in browser !!!
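
The -p 8888:5000 option maps port 8888 on the host to port 5000 inside the container, which is why the browser address ends in :8888. A small sketch of that mapping follows. The function name digits_url is hypothetical and 203.0.113.10 is a documentation placeholder, not the real instance address. On AWS the instance's security group presumably also has to allow inbound traffic on the host port.

```shell
#!/bin/sh
# Build the browser URL from a "HOST_PORT:CONTAINER_PORT" mapping.
digits_url() {
    host_ip=$1
    mapping=$2
    host_port=${mapping%%:*}   # the part before the colon is the host side
    echo "http://${host_ip}:${host_port}"
}

digits_url 203.0.113.10 8888:5000
```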

TegwynTwmffat · November 23, 2018, 1:55pm

… So I uploaded all my data and tried to create a new object detection dataset, but after specifying the 4 folders I get this error in DIGITS:

New Object Detection Dataset
train_image_folder 
Folder does not exist or is not reachable
val_image_folder 
Folder does not exist or is not reachable
val_label_folder 
Folder does not exist or is not reachable
train_label_folder 
Folder does not exist or is not reachable

Is this something to do with mounting the directories?

I tried running this in command line:

nvidia-docker run --name digits -d -p 8888:5000 -v /home/ubuntu/myData/wasp:/wasp:ro -v /home/ubuntu/digits-jobs:/workspace/jobs nvcr.io/nvidia/digits:18:10

but I get the ‘docker: invalid reference format.’ error again:

ubuntu@ip-172-31-27-120:~$ docker pull nvcr.io/nvidia/digits:18.10
18.10: Pulling from nvidia/digits
Digest: sha256:28ed527eebfd01ea1b7e08b5229aba233f38f33da5a4d1c572d46b618414288a
Status: Image is up to date for nvcr.io/nvidia/digits:18.10
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000 -v /home/ubuntu/myData/wasp:/wasp:ro -v /home/ubuntu/digits-jobs:/workspace/jobs nvcr.io/nvidia/digits:18:10
docker: invalid reference format.
See 'docker run --help'.
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000 -v /home/ubuntu/myData/wasp -v /home/ubuntu/digits-jobs:/workspace/jobs nvcr.io/nvidia/digits:18:10
docker: invalid reference format.
See 'docker run --help'.
ubuntu@ip-172-31-27-120:~$ nvidia-docker run --name digits -d -p 8888:5000 -v home/ubuntu/myData/wasp -v home/ubuntu/digits-jobs:/workspace/jobs nvcr.io/nvidia/digits:18:10
docker: invalid reference format.
See 'docker run --help'.
ubuntu@ip-172-31-27-120:~$

Please somebody help … Really stuck here !!! … Thanks.

TegwynTwmffat · November 24, 2018, 12:51pm

Still stuck on this!

I don’t really understand the docker file system or the below:

nvidia-docker run -it -v local_dir:container_dir nvcr.io/nvidia/digits:<xx.xx>

I can create a volume:

$ docker volume create myData

What’s the difference between a volume and a directory?

I can list my volumes:

ubuntu@ip-172-31-24-74:~$ docker volume ls
DRIVER              VOLUME NAME
local               1b3c2f8357f2d5ab2ac3b31e3983bb7b1c2b7dae1f8ff4c5568dde9a5a316e43
local               507447a5246619a4f4e5b227f0e9486cadb54c619d95799f6c2f373855814b7c
local               myData
ubuntu@ip-172-31-24-74:~$

I can run digits and access it on port 8888, but what to type into the image path?
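
On the volume versus directory question: with -v, what comes before the first colon decides the kind of mount. A source that starts with / is treated as a host directory (a bind mount), a bare name such as myData is a named volume managed by Docker, and a spec with no colon at all creates an anonymous volume at that container path. The sketch below classifies a spec along those lines. The function name mount_kind is hypothetical and the logic is a simplification for Linux paths. In every case, the path typed into the DIGITS form would be the container-side path, not the host one.

```shell
#!/bin/sh
# Classify a "-v" argument by its source field (simplified, Linux paths only).
mount_kind() {
    spec=$1
    case $spec in
        *:*) src=${spec%%:*} ;;
        *)   echo "anonymous volume mounted at $spec"; return 0 ;;
    esac
    case $src in
        /*) echo "bind mount of host directory $src" ;;
        *)  echo "named volume $src" ;;
    esac
}

mount_kind /home/ubuntu/data/wasp:/data/wasp
mount_kind myData:/data/wasp
mount_kind /home/ubuntu/myData/wasp
```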

TegwynTwmffat · November 24, 2018, 1:53pm

This command:

nvidia-docker run --name digits -d -p 8888:5000 -it -v /home/ubuntu/data/wasp:/data/wasp nvcr.io/nvidia/digits:18.10

… seems to create a directory structure that I can see in my SSH console:

/home/ubuntu/data/wasp …… but not accepted by digits browser as a valid directory. Maybe I should try /data/wasp ?

… but I can’t SSH any files into ‘wasp’, so I figure it must be describing what’s in the container?
If so, how do I SSH my files into this directory in the container?

Where and how should I put my image and label files such that digits browser can access them?

TegwynTwmffat · November 24, 2018, 2:10pm

Should I use something like below: ???

$ docker cp foo.txt mycontainer:/foo.txt

If so, I wonder what the path should be?

TegwynTwmffat · November 24, 2018, 3:16pm

nvidia-docker run --name digits3 -d -p 8888:5000 -it -v /home/ubuntu/data/wasp:/workspace/wasp nvcr.io/nvidia/digits:18.10

…… allows digits in browser to use workspace/wasp as a directory.

Still don’t know how to upload files into wasp directory :(
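
Because /home/ubuntu/data/wasp is bind-mounted, files placed in that host directory (for example, copied to the instance over SSH) should show up inside the container under /workspace/wasp without any separate upload into the container. The sketch below translates a host file path into the path the container would see. The function name container_path and the file train/images/001.png are hypothetical examples, not names from this thread.

```shell
#!/bin/sh
# Translate a host file path into its container path for one "-v" bind mount.
container_path() {
    spec=$1
    host_file=$2
    host_dir=${spec%%:*}       # host side of the mount
    rest=${spec#*:}
    cont_dir=${rest%%:*}       # container side, dropping an optional ":ro"
    case $host_file in
        "$host_dir"/*)
            rel=${host_file#"$host_dir"/}
            echo "$cont_dir/$rel"
            ;;
        *)
            echo "not under $host_dir" >&2
            return 1
            ;;
    esac
}

container_path /home/ubuntu/data/wasp:/workspace/wasp \
    /home/ubuntu/data/wasp/train/images/001.png
```

The printed path, here under /workspace/wasp, is the form DIGITS would expect in its folder fields.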

gcrider · November 24, 2018, 7:18pm

Hi,

Our team responded as follows -

The user had a typo in his initial command:

nvidia-docker run --name digits -d -p 8888:5000nvcr.io/nvidia/digits

It should be

nvidia-docker run --name digits -d -p 8888:5000 nvcr.io/nvidia/digits
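
The effect of the missing space can be seen by looking at how the shell splits the command into words. Without the space, the image name is fused onto the value of -p, so no separate IMAGE word is left over, which fits the "requires at least 1 argument" message. This is a small illustration only; show_words is a hypothetical helper, not part of Docker.

```shell
#!/bin/sh
# Print each word the shell would hand to the program, one per line.
show_words() {
    for word in "$@"; do
        printf '[%s]\n' "$word"
    done
}

# Missing space: "-p" is followed by one fused word, and no IMAGE remains.
show_words -p 8888:5000nvcr.io/nvidia/digits

# With the space, the image name is a separate word.
show_words -p 8888:5000 nvcr.io/nvidia/digits
```

The first call prints two words and the second prints three.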

TegwynTwmffat · November 25, 2018, 9:58am

Thanks for reply …… I hope you’re enjoying your turkey?

Yes, I noticed that error a few days ago. I’ll start a new thread as, to be fair, your comment does solve the initial problem.

TegwynTwmffat · November 25, 2018, 12:30pm

Finally managed to successfully train a set of images on AWS - it only took me 3 days to work it out!