Can you run the following command?
$ sudo docker run --rm --runtime=nvidia -ti nvidia/cuda
If not, refer to https://developer.nvidia.com/blog/gpu-containers-runtime/ to install nvidia-docker2:
$ sudo apt-get install -y nvidia-docker2
I got this error when I ran the docker run command you suggested.
$ sudo apt-get install -y nvidia-docker2
I have installed it, but I am still getting the same error.
I am still getting an error response from the daemon.
$ sudo apt install nvidia-container-toolkit
$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
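After these commands, it may help to confirm that Docker actually knows about the nvidia runtime. The sketch below assumes that nvidia-docker2 registered the runtime in /etc/docker/daemon.json under the key "nvidia"; the helper name has_nvidia_runtime is made up for illustration, and the path may differ on your machine.

```shell
# Hypothetical helper: succeeds if the given Docker daemon config file
# mentions an "nvidia" runtime entry.
# Assumption: the runtime is registered in a JSON file such as /etc/docker/daemon.json.
has_nvidia_runtime() {
  [ -f "$1" ] && grep -q '"nvidia"' "$1"
}

# Report the result for the default config path (an assumption for your setup).
if has_nvidia_runtime /etc/docker/daemon.json; then
  echo "nvidia runtime found in daemon.json"
else
  echo "nvidia runtime not found in daemon.json"
fi
```

If the runtime is reported as not found, reinstalling nvidia-docker2 and reloading dockerd as above is the first thing to retry.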
I already have the container toolkit, I installed nvidia-docker2, and I also ran the pkill command you mentioned.
All of them ran successfully.
What’s the next step?
Please check the original issue. If the issue is gone, we can close this topic.
I am still getting the original issue.
That is not the same issue, and it is not the original issue either. Please ignore it.
Please check again in the notebook. That is the original issue.
The error changed after installing nvidia-docker2, but the original issue when running the notebook is still the same.
Please exit the notebook.
Then launch the notebook again and run it.
Please follow section 4.1 of the notebook.
The “=” in the checkpoint file name should be removed before using the checkpoint in a command.
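As a rough sketch of that rename step: the helper below strips every "=" from the names of files in one directory. The function name strip_equals and the directory you pass to it are hypothetical, not taken from the notebook; point it at wherever your training run wrote its checkpoints.

```shell
# Hypothetical helper: rename every file in a directory whose name contains "="
# so that the "=" characters are removed. Files without "=" are left untouched.
strip_equals() {
  dir="$1"
  for f in "$dir"/*=*; do
    # Skip the unexpanded pattern when no file name contains "=".
    [ -e "$f" ] || continue
    base=$(basename -- "$f")
    new=$(printf '%s' "$base" | tr -d '=')
    mv -- "$f" "$dir/$new"
  done
}
```

Call it as strip_equals followed by the path to your results directory, then use the renamed file in the later commands. If the checkpoints were written by the container as root, the mv may need sudo.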
I don't understand. Could you please explain where exactly I have to make the changes?
Which file is the checkpoint file, and where is it located?
I read this section, but I still don't see where I have to make the changes.
This is my .yaml file:
output_dir: /root/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/action_recognition_net/actionrecognitionnet/rgb_3d_ptm
encryption_key: cXQ5NTRwMnU0YzNlMXNxNzEyNmkyb2JoMHE6ODVhMDJlMDctZTg1OC00ZmJiLThmMTUtOGVhN2Y3YTRmMmRl
model_config:
  model_type: rgb
  backbone: resnet18
  rgb_seq_length: 3
  input_type: 3d
  sample_strategy: consecutive
  dropout_ratio: 0.0
train_config:
  optim:
    lr: 0.001
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_steps: [5, 15, 20]
    lr_decay: 0.1
  epochs: 20
  checkpoint_interval: 1
dataset_config:
  train_dataset_dir: /root/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/action_recognition_net/data/actionrecognitionnet/train
  val_dataset_dir: /root/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/action_recognition_net/data/actionrecognitionnet/test
  label_map:
    fall_floor: 0
    ride_bike: 1
  output_shape:
There is no ‘=’ sign in it.
Please help me out.