Error while running action recognition net

You can run
$ sudo docker run --rm --runtime=nvidia -ti nvidia/cuda

If the NVIDIA container runtime is not installed, refer to https://developer.nvidia.com/blog/gpu-containers-runtime/ to install it:
$ sudo apt-get install -y nvidia-docker2


I got this error when I ran the docker run command you suggested.

$ sudo apt-get install -y nvidia-docker2

I have installed it, but I am still getting the same error.

I am still getting an error response from the daemon.

$ sudo apt install nvidia-container-toolkit
$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
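
One way to confirm that the reload took effect is to check whether Docker lists an nvidia runtime. The sketch below is only an illustration: `has_nvidia_runtime` is a hypothetical helper, and it assumes that `docker info` prints a "Runtimes:" line naming each registered runtime, which may vary between Docker versions. It also assumes the daemon error concerns the nvidia runtime, which the thread does not show.

```shell
# Sketch: report whether `docker info` output (read from stdin) lists an nvidia runtime.
# has_nvidia_runtime is a hypothetical helper, not a Docker or NVIDIA command.
has_nvidia_runtime() {
    # Keep only the "Runtimes:" line, then look for the word "nvidia" on it.
    grep -i 'runtimes:' | grep -qw 'nvidia'
}

# Usage:
# sudo docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"
```

If the runtime is not listed, the docker run command with --runtime=nvidia would be expected to keep failing until Docker picks up the new configuration.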

I already have the container toolkit, I installed nvidia-docker2, and I also ran the last command you mentioned.
All of them ran successfully.
What's the next step?

this is the output of nvidia-smi:

Please check the original issue. If issue is gone, we can close this topic.

I am still getting the original issue.

And I am getting this error after the nvidia-docker2 installation:

Please help me out.

That is not the same issue, and it is not the original issue. Please ignore it.
Please check again in the notebook. That is where the original issue is.

The error after the nvidia-docker2 installation has changed, but the original issue when running the notebook is the same.

Please exit the notebook.
Then launch the notebook again and rerun it.

Hey Morganh, now I am getting this error:

Please follow section 4.1 of the notebook.
The “=” in the checkpoint file name should be removed before using the checkpoint in a command.

I don't follow. Could you please explain where exactly I have to make the changes?
Which file is the checkpoint file, and where is it located?

Check the .yaml file and find where the checkpoint is.
Then rename it.
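
The rename can be sketched in shell as follows. This is a minimal sketch, not the notebook's own command: `strip_equals` is a hypothetical helper, the directory in the usage line is a placeholder, and it assumes the checkpoint files all sit directly in one directory.

```shell
# Sketch: remove every "=" from the names of files in one directory.
# strip_equals is a hypothetical helper, not part of TAO.
strip_equals() (
    dir="$1"
    for f in "$dir"/*=*; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        base=$(basename "$f")
        new=$(printf '%s' "$base" | tr -d '=')   # same name without "="
        mv -- "$f" "$dir/$new"
    done
)

# Usage (replace the placeholder with your own results directory):
# strip_equals /path/to/results_dir
```

After renaming, the command that consumes the checkpoint has to be given the new file name.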

I read this section, but I still don't understand where I have to make the changes.

This is my .yaml file:
output_dir: /root/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/action_recognition_net/actionrecognitionnet/rgb_3d_ptm
encryption_key: cXQ5NTRwMnU0YzNlMXNxNzEyNmkyb2JoMHE6ODVhMDJlMDctZTg1OC00ZmJiLThmMTUtOGVhN2Y3YTRmMmRl
model_config:
  model_type: rgb
  backbone: resnet18
  rgb_seq_length: 3
  input_type: 3d
  sample_strategy: consecutive
  dropout_ratio: 0.0
train_config:
  optim:
    lr: 0.001
    momentum: 0.9
    weight_decay: 0.0001
    lr_scheduler: MultiStep
    lr_steps: [5, 15, 20]
    lr_decay: 0.1
  epochs: 20
  checkpoint_interval: 1
dataset_config:
  train_dataset_dir: /root/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/action_recognition_net/data/actionrecognitionnet/train
  val_dataset_dir: /root/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/action_recognition_net/data/actionrecognitionnet/test
  label_map:
    fall_floor: 0
    ride_bike: 1
  output_shape:
  - 224
  - 224
  batch_size: 32
  workers: 8
  clips_per_video: 5
  augmentation_config:
    train_crop_type: no_crop
    horizontal_flip_prob: 0.5
    rgb_input_mean: [0.5]
    rgb_input_std: [0.5]
    val_center_crop: False

There is no ‘=’ sign in it.
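
Since the “=” is said to be in the checkpoint file name rather than in the .yaml text, one way to look for it is to list the files in the directory the spec points to. The sketch below is only an illustration: both function names and the spec file name in the usage line are hypothetical, and it assumes that checkpoints are written directly under output_dir and that output_dir is a top-level "key: value" line. If that path is a container-side mount, it would first need translating to the matching host path.

```shell
# Sketch: read output_dir from a spec file, then list files there with "=" in the name.
# Both helpers are hypothetical and not part of TAO.
spec_output_dir() {
    # Print the value of the first top-level "output_dir:" line.
    sed -n 's/^output_dir:[[:space:]]*//p' "$1" | head -n 1
}

list_equals_files() {
    # Print matching paths, one per line; print nothing if there are none.
    find "$1" -maxdepth 1 -type f -name '*=*'
}

# Usage (my_spec.yaml is a placeholder for your own spec file):
# list_equals_files "$(spec_output_dir my_spec.yaml)"
```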

Please help me out.