Docker socket permission error

Continuing from my prior thread on this issue, because new users are limited to three replies.

Yes, same error.

I found a guide with additional steps to run after Docker has been installed: changing ownership of existing directories and, importantly, logging out and back in so the group permissions take effect. I had not done that before.
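The logout/login step matters because a running session keeps its old group list even after the account has been added to a group. Here is a minimal sketch for checking that. It assumes a standard Linux install where the group is named docker and the socket is at /var/run/docker.sock. The function names are my own, and nothing here changes any settings.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the CURRENT SESSION carries group $1.
session_has_group() {
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

# Hypothetical helper: succeeds if the ACCOUNT $1 is a member of group $2.
account_has_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF "$2"
}

# Report only; the usual fix is left as a comment.
docker_group_report() {
    user=$(id -un)
    if session_has_group docker; then
        echo "session has the docker group"
    elif account_has_group "$user" docker; then
        echo "account is in docker, but this session is stale: log out and back in"
    else
        echo "$user is not in the docker group"
        # Typical fix, followed by logging out and back in:
        #   sudo usermod -aG docker "$user"
    fi
    # Show who owns the socket, if it exists at the assumed path.
    if [ -S /var/run/docker.sock ]; then
        ls -l /var/run/docker.sock
    else
        echo "no socket at /var/run/docker.sock"
    fi
}

docker_group_report
```

If the second message appears, the group was added correctly and only the re-login is missing.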

Reran the tlt-train command without sudo and got the same error message.

I can run hello world without sudo.

I can run docker login nvcr.io without sudo.

When I launch the tlt-streamanalytics Docker container, the user inside it becomes root, which in theory takes us back to the sudo problem.

The screenshot shows the same error message as before, this time when not run with sudo.

I will delete the Docker container, check permissions on the externally mounted directories, and see if that helps.
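A rough sketch of that permission check on the host side. It assumes GNU stat (Linux). The function name and the TLT_EXPERIMENT_DIR variable are hypothetical placeholders for whatever directory is passed to docker run -v.

```shell
#!/bin/sh
# Hypothetical helper: print owner, group and mode of a host directory,
# then report whether the current user can write to it.
check_mount_dir() {
    dir=${1:-}
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # GNU stat format: owner:group, octal mode, path
    stat -c '%U:%G %a %n' "$dir"
    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}

# Placeholder path for illustration; substitute the mounted directory.
check_mount_dir "${TLT_EXPERIMENT_DIR:-/tmp}" || true
```

A directory that shows up as root:root here was probably created by an earlier run inside the container, since the container user is root.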

Same error messages as before, both with and without sudo.

I think the problem may be that I did not run the Jupyter notebook first to set up the directory structure and download the files. The writeup at https://developer.nvidia.com/blog/training-instance-segmentation-models-using-maskrcnn-on-the-transfer-learning-toolkit/ gives steps to run via bash and does not mention running the notebook first. I am working through that scenario now with the other notebook examples. When I get to the MaskRCNN example, I will test whether the command line works after the notebook has been run.

For MaskRCNN training, the "-d" argument is required.
See https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/text/training_model.html#training-a-maskrcnn-model

Train the MaskRCNN model using this command:

tlt-train [-h] mask_rcnn -e <experiment_spec>
                         -d <output_dir>
                         -k <key>
                         --gpus <num_gpus>

Required Arguments
-d, --model_dir: Path to the folder where the experiment output is written.

-k, --key: Provide the encryption key to decrypt the model.

-e, --experiment_spec_file: Experiment specification file to set up the evaluation experiment. This should be the same as the training specification file.
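To make the argument order concrete, here is a small sketch that assembles the command from the three required arguments. It uses only the flags listed above. The function name, paths and key are hypothetical placeholders. It only prints the command line, which would then be run inside the TLT container.

```shell
#!/bin/sh
# Hypothetical wrapper: check that the required values are present, then
# print the tlt-train command. Nothing is executed.
maskrcnn_train_cmd() {
    spec=${1:-}
    out_dir=${2:-}
    key=${3:-}
    gpus=${4:-1}
    if [ -z "$spec" ] || [ -z "$out_dir" ] || [ -z "$key" ]; then
        echo "usage: maskrcnn_train_cmd <experiment_spec> <output_dir> <key> [num_gpus]" >&2
        return 2
    fi
    printf 'tlt-train mask_rcnn -e %s -d %s -k %s --gpus %s\n' \
        "$spec" "$out_dir" "$key" "$gpus"
}

# Placeholder values for illustration only.
maskrcnn_train_cmd /path/to/spec.txt /path/to/output MY_KEY 1
```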

Thanks for the update. The tutorial at https://developer.nvidia.com/blog/training-instance-segmentation-models-using-maskrcnn-on-the-transfer-learning-toolkit/ uses -r in the command line. I assume that is a typo and it should be -d.