How to train a model with multiple GPUs

• Hardware (T4/V100/Xavier/Nano/etc): GTX 1080 Ti
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): Detectnet_v2 and Classification
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) : tlt:3.0 and docker_tag: v3.0-dp-py3

I want to train a model on multiple GPUs in one system. TLT provides the --gpus and --gpu_index options, so I set --gpus=2 and --gpu_index=0;1, but I get an error and training does not continue. I also tried --gpu_index=0,1 and --gpu_index=[0,1] and got the same error.

Please set
--gpu_index 0 1

That doesn’t work.

Can you share the command line and full log?

Related topic: AssertionError: The number of GPUs ([1]) must be the same as the number of GPU indices (4) provided - #14 by Morganh
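
For reference, the assertion message quoted in the related topic suggests that the launcher compares the --gpus count against the number of values passed to --gpu_index. The following is only a minimal sketch of that idea, assuming the flag is parsed as a space-separated list of integers (here with Python's argparse and nargs="+"). It is not the TLT source code, and parse_gpu_args is a hypothetical helper written purely for illustration.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so failures can be inspected."""

    def error(self, message):
        raise ValueError(message)


def parse_gpu_args(argv):
    """Hypothetical stand-in for the launcher's GPU options (assumption, not TLT code)."""
    parser = RaisingParser(prog="sketch")
    parser.add_argument("--gpus", type=int, default=1)
    # nargs="+" means one or more whitespace-separated integer values.
    parser.add_argument("--gpu_index", type=int, nargs="+", default=[0])
    args = parser.parse_args(argv)
    # Mirror the kind of check implied by the AssertionError in the related topic.
    if args.gpus != len(args.gpu_index):
        raise ValueError(
            f"--gpus={args.gpus} but {len(args.gpu_index)} GPU indices were given"
        )
    return args.gpus, args.gpu_index


# Space-separated indices parse as two integers.
print(parse_gpu_args(["--gpus", "2", "--gpu_index", "0", "1"]))

# A single token such as "0,1" or "[0,1]" is not a valid integer under this parser.
for token in ("0,1", "[0,1]", "0;1"):
    try:
        parse_gpu_args(["--gpus", "2", f"--gpu_index={token}"])
    except ValueError as exc:
        print(f"{token!r} rejected: {exc}")
```

Under these assumptions, only the space-separated form yields two indices. Separately, in a POSIX shell an unquoted ; ends the command, so --gpu_index=0;1 passes only 0 to the tool and then tries to run 1 as a second command.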

tlt lprnet inference --gpus 2 --gpu_index 0 1 -i /workspace/tlt-experiments/ocr/data/train/image -e /workspace/tlt-experiments/ocr/specs/tutorial_spec.txt -m /workspace/tlt-experiments/ocr/experiment_dir_unpruned/weights/lprnet_epoch-60.tlt -k nvidia_tlt

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.

Do you have the full log?
