Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) RTX3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Segformer
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (If you have one, please share it here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
cd ~/workspace/taoscript/notebooks/tao_launcher_starter_kit/segformer
docker run --rm -it --gpus all --shm-size=16g -v $PWD:/workspace -v $PWD/../data/segformer:/data nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt segformer train -e /workspace/specs/train_isbi.yaml -r /workspace/isbi-experiment -g 4
Using the ISBI example, we trained for 1000 iterations with batch sizes of 4 and 16; the batch size is the only change in the spec file.
With batch size 4 the run takes 2 minutes 20 seconds; with batch size 16 it takes about 9 minutes.
With batch size 16, increasing workers_per_gpu from 1 to 4 helps a little (about 7 minutes), but the runtime is still much higher than with batch size 4.
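For reference, the timings above can be converted to throughput once you account for samples processed per iteration. A rough back-of-envelope sketch (assuming the spec's batch size is per GPU and 4 GPUs were used, per the -g 4 flag; the function name is illustrative):

```python
# Back-of-envelope throughput from the timings reported above.
# Assumption: batch size in the spec is per GPU, and 4 GPUs were used,
# so samples per iteration = bs_per_gpu * num_gpus.

def throughput(bs_per_gpu, num_gpus, iters, seconds):
    """Images processed per second over the whole run."""
    total_samples = bs_per_gpu * num_gpus * iters
    return total_samples / seconds

# bs=4, 1000 iters, 2 min 20 s
print(round(throughput(4, 4, 1000, 140), 1))   # ~114.3 img/s
# bs=16, 1000 iters, ~9 min
print(round(throughput(16, 4, 1000, 540), 1))  # ~118.5 img/s
```

Viewed this way, the two runs process images at roughly the same rate; the larger batch simply moves ~4x more data per iteration.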
This is not the right way to compare, because the number of samples processed differs. The samples per iteration also scale with num_gpus, so it is important to compare runs that cover the same number of samples.
For example, you can compare something like:
• 32 bs per GPU - 1 GPU - 4 iters
• 16 bs per GPU - 4 GPUs - 2 iters
Both cases cover 128 samples, so the comparison is fair.
More:
If you set max_iters: 1 and bs_per_gpu: 2 with gpus: 1 ==> 2 images per iter. Each iter, the dataloader will randomly select 2 images.
If you set max_iters: 1 and bs_per_gpu: 2 with gpus: 2 ==> 4 images per iter. Each iter, the dataloader will randomly select 4 images.
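The two cases above, and the fair-comparison rule, can be sketched as a small helper (the function name is illustrative, not part of the TAO toolkit):

```python
def samples_seen(max_iters, bs_per_gpu, num_gpus):
    """Total images drawn by the dataloader over a run."""
    return max_iters * bs_per_gpu * num_gpus

# The two examples above:
print(samples_seen(1, 2, 1))   # 2 images per iter with 1 GPU
print(samples_seen(1, 2, 2))   # 4 images per iter with 2 GPUs

# The fair-comparison cases: both cover 128 samples.
print(samples_seen(4, 32, 1))  # 128
print(samples_seen(2, 16, 4))  # 128
```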
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks
No, it should be max_iters.
In short, when comparing runs, please keep max_iters * bs_per_gpu * num_gpus the same.