Could it be that the higher the batch size, the slower the training becomes?

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) RTX3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Segformer
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (If you have one, please share it here)

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

cd ~/workspace/taoscript/notebooks/tao_launcher_starter_kit/segformer

docker run --rm -it --gpus all --shm-size=16g -v $PWD:/workspace -v $PWD/../data/segformer:/data segformer train -e /workspace/specs/train_isbi.yaml -r /workspace/isbi-experiment -g 4

Using the ISBI example, we tested with a batch size of 4 and of 16, respectively. The batch size is the only change in the spec file.

In my experiment, 1000 iterations take 2 minutes 20 seconds with batch size 4, and about 9 minutes with batch size 16.

With batch size 16, increasing workers_per_gpu from 1 to 4 makes it a little faster (it takes 7 minutes), but it is still much slower than batch size 4.

This is not the right way to compare, because the number of samples processed is different. The samples per iteration vary with num_gpus, so it is important to compare the same number of samples.

For example, you can compare something like:
32 bs per gpu - 1 GPU - time for 4 iters,
16 bs per gpu - 4 GPUs - time for 2 iters.

The two cases above each cover 128 samples, so the comparison is fair.
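The arithmetic behind the two suggested setups can be sketched as a small check (the helper name `total_samples` is just for illustration; it assumes each GPU draws `bs_per_gpu` samples every iteration, as described above):

```python
def total_samples(bs_per_gpu: int, num_gpus: int, max_iters: int) -> int:
    # Each iteration, every GPU pulls its own batch of bs_per_gpu samples.
    return bs_per_gpu * num_gpus * max_iters

# The two setups suggested above process the same amount of data:
case_a = total_samples(bs_per_gpu=32, num_gpus=1, max_iters=4)
case_b = total_samples(bs_per_gpu=16, num_gpus=4, max_iters=2)
print(case_a, case_b)  # 128 128
```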

If you set max_iters: 1, bs_per_gpu: 2, and gpu: 1, then 2 images are processed per iteration. For each iteration, the dataloader randomly selects 2 images.
If you set max_iters: 1, bs_per_gpu: 2, and gpu: 2, then 4 images are processed per iteration. For each iteration, the dataloader randomly selects 4 images.

And the number of images per iteration is not related to max_iters.

Does "4 iters time" mean the logging_interval?

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

No, it should be max_iters.

In short, when comparing runs, please keep the following product the same:
max_iters * bs_per_gpu * gpu
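Applying this product to the timings reported earlier in the thread suggests the original comparison was uneven, not that larger batches are slower. A small sketch, assuming 4 GPUs (as in the docker command, -g 4) and the 1000-iteration timings quoted above:

```python
def samples(bs_per_gpu: int, gpus: int, iters: int) -> int:
    # Total samples processed = max_iters * bs_per_gpu * gpu
    return bs_per_gpu * gpus * iters

t_bs4, t_bs16 = 140.0, 540.0   # 2 min 20 s and ~9 min, in seconds
n_bs4 = samples(4, 4, 1000)    # 16000 samples
n_bs16 = samples(16, 4, 1000)  # 64000 samples, i.e. 4x more data

# Per-sample time is nearly identical once normalized:
print(t_bs4 / n_bs4, t_bs16 / n_bs16)
```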

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.