Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) RTX3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Segformer
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (If you have one, please share it here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
cd ~/workspace/taoscript/notebooks/tao_launcher_starter_kit/segformer
docker run --rm -it --gpus all --shm-size=16g -v $PWD:/workspace -v $PWD/../data/segformer:/data nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt segformer train -e /workspace/specs/train_isbi.yaml -r /workspace/isbi-experiment -g 4
Using the ISBI example, we trained for 1000 iterations with batch sizes of 4 and 16; the batch size is the only change in the spec file.
With batch size 4 the run takes 2 minutes 20 seconds; with batch size 16 it takes about 9 minutes.
With batch size 16, increasing workers_per_gpu from 1 to 4 helps a little (about 7 minutes), but the runtime is still much higher than with batch size 4.
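For reference, the timings above can be converted to throughput once you account for samples processed per iteration. A rough back-of-envelope sketch (assuming the spec's batch size is per GPU and 4 GPUs were used, per the -g 4 flag; the function name is illustrative):

```python
# Back-of-envelope throughput from the timings reported above.
# Assumption: batch size in the spec is per GPU, and 4 GPUs were used,
# so samples per iteration = bs_per_gpu * num_gpus.

def throughput(bs_per_gpu, num_gpus, iters, seconds):
    """Images processed per second over the whole run."""
    total_samples = bs_per_gpu * num_gpus * iters
    return total_samples / seconds

# bs=4, 1000 iters, 2 min 20 s
print(round(throughput(4, 4, 1000, 140), 1))   # ~114.3 img/s
# bs=16, 1000 iters, ~9 min
print(round(throughput(16, 4, 1000, 540), 1))  # ~118.5 img/s
```

Viewed this way, the two runs process images at roughly the same rate; the larger batch simply moves ~4x more data per iteration.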
This is not the right way to compare, because the number of samples processed differs. The samples per iteration also scale with num_gpus, so it is important to compare runs that cover the same number of samples.
For example, you can compare something like:
• 32 bs per GPU - 1 GPU - 4 iters
• 16 bs per GPU - 4 GPUs - 2 iters
Both cases cover 128 samples, so the comparison is fair.
More:
If you set max_iters: 1 and bs_per_gpu: 2 with gpus: 1 ==> 2 images per iter. Each iter, the dataloader will randomly select 2 images.
If you set max_iters: 1 and bs_per_gpu: 2 with gpus: 2 ==> 4 images per iter. Each iter, the dataloader will randomly select 4 images.
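The two cases above, and the fair-comparison rule, can be sketched as a small helper (the function name is illustrative, not part of the TAO toolkit):

```python
def samples_seen(max_iters, bs_per_gpu, num_gpus):
    """Total images drawn by the dataloader over a run."""
    return max_iters * bs_per_gpu * num_gpus

# The two examples above:
print(samples_seen(1, 2, 1))   # 2 images per iter with 1 GPU
print(samples_seen(1, 2, 2))   # 4 images per iter with 2 GPUs

# The fair-comparison cases: both cover 128 samples.
print(samples_seen(4, 32, 1))  # 128
print(samples_seen(2, 16, 4))  # 128
```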
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks
No, it should be max_iters.
In short, when comparing runs, please keep max_iters * bs_per_gpu * num_gpus the same.