Support multiple GPUs for QAT YOLOv7

I am following this guide to do QAT on YOLOv7: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/yolov7_qat . It supports only one GPU. Do you have plans to support multiple GPUs for QAT fine-tune training?

Hi,

Please refer to the official repo, https://github.com/NVIDIA-AI-IOT/yolo_deepstream, for the latest updates.
We are moving this post to the DeepStream forum so you can get better help.

Thank you.

As far as I know, the pytorch-quantization tool already supports multi-GPU training, so you can refer to it to do multi-GPU training.
For now, we have no plan to add multi-GPU support to this sample, as it only demonstrates how to use the pytorch-quantization tool (https://github.com/NVIDIA/TensorRT/tree/release/8.6/tools/pytorch-quantization) and how to place Q/DQ nodes to get the best performance.
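That said, if you want to fine-tune on multiple GPUs yourself, a DistributedDataParallel setup generally works with pytorch-quantization. Below is a minimal sketch, not part of the yolov7_qat sample: the tiny Sequential model and random tensors are stand-ins for YOLOv7 and a real dataset, and the file name is a placeholder.

```python
# Minimal DDP + pytorch-quantization sketch.
# Launch with: torchrun --nproc_per_node=<num_gpus> qat_ddp_sketch.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

from pytorch_quantization import quant_modules
from pytorch_quantization.nn import TensorQuantizer


def calibrate(model, data):
    # Collect amax statistics with quantization disabled, then load them
    for m in model.modules():
        if isinstance(m, TensorQuantizer):
            m.disable_quant()
            m.enable_calib()
    with torch.no_grad():
        model(data)
    for m in model.modules():
        if isinstance(m, TensorQuantizer):
            m.enable_quant()
            m.disable_calib()
            if m._calibrator is not None:
                m.load_calib_amax()


def main():
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    quant_modules.initialize()  # patch nn.Conv2d etc. with quantized variants
    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 16, 3)).cuda()
    calibrate(model, torch.randn(8, 3, 64, 64, device="cuda"))

    # Wrap AFTER calibration: the amax buffers then exist on every rank,
    # and DDP broadcasts rank 0's values so all replicas agree.
    ddp_model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-4)
    for _ in range(10):  # stand-in for the QAT fine-tune loop
        opt.zero_grad()
        out = ddp_model(torch.randn(4, 3, 64, 64, device="cuda"))
        out.mean().backward()  # stand-in loss
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```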

@haowang
Thanks.
I tried using nn.DataParallel, but I got an error related to this line of code: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/5af35bab7f6dfca7f1f32d44847b2a91786485f4/yolov7_qat/quantization/quantize.py#L344
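
For reference, the wrapping itself was straightforward (a simplified sketch of what my test_qat.py does; `model` here is the quantized and calibrated YOLOv7 that quantize.py produces):

```python
import torch

def wrap_multi_gpu(model: torch.nn.Module) -> torch.nn.Module:
    # Replicate the already-calibrated QAT model across two GPUs
    return torch.nn.DataParallel(model.cuda(), device_ids=[0, 1])
```

Calling the wrapped model then fails: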

File "test_qat.py", line 195, in <module>
    model(imgs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/GSOL_lossless_AI/yolov7_multiple_GPUs/models/yolo.py", line 599, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/GSOL_lossless_AI/yolov7_multiple_GPUs/models/yolo.py", line 625, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1120, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/GSOL_lossless_AI/yolov7_multiple_GPUs/models/common.py", line 111, in fuseforward
    return self.act(self.conv(x))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/quant_conv.py", line 120, in forward
    quant_input, quant_weight = self._quant(input)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/quant_conv.py", line 85, in _quant
    quant_input = self._input_quantizer(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/tensor_quantizer.py", line 346, in forward
    outputs = self._quant_forward(inputs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/tensor_quantizer.py", line 310, in _quant_forward
    outputs = fake_tensor_quant(inputs, amax, self._num_bits, self._unsigned, self._narrow_range)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/tensor_quant.py", line 306, in forward
    outputs, scale = _tensor_quant(inputs, amax, num_bits, unsigned, narrow_range)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/tensor_quant.py", line 354, in _tensor_quant
    outputs = torch.clamp((inputs * scale).round_(), min_bound, max_bound)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!