I am following this guide to do QAT for YOLOv7: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/yolov7_qat. It supports only one GPU. Do you have plans to support multiple GPUs for QAT fine-tune training?
Hi,
Please refer to the official repo (GitHub - NVIDIA-AI-IOT/yolo_deepstream: YOLO model QAT and deployment with DeepStream & TensorRT) for the latest updates.
We are moving this post to the DeepStream forum so you can get better help.
Thank you.
As far as I know, the pytorch-quantization tool already supports multi-GPU training, so you can refer to it for multi-GPU fine-tuning.
For now, we have no plans to add multi-GPU support to this sample, since it only demonstrates how to use the pytorch-quantization tool (https://github.com/NVIDIA/TensorRT/tree/release/8.6/tools/pytorch-quantization) and how to place Q/DQ nodes to get the best performance.
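For reference, here is a minimal sketch (not verified against the yolov7_qat sample) of how multi-GPU QAT fine-tuning could be set up with pytorch-quantization and torch.nn.parallel.DistributedDataParallel, one process per GPU. The tiny Conv model, the random data, and the one-batch max calibration are placeholders standing in for YOLOv7, its dataloader, and the sample's own calibration and Q/DQ placement.

```python
# Minimal sketch, assuming pytorch-quantization + DistributedDataParallel (one
# process per GPU). NOT verified against the yolov7_qat sample; the tiny Conv
# model, random data, and one-batch max calibration are placeholders.
# Launch with e.g.:  torchrun --nproc_per_node=2 qat_ddp_sketch.py
import os

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn


def calibrate_one_batch(model, batch):
    """Collect amax with the default max calibrator, then re-enable quantization."""
    for m in model.modules():
        if isinstance(m, quant_nn.TensorQuantizer) and m._calibrator is not None:
            m.disable_quant()
            m.enable_calib()
    with torch.no_grad():
        model(batch)
    for m in model.modules():
        if isinstance(m, quant_nn.TensorQuantizer) and m._calibrator is not None:
            m.load_calib_amax()
            m.enable_quant()
            m.disable_calib()


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # Monkey-patch torch.nn layers with quantized versions before building the model.
    # (The yolov7_qat sample places its Q/DQ nodes differently; this is the generic API.)
    quant_modules.initialize()

    # Placeholder model standing in for the YOLOv7 QAT model.
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).cuda(local_rank)

    # Calibrate before the DDP wrap; DDP broadcasts rank 0's parameters and buffers
    # at construction, so every process should start from the same quantizer state.
    calibrate_one_batch(model, torch.randn(4, 3, 64, 64, device=f"cuda:{local_rank}"))

    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-4)

    for _ in range(10):  # placeholder fine-tuning loop on random data
        imgs = torch.randn(4, 3, 64, 64, device=f"cuda:{local_rank}")
        loss = ddp_model(imgs).abs().mean()  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

DistributedDataParallel keeps the whole quantized model on a single device per process, and it is also what the upstream YOLOv7 training scripts use for multi-GPU, so it tends to fit this workflow better than nn.DataParallel.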
@haowang
Thanks.
I tried using nn.DataParallel, but I got an error at this line of code: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/5af35bab7f6dfca7f1f32d44847b2a91786485f4/yolov7_qat/quantization/quantize.py#L344
File "test_qat.py", line 195, in <module>
model(imgs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 434, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/GSOL_lossless_AI/yolov7_multiple_GPUs/models/yolo.py", line 599, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/GSOL_lossless_AI/yolov7_multiple_GPUs/models/yolo.py", line 625, in forward_once
x = m(x) # run
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1120, in _call_impl
result = forward_call(*input, **kwargs)
File "/GSOL_lossless_AI/yolov7_multiple_GPUs/models/common.py", line 111, in fuseforward
return self.act(self.conv(x))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/quant_conv.py", line 120, in forward
quant_input, quant_weight = self._quant(input)
File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/quant_conv.py", line 85, in _quant
quant_input = self._input_quantizer(input)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/tensor_quantizer.py", line 346, in forward
outputs = self._quant_forward(inputs)
File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/tensor_quantizer.py", line 310, in _quant_forward
outputs = fake_tensor_quant(inputs, amax, self._num_bits, self._unsigned, self._narrow_range)
File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/tensor_quant.py", line 306, in forward
outputs, scale = _tensor_quant(inputs, amax, num_bits, unsigned, narrow_range)
File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/tensor_quant.py", line 354, in _tensor_quant
outputs = torch.clamp((inputs * scale).round_(), min_bound, max_bound)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
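The mismatch means at least one quantizer's amax/scale tensor stayed on cuda:0 while the DataParallel replica on cuda:1 ran its forward pass. Below is a small diagnostic sketch (assuming the stock pytorch_quantization TensorQuantizer API; the helper name is mine) to see which quantizers are affected. If the devices do not line up after wrapping, fine-tuning with DistributedDataParallel as sketched earlier avoids the in-process cross-device replication that nn.DataParallel performs.

```python
# Diagnostic sketch (assumes the stock pytorch_quantization TensorQuantizer API):
# print the device of every quantizer's amax to spot mismatches introduced by
# wrapping the model in nn.DataParallel.
import torch
from pytorch_quantization import nn as quant_nn


def report_quantizer_devices(model):
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            amax = getattr(module, "_amax", None)  # calibrated range, if already loaded
            where = amax.device if isinstance(amax, torch.Tensor) else "amax not set"
            print(f"{name}: {where}")


# Example: call report_quantizer_devices(model) before and after nn.DataParallel(model)
```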