I am getting this same issue and my accuracy is affected. How should I go about handling this? I have another computer with a similar setup (slightly different versions of CUDA, cuDNN, etc.), and it does not raise this warning. It also gives different results when I run the torch_tensorrt models on both machines.
Also, when training my model, I do apply L2 regularization to all of my weights.
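For reference, the thresholds behind these warnings come from the IEEE 754 half-precision format itself: the smallest positive normal FP16 value is 2**-14 (~6.1e-5) and the smallest positive subnormal is 2**-24 (~5.96e-8). A small NumPy snippet shows what happens to weights in each range when cast down (note that NumPy rounds values below the smallest subnormal toward zero, whereas the warning says TensorRT clamps them to the minimum subnormal instead):

```python
import numpy as np

# IEEE 754 half-precision limits
SMALLEST_NORMAL = 2.0 ** -14      # ~6.10e-5: below this, FP16 values are subnormal
SMALLEST_SUBNORMAL = 2.0 ** -24   # ~5.96e-8: below this, FP16 cannot represent the value

weights = np.array([1e-3, 1e-5, 1e-7, 1e-9], dtype=np.float32)
as_fp16 = weights.astype(np.float16)

for w, h in zip(weights, as_fp16):
    if w < SMALLEST_SUBNORMAL:
        note = "below smallest subnormal (NumPy rounds to 0; TensorRT clamps to min subnormal)"
    elif w < SMALLEST_NORMAL:
        note = "subnormal in FP16 (reduced precision)"
    else:
        note = "representable as a normal FP16 value"
    print(f"{w:.1e} -> {float(h):.3e}  ({note})")
```

So L2 regularization during training actually pushes weights toward exactly this subnormal range, which may be why the warning fires despite (or because of) the regularization.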
Here is a sample of my warnings:
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Weights [name=%1030 : Tensor = aten::_convolution(%result.41, %self.res_18.conv2.weight, %self.conv.conv1.bias, %5, %5, %5, %1028, %1029, %7, %1028, %1028, %1028, %1028) + %out.13 : Tensor = aten::batch_norm(%1030, %self.res_18.bn2.weight, %self.res_18.bn2.bias, %self.res_18.bn2.running_mean, %self.res_18.bn2.running_var, %self.conv.bn1.training, %549, %550, %551) # /home/user/venv/lib/python3.8/site-packages/torch/nn/functional.py:2282:11 + %892 : Tensor = aten::add(%out.13, %s.77, %7) # /home/user/Net.py:89:8 + %s.81 : Tensor = aten::relu(%892) # /home/user/venv/lib/python3.8/site-packages/torch/nn/functional.py:1299:17.weight] had the following issues when converted to FP16:
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - Subnormal FP16 values detected.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - Values less than smallest positive FP16 Subnormal value detected. Converting to FP16 minimum subnormalized value.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Weights [name=%1036 : Tensor = aten::_convolution(%s.81, %self.outblock.policy_conv1.weight, %self.conv.conv1.bias, %5, %4, %5, %1034, %1035, %7, %1034, %1034, %1034, %1034) + %902 : Tensor = aten::batch_norm(%1036, %self.outblock.policy_bn1.weight, %self.outblock.policy_bn1.bias, %self.outblock.policy_bn1.running_mean, %self.outblock.policy_bn1.running_var, %self.conv.bn1.training, %549, %550, %551) # /home/user/venv/lib/python3.8/site-packages/torch/nn/functional.py:2282:11 + %result.3 : Tensor = aten::relu(%902) # /home/user/venv/lib/python3.8/site-packages/torch/nn/functional.py:1299:17 || %1033 : Tensor = aten::_convolution(%s.81, %self.outblock.value_conv.weight, %self.conv.conv1.bias, %5, %4, %5, %1031, %1032, %7, %1031, %1031, %1031, %1031) + %895 : Tensor = aten::batch_norm(%1033, %self.outblock.value_bn.weight, %self.outblock.value_bn.bias, %self.outblock.value_bn.running_mean, %self.outblock.value_bn.running_var, %self.conv.bn1.training, %549, %550, %551) # /home/user/venv/lib/python3.8/site-packages/torch/nn/functional.py:2282:11 + %result.4 : Tensor = aten::relu(%895) # /home/user/venv/lib/python3.8/site-packages/torch/nn/functional.py:1299:17.weight] had the following issues when converted to FP16:
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - Subnormal FP16 values detected.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Weights [name=%1039 : Tensor = aten::_convolution(%result.3, %self.outblock.policy_conv2.weight, %self.outblock.policy_conv2.bias, %5, %4, %5, %1037, %1038, %7, %1037, %1037, %1037, %1037).weight] had the following issues when converted to FP16:
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - Subnormal FP16 values detected.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Weights [name=%1039 : Tensor = aten::_convolution(%result.3, %self.outblock.policy_conv2.weight, %self.outblock.policy_conv2.bias, %5, %4, %5, %1037, %1038, %7, %1037, %1037, %1037, %1037).bias] had the following issues when converted to FP16:
WARNING: [Torch-TensorRT TorchScript Conversion Context] - - Subnormal FP16 values detected.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
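Since the warning suggests modifying the weights, one experiment would be to flush near-subnormal weights to zero before conversion, so FP16 rounding behaves identically on both machines. Here is a rough sketch in NumPy (illustration only; in PyTorch the same mask would be applied to each tensor in `model.state_dict()`, and whether zeroing tiny weights hurts accuracy depends on the model):

```python
import numpy as np

FP16_MIN_NORMAL = 2.0 ** -14  # ~6.1e-5, smallest positive normal FP16 value

def flush_subnormal_weights(weights, threshold=FP16_MIN_NORMAL):
    """Zero out weights whose magnitude would become subnormal in FP16.

    Returns the cleaned array and the number of weights that were flushed.
    Sketch only: zero (exclusive) stays zero, everything with
    0 < |w| < threshold is set to 0.0.
    """
    w = np.asarray(weights, dtype=np.float32)
    mask = (np.abs(w) < threshold) & (w != 0.0)
    out = w.copy()
    out[mask] = 0.0
    return out, int(mask.sum())

cleaned, n_flushed = flush_subnormal_weights([1e-3, 1e-7, -1e-8, 0.5])
print(cleaned, n_flushed)
```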
Update:
I found that the two machines had different versions of CUDA/cuDNN/TensorRT.
When I reverted the newer machine to the older machine's setup (CUDA 11.4, with cuDNN 8.2 and TensorRT 8.2), I got the same results on both.
So it seems that CUDA 11.7 vs 11.4 (or something in between) is causing this warning, which in turn is giving different results in my model.
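For anyone comparing machines the same way, a quick way to dump the relevant library versions on each box is something like this (the module names are the ones I would check; adjust for your setup):

```python
import importlib

def collect_versions():
    """Gather versions of the libraries that differed between the two machines."""
    versions = {}
    for name in ("torch", "torch_tensorrt", "tensorrt"):
        try:
            mod = importlib.import_module(name)
            versions[name] = str(getattr(mod, "__version__", "unknown"))
        except ImportError:
            versions[name] = "not installed"
    return versions

print(collect_versions())
```

Running this on both machines and diffing the output makes it obvious which component changed.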