Description
I have a convolution layer followed by a batch norm layer in PyTorch, which uses cuDNN as its backend. I convert them to a convolution + scale layer pair in TensorRT; the batch norm is converted with the following code:
```python
import math

import numpy as np
import tensorrt as trt

# In PyTorch (eval mode), batch norm computes
#   y = ((x - mean) / sqrt(var + eps)) * weight + bias
# In TensorRT, the scale layer computes
#   y = (x * scale + shift) ^ power
def exportBatchNorm(bn, input, network, name):
    # Learned affine parameters and running statistics, copied to the host
    weight = bn.weight.detach().cpu().numpy()
    beta = bn.bias.detach().cpu().numpy()
    mean = bn.running_mean.detach().cpu().numpy()
    var = bn.running_var.detach().cpu().numpy()

    scale = np.zeros_like(weight)
    shift = np.zeros_like(weight)
    power = np.ones_like(weight)  # power = 1 keeps the scale layer affine
    # Fold the normalization into a per-channel affine transform
    for i, (w, b, m, v) in enumerate(zip(weight, beta, mean, var)):
        scale[i] = w / math.sqrt(v + bn.eps)
        shift[i] = b - m * scale[i]

    layer = network.add_scale(input=input,
                              scale=scale,
                              shift=shift,
                              power=power,
                              mode=trt.ScaleMode.CHANNEL)
    layer.name = name
    return layer.get_output(0)
```
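The per-channel folding done in the loop above can be checked in isolation, without TensorRT. The following is a minimal NumPy-only sketch, assuming float64 reference data; the helper name `fold_batch_norm` and the random parameters are hypothetical, chosen only for illustration. It applies the eval-mode batch norm formula directly and compares it with the folded scale/shift form (power = 1).

```python
import numpy as np

def fold_batch_norm(weight, bias, mean, var, eps):
    """Hypothetical helper: fold eval-mode batch norm into (scale, shift)."""
    scale = weight / np.sqrt(var + eps)
    shift = bias - mean * scale
    return scale, shift

rng = np.random.RandomState(0)
channels = 4
# Hypothetical batch norm parameters and running statistics (float64)
weight = rng.normal(size=channels)
bias = rng.normal(size=channels)
mean = rng.normal(size=channels)
var = rng.uniform(0.1, 2.0, size=channels)
eps = 1e-5

x = rng.normal(size=(2, channels, 5, 5))  # NCHW input
bc = (1, channels, 1, 1)  # broadcast per-channel vectors over N, H, W

# Batch norm exactly as PyTorch defines it in eval mode
y_bn = (x - mean.reshape(bc)) / np.sqrt(var.reshape(bc) + eps) * weight.reshape(bc) + bias.reshape(bc)

# The same transform expressed as a TensorRT-style scale layer
scale, shift = fold_batch_norm(weight, bias, mean, var, eps)
y_scale = (x * scale.reshape(bc) + shift.reshape(bc)) ** 1.0  # power = 1

print(np.abs(y_bn - y_scale).max())
```

In exact arithmetic the two forms are identical, since ((x - mean) / sqrt(var + eps)) * weight + bias = x * scale + (bias - mean * scale) with scale = weight / sqrt(var + eps), so any float64 difference here should be at round-off level.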
I then test the two layers by putting the PyTorch module in eval mode and running a forward pass on the same input data in both PyTorch and TensorRT. I consistently get significantly different values from TensorRT, but only when a convolution is followed by batch norm. Convolution alone is fine, and batch norm alone is fine.
Environment
TensorRT Version: 6
GPU Type: Titan RTX
Nvidia Driver Version: 470.57.02
CUDA Version: 11.4
cuDNN Version: 8
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.5.0+cu101
Steps To Reproduce
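```python
import numpy as np
import tensorrt as trt
import torch

def compare(x, y):
    # Done per channel: max absolute difference, and that difference
    # relative to the channel's value range in x
    channels = x.shape[1]
    absdiff = np.zeros(channels)
    percent_diff = np.zeros(channels)
    for c in range(channels):
        input_range = x[:, c, :].max() - x[:, c, :].min()
        diff = x[:, c, :] - y[:, c, :]
        absdiff[c] = np.abs(diff).max()
        percent_diff[c] = absdiff[c] / input_range
    return absdiff, percent_diff

# input_shape and batch_size are defined elsewhere in the script.
# Conv2d requires kernel_size; it was not given, so 3 is a placeholder.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 512, kernel_size=3),
                            torch.nn.BatchNorm2d(512)).cuda().eval()
conv, bn = model[0], model[1]
conv_w = conv.weight.detach().cpu().numpy()  # conv parameters as NumPy arrays
conv_b = conv.bias.detach().cpu().numpy()

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
    input_tensor = network.add_input(name='input', dtype=trt.float32, shape=input_shape)
    # Convolution first, then batch norm as a scale layer on its output
    conv_layer = network.add_convolution(input=input_tensor,
                                         num_output_maps=conv.out_channels,
                                         kernel_shape=conv.kernel_size,
                                         kernel=conv_w,
                                         bias=conv_b)
    output = exportBatchNorm(bn, conv_layer.get_output(0), network, 'batch_norm')
    network.mark_output(output)
    builder.max_workspace_size = int(1e5)
    builder.max_batch_size = batch_size
    engine = builder.build_cuda_engine(network)

# forward pass with TensorRT and PyTorch
trt_out = ...
torch_out = ...
abs_diff, percent_diff = compare(torch_out, trt_out)
```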
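For reference, the `percent_diff` returned by `compare` is the per-channel maximum absolute difference divided by that channel's value range in the PyTorch output. A vectorized sketch of the same metric, assuming NCHW arrays, is shown below; `compare_vectorized` and the tiny synthetic tensors are hypothetical, picked so the two channels land at the 1e-4 and 1e-7 levels discussed in this post.

```python
import numpy as np

def compare_vectorized(x, y):
    # Reduce over every axis except the channel axis (axis 1)
    axes = (0,) + tuple(range(2, x.ndim))
    input_range = x.max(axis=axes) - x.min(axis=axes)
    absdiff = np.abs(x - y).max(axis=axes)
    return absdiff, absdiff / input_range

# Hypothetical data: 1 sample, 2 channels, 2x2 spatial (NCHW)
x = np.array([[[[0.0, 10.0], [5.0, 5.0]],
               [[-1.0, 1.0], [0.0, 0.0]]]])
y = x.copy()
y[0, 0, 0, 0] += 1e-3  # channel 0: range 10, error 1e-3
y[0, 1, 1, 1] -= 2e-7  # channel 1: range 2, error 2e-7

absdiff, percent_diff = compare_vectorized(x, y)
print(absdiff, percent_diff)
```

Reducing over a tuple of axes gives the same result as the per-channel loop in `compare`, without iterating over 512 channels in Python.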
The fused TensorRT layers are consistently and significantly different from the individual PyTorch layers. I'm confident my forward-pass code is correct, because I have tested it with convolutions, activations, and individual batch norms. Conv + batch norm typically gives a percent_diff above 1e-4, whereas convolution alone or batch norm alone gives a percent_diff on the order of 1e-7.
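When a convolution feeds a per-channel scale layer, the two can in principle be merged into one convolution by scaling each output filter and adjusting its bias: W' = scale * W and b' = scale * b + shift. Whether TensorRT 6 performs exactly this fusion internally is an assumption here, not something established above. The NumPy sketch below only illustrates the algebra, using a 1x1 convolution written with `einsum` (a hypothetical stand-in for the real convolution) and random hypothetical weights; it compares the unfused and folded results in float64 and in float32.

```python
import numpy as np

def conv1x1(x, w, b):
    # x: (N, C_in, H, W), w: (C_out, C_in), b: (C_out,)
    return np.einsum('oc,nchw->nohw', w, x) + b[None, :, None, None]

def run(dtype, seed=0):
    # Same seed for both dtypes, so only the precision differs
    rng = np.random.RandomState(seed)
    c_in, c_out = 3, 8
    x = rng.normal(size=(2, c_in, 4, 4)).astype(dtype)
    w = rng.normal(size=(c_out, c_in)).astype(dtype)
    b = rng.normal(size=c_out).astype(dtype)
    scale = rng.uniform(0.5, 2.0, size=c_out).astype(dtype)
    shift = rng.normal(size=c_out).astype(dtype)

    # Unfused: convolution, then per-channel scale layer (power = 1)
    y_unfused = conv1x1(x, w, b) * scale[None, :, None, None] + shift[None, :, None, None]
    # Folded: scale merged into the conv weights, shift merged into the bias
    w_fused = w * scale[:, None]
    b_fused = b * scale + shift
    y_fused = conv1x1(x, w_fused, b_fused)
    return y_unfused, y_fused

y64_unfused, y64_fused = run(np.float64)
y32_unfused, y32_fused = run(np.float32)
print('float64 max diff:', np.abs(y64_unfused - y64_fused).max())
print('float32 max diff:', np.abs(y32_unfused - y32_fused).max())
```

The float64 comparison isolates the algebra; the float32 gap gives a rough indication of how far single-precision rounding alone can move the outputs for these small, hypothetical sizes.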
As far as I can tell, PyTorch correctly calls cuDNN (pytorch/BatchNorm.cpp at master · pytorch/pytorch · GitHub), and cuDNN uses the expected algorithm (API Reference :: NVIDIA Deep Learning cuDNN Documentation). So what could be causing this discrepancy?