I have a convolution then batch norm layer in pytorch which uses cudnn in the back end. I convert this layer to a conv + scale layer in tensorrt using the following code:
    import math
    import numpy as np
    import tensorrt as trt

    # in torch, batch norm is
    #   y = ((x - mean) / sqrt(var + eps)) * weight + bias
    # in tensorrt, we have the scale layer that does
    #   y = (x * scale + shift) ^ power
    def exportBatchNorm(bn, input, network, name):
        weight = bn.weight.detach().cpu().numpy()
        beta = bn.bias.detach().cpu().numpy()
        mean = bn.running_mean.detach().cpu().numpy()
        var = bn.running_var.detach().cpu().numpy()

        scale = np.zeros_like(weight)
        shift = np.zeros_like(weight)
        power = np.ones_like(weight)

        # fold the batch norm statistics into one scale and shift per channel
        for i, (w, b, m, v) in enumerate(zip(weight, beta, mean, var)):
            scale[i] = w / (math.sqrt(v + bn.eps))
            shift[i] = b - m * scale[i]

        scale_layer = network.add_scale(input=input, scale=scale, shift=shift,
                                        power=power, mode=trt.ScaleMode.CHANNEL)
        scale_layer.name = name
        return scale_layer.get_output(0)
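The algebra behind the scale and shift values can be checked on its own, without TensorRT or PyTorch. The following is a minimal sketch in plain Python (float64); the helper names `bn_reference` and `fold_bn` and all parameter values are made up for illustration and are not part of the code above.

```python
import math

def bn_reference(x, weight, bias, mean, var, eps):
    """Eval-mode batch norm applied to a single value of one channel."""
    return (x - mean) / math.sqrt(var + eps) * weight + bias

def fold_bn(weight, bias, mean, var, eps):
    """Return (scale, shift) such that x * scale + shift equals bn_reference."""
    scale = weight / math.sqrt(var + eps)
    shift = bias - mean * scale
    return scale, shift

# Hypothetical per-channel parameters: (weight, bias, running_mean, running_var).
channels = [
    (1.0, 0.0, 0.0, 1.0),
    (0.5, -1.2, 3.0, 4.0),
    (2.0, 0.7, -0.4, 0.01),
]
eps = 1e-5
inputs = [-2.0, 0.0, 0.3, 5.0]

# Track the largest disagreement between the two formulations.
max_abs_err = 0.0
for weight, bias, mean, var in channels:
    scale, shift = fold_bn(weight, bias, mean, var, eps)
    for x in inputs:
        expected = bn_reference(x, weight, bias, mean, var, eps)
        folded = x * scale + shift
        max_abs_err = max(max_abs_err, abs(expected - folded))

print("max abs error:", max_abs_err)
```

In float64 the two forms should agree to within rounding error, which only confirms that the formula is equivalent; it says nothing about how a float32 GPU kernel orders its arithmetic.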
I then test the two layers by putting the pytorch module in eval mode and doing a forward pass with the same input data for pytorch and tensorrt. I consistently get significantly different values with TensorRT, but only when I have a convolution and then batch norm. Just convolution is fine, and just batch norm is fine.
TensorRT Version: 6
GPU Type: Titan RTX
Nvidia Driver Version: 470.57.02
CUDA Version: 11.4
CUDNN Version: 8
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.5.0+cu101
    def compare(x, y):
        # this needs to be done per channel
        channels = x.shape[1]
        absdiff = np.zeros(channels)
        percent_diff = np.zeros(channels)
        for c in range(channels):
            input_range = x[:,c,:].max() - x[:,c,:].min()
            diff = x[:,c,:] - y[:,c,:]
            absdiff[c] = np.abs(diff).max()
            percent_diff[c] = absdiff[c] / input_range
        return absdiff, percent_diff

    # note: Conv2d also needs a kernel_size argument, which is omitted here
    model = torch.nn.Sequential(torch.nn.Conv2d(3, 512),
                                torch.nn.BatchNorm2d(512)).cuda().eval()

    TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
        input_tensor = network.add_input(name='input', dtype=trt.float32, shape=input_shape)
        # simplified: the convolution export is not shown
        output = exportBatchNorm(model, input_tensor, network, 'batch_norm')
        network.mark_output(output)
        builder.max_workspace_size = int(1e5)
        builder.max_batch_size = batch_size
        engine = builder.build_cuda_engine(network)

    # forward pass with tensorrt and pytorch
    trt_out = ...
    torch_out = ...
    abs_diff, percent_diff = compare(torch_out, trt_out)
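To make the percent_diff metric concrete, here is a small pure-Python sketch of the same per-channel calculation on nested lists. The function name `compare_per_channel` and the toy data are hypothetical and chosen only so the numbers are easy to follow; the real comparison runs on NumPy arrays as in the code above.

```python
def compare_per_channel(x, y):
    """Compare nested lists shaped [batch][channel][values].

    Returns (absdiff, percent_diff) with one entry per channel, where
    percent_diff is the largest absolute difference divided by the
    value range of x in that channel (a fraction, not a percentage).
    """
    num_channels = len(x[0])
    absdiff, percent_diff = [], []
    for c in range(num_channels):
        # Gather every value of channel c across the whole batch.
        ref = [v for sample in x for v in sample[c]]
        other = [v for sample in y for v in sample[c]]
        value_range = max(ref) - min(ref)
        err = max(abs(a - b) for a, b in zip(ref, other))
        absdiff.append(err)
        percent_diff.append(err / value_range)
    return absdiff, percent_diff

# Toy data: batch of 1, 2 channels, 4 values per channel.
reference = [[[0.0, 1.0, 2.0, 4.0], [10.0, 20.0, 30.0, 50.0]]]
candidate = [[[0.0, 1.0, 2.0, 4.5], [10.0, 21.0, 30.0, 50.0]]]

abs_err, rel_err = compare_per_channel(reference, candidate)
print(abs_err)  # largest absolute error per channel
print(rel_err)  # the same error relative to each channel's range
```

Because the error is normalized by each channel's range, a channel whose reference values are all identical would divide by zero, so a guard would be needed for that case.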
The tensorrt fused layers are always significantly different from the pytorch individual layers. I’m sure my forward pass code works because I have tested it with convolutions, activations, and individual batch norms. Conv + batch norm typically has a percent_diff > 0.0001, whereas just convolution or just batch normalization has a percent_diff on the order of 1e-7.
As far as I can tell (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/BatchNorm.cpp#L218), torch correctly uses cudnn, and cudnn is using the expected algorithm according to the cuDNN API Reference (NVIDIA Deep Learning cuDNN Documentation). So what could be causing this error?