Unable to accurately convert cuDNN conv + batch norm to TensorRT conv + scale layer


I have a convolution followed by a batch norm layer in PyTorch, which uses cuDNN as the backend. I convert these to a conv + scale layer in TensorRT using the following code:

```python
import math

import numpy as np
import tensorrt as trt

# In PyTorch, batch norm (eval mode) computes
#   y = ((x - mean) / sqrt(var + eps)) * weight + bias
# In TensorRT, the scale layer computes
#   y = (x * scale + shift) ^ power
def exportBatchNorm(bn, input, network, name):
    weight = bn.weight.detach().cpu().numpy()
    beta = bn.bias.detach().cpu().numpy()
    mean = bn.running_mean.detach().cpu().numpy()
    var = bn.running_var.detach().cpu().numpy()
    scale = np.zeros_like(weight)
    shift = np.zeros_like(weight)
    power = np.ones_like(weight)
    for i, (w, b, m, v) in enumerate(zip(weight, beta, mean, var)):
        scale[i] = w / (math.sqrt(v + bn.eps))
        shift[i] = b - m * scale[i]
    # The original call is truncated here; CHANNEL mode (one scale/shift/power
    # value per channel) is assumed because the weights above are per-channel.
    layer = network.add_scale(input=input, mode=trt.ScaleMode.CHANNEL,
                              shift=shift, scale=scale, power=power)
    layer.name = name
    return layer.get_output(0)
```
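The per-channel loop above can be written as a vectorized NumPy fold, which also makes it easy to check the folded scale layer against the batch norm formula offline. This is a sketch: `fold_batchnorm` is a hypothetical helper, the parameters are random rather than taken from the real model, and it only checks the math of the fold, not what TensorRT does at runtime.

```python
import numpy as np

def fold_batchnorm(weight, bias, running_mean, running_var, eps=1e-5):
    # Same fold as the loop in exportBatchNorm, rounded to float32 like the layer weights:
    #   scale = weight / sqrt(var + eps), shift = bias - mean * scale, power = 1
    scale = (weight / np.sqrt(running_var + eps)).astype(np.float32)
    shift = (bias - running_mean * scale).astype(np.float32)
    power = np.ones_like(scale)
    return scale, shift, power

def per_channel(a):
    # Broadcast a length-C vector over an NCHW array
    return a[None, :, None, None]

rng = np.random.default_rng(0)
C = 8
weight, bias = rng.standard_normal(C), rng.standard_normal(C)
mean, var = rng.standard_normal(C), rng.uniform(0.5, 2.0, C)
x = rng.standard_normal((2, C, 4, 4))

scale, shift, power = fold_batchnorm(weight, bias, mean, var)
reference = (x - per_channel(mean)) / np.sqrt(per_channel(var) + 1e-5) * per_channel(weight) + per_channel(bias)
folded = (x * per_channel(scale) + per_channel(shift)) ** per_channel(power)
print("max |batch norm - folded scale layer|:", np.abs(reference - folded).max())
```

The only difference between `reference` and `folded` here is float32 rounding of `scale` and `shift`, so the printed value should be tiny.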

I then test the two layers by putting the PyTorch module in eval mode and running a forward pass with the same input data through PyTorch and TensorRT. I consistently get significantly different values from TensorRT, but only when a convolution is followed by batch norm. Convolution alone is fine, and batch norm alone is fine.


TensorRT Version: 6
GPU Type: Titan RTX
NVIDIA Driver Version: 470.57.02
CUDA Version: 11.4
cuDNN Version: 8
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.5.0+cu101

Steps To Reproduce

```python
import numpy as np

def compare(x, y):
    # Compare per channel: the max absolute difference, and that difference
    # relative to the value range of x in the same channel
    channels = x.shape[1]
    absdiff = np.zeros(channels)
    percent_diff = np.zeros(channels)
    for c in range(channels):
        input_range = x[:, c].max() - x[:, c].min()
        diff = x[:, c] - y[:, c]
        absdiff[c] = np.abs(diff).max()
        percent_diff[c] = absdiff[c] / input_range
    return absdiff, percent_diff
```
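The per-channel loop in `compare` can also be expressed with NumPy reductions over every axis except the channel axis. This is a sketch assuming NCHW-shaped NumPy arrays (e.g. `torch_out.cpu().numpy()`); `compare_vectorized` is a hypothetical name, and it computes the same two quantities as `compare`, but in float64.

```python
import numpy as np

def compare_vectorized(x, y):
    # Per-channel max |x - y| and that value relative to the channel's range in x.
    # Reduces over every axis except axis 1 (the channel axis in NCHW).
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    axes = tuple(a for a in range(x.ndim) if a != 1)
    input_range = x.max(axis=axes) - x.min(axis=axes)
    absdiff = np.abs(x - y).max(axis=axes)
    # A constant channel gives a zero range and a division warning, as in the loop version
    return absdiff, absdiff / input_range

# Small usage example: only channel 1 differs, by 0.4, and its range is 8
x = np.zeros((1, 2, 2, 2))
x[0, 0] = [[0, 1], [2, 3]]
x[0, 1] = [[0, 2], [4, 8]]
y = x.copy()
y[0, 1, 0, 0] += 0.4
absdiff, rel = compare_vectorized(x, y)
print(absdiff, rel)
```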

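```python
import tensorrt as trt
import torch

batch_size, input_shape = 1, (3, 64, 64)  # hypothetical values, not given in the post; implicit-batch shape excludes N
# Conv2d requires kernel_size; 3 is a placeholder because the original snippet omits it
model = torch.nn.Sequential(torch.nn.Conv2d(3, 512, kernel_size=3), torch.nn.BatchNorm2d(512)).cuda().eval()
conv, bn = model[0], model[1]

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
    input_tensor = network.add_input(name='input', dtype=trt.float32, shape=input_shape)
    # Export the conv first so the TensorRT network matches the PyTorch model: conv -> scale
    conv_layer = network.add_convolution(input_tensor, conv.out_channels, conv.kernel_size,
                                         conv.weight.detach().cpu().numpy(), conv.bias.detach().cpu().numpy())
    output = exportBatchNorm(bn, conv_layer.get_output(0), network, 'batch_norm')
    network.mark_output(output)
    builder.max_workspace_size = int(1e5)
    builder.max_batch_size = batch_size
    engine = builder.build_cuda_engine(network)
    # forward pass with TensorRT and PyTorch
    trt_out = ...
    torch_out = ...

    abs_diff, percent_diff = compare(torch_out, trt_out)
```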

The fused TensorRT layers always differ significantly from the individual PyTorch layers. I'm confident my forward-pass code works, because I have tested it with convolutions, activations, and individual batch norms. Conv + batch norm typically gives a percent_diff above 1e-4, whereas convolution alone or batch normalization alone gives a percent_diff on the order of 1e-7.
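TensorRT may fold a scale layer that directly follows a convolution into the convolution itself, which is the "fused" case above. One way to estimate how much of the gap is plain float32 rounding, rather than a conversion bug, is to do that fold offline and measure it. The NumPy sketch below is only a stand-in for TensorRT's actual kernels: it assumes a 1x1 convolution (written with `einsum` so no convolution library is needed), uses random parameters instead of the real model's, and the helpers `fuse_conv_bn`, `conv1x1` and `batchnorm` are hypothetical. It compares an unfused float64 reference against the fused computation in float64 and in float32.

```python
import numpy as np

def fuse_conv_bn(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    # conv_w: (C_out, C_in, kH, kW); every other argument: (C_out,)
    # W' = W * s, b' = (b - mean) * s + beta, with s = gamma / sqrt(var + eps)
    s = gamma / np.sqrt(var + eps)
    return conv_w * s[:, None, None, None], (conv_b - mean) * s + beta

def conv1x1(x, w, b):
    # 1x1 convolution on an NCHW array, written as a per-pixel matrix multiply
    return np.einsum("nchw,oc->nohw", x, w[:, :, 0, 0]) + b[None, :, None, None]

def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode batch norm, per channel
    ch = lambda a: a[None, :, None, None]
    return (x - ch(mean)) / np.sqrt(ch(var) + eps) * ch(gamma) + ch(beta)

rng = np.random.default_rng(0)
c_in, c_out = 3, 16
x = rng.standard_normal((2, c_in, 5, 5))
conv_w = rng.standard_normal((c_out, c_in, 1, 1))
conv_b, gamma, beta, mean = (rng.standard_normal(c_out) for _ in range(4))
var = rng.uniform(0.01, 2.0, c_out)  # includes some small variances, which make s large

unfused = batchnorm(conv1x1(x, conv_w, conv_b), gamma, beta, mean, var)  # float64 reference
fused_w, fused_b = fuse_conv_bn(conv_w, conv_b, gamma, beta, mean, var)
fused64 = conv1x1(x, fused_w, fused_b)
# The same fused computation with every input rounded to float32
fused32 = conv1x1(x.astype(np.float32), fused_w.astype(np.float32), fused_b.astype(np.float32))

print("fused float64, max |diff|:", np.abs(unfused - fused64).max())
print("fused float32, max |diff|:", np.abs(unfused - fused32).max())
```

If the real model's weights, plugged into a float32 fold like this, already produce a percent_diff near 1e-4, the discrepancy is likely numerical; if they stay near 1e-7, the exported weights are a more likely suspect.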

As far as I can tell (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/BatchNorm.cpp#L218), torch correctly uses cuDNN, and cuDNN uses the expected algorithm (see the API Reference in the NVIDIA Deep Learning cuDNN Documentation), so what could be causing this error?
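For reference, cuDNN's inference-mode batch norm (`cudnnBatchNormalizationForwardInference`) applies the same per-channel formula as the comment in `exportBatchNorm`. Writing it out with $\gamma$ = `weight`, $\beta$ = `bias`, $\mu$ = `running_mean`, $\sigma^2$ = `running_var`, for output channel $c$, shows that both the scale-layer form and the fold into the preceding convolution (weights $W_c$, bias $b_c$, input $u$) are algebraically exact. That suggests any mismatch comes from the weights passed in or from floating-point evaluation, rather than from the formula itself.

$$
\begin{aligned}
y &= \gamma_c \frac{x - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}} + \beta_c = s_c\,x + (\beta_c - \mu_c s_c), \qquad s_c = \frac{\gamma_c}{\sqrt{\sigma_c^2 + \epsilon}}, \\
x &= W_c \ast u + b_c \;\Longrightarrow\; y = (s_c W_c) \ast u + s_c\,(b_c - \mu_c) + \beta_c .
\end{aligned}
$$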

Hi, the UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, so we request that you try the ONNX parser.
Please check the link below for details.



I am using neither. As can be seen above, I do not export to ONNX; instead I use the TensorRT Python API directly and set the weights of the scale layer manually. Surely the TensorRT Python API is still supported?

I’ve updated the problem statement to indicate that this error occurs only with conv + batch norm fusion; convolution alone and batch normalization alone both work correctly.
I have tried ONNX, but our performance degrades significantly with it, so we are testing a direct PyTorch-to-TensorRT conversion.

Hi @dtmoodie,

Are you facing the same issue with the latest TensorRT version as well?
Please try the latest TRT version.

Thank you.