Some ONNX layers are discarded directly when converting a PyTorch ONNX model to a TensorRT engine with FP16

Description

Environment

TensorRT Version: 7.0.0-1
GPU Type: Tesla T4
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7.6.5.32-1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7.7
PyTorch Version (if applicable): 1.2

Steps To Reproduce

  1. Generate the ONNX model:

    ```python
    import sys

    import torch


    class Model1(torch.nn.Module):
        def __init__(self):
            super(Model1, self).__init__()
            self.conv1 = torch.nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)

        def forward(self, x):
            x = self.conv1(x)
            groups = 2
            batchsize, num_channels, height, width = x.size()
            channels_per_group = num_channels // groups

            # reshape: split the channel dim into (groups, channels_per_group)
            x = x.view(batchsize, groups, channels_per_group, height, width)
            x = torch.transpose(x, 1, 2).contiguous()

            # flatten back to (N, C, H, W)
            x = x.view(batchsize, -1, height, width)
            return x


    output_filename = sys.argv[1]
    model = Model1()
    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy_input, output_filename, verbose=False, opset_version=10)
    ```
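The reshape → transpose → reshape sequence in `forward()` is a standard channel shuffle. As a minimal NumPy sketch of the same permutation (the shapes below are illustrative, not taken from the issue):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Channel shuffle: split C into groups, swap the group axis, flatten back."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)   # reshape
    x = x.transpose(0, 2, 1, 3, 4)                # swap group and channel axes
    return x.reshape(n, c, h, w)                  # flatten back to (N, C, H, W)

# With 16 channels and 2 groups, channels interleave as 0, 8, 1, 9, ...
x = np.arange(16).reshape(1, 16, 1, 1)
y = channel_shuffle(x, groups=2)
print(y[0, :4, 0, 0])  # → [0 8 1 9]
```

This is the computation the two Shuffle layers in the engine logs below should implement.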

  2. Get the engine layer info:
    FP16:
    `trtexec --onnx=shuffle.onnx --fp16 --verbose`
    FP32:
    `trtexec --onnx=shuffle.onnx --verbose`

  3. Compare the layer info:
    FP32:
    ```
    [V] [TRT] Engine Layer Information:
    [V] [TRT] Layer(scudnn): (Unnamed Layer* 0) [Convolution], Tactic: -4420849921117327522, input[Float(3,224,224)] -> 3[Float(16,224,224)]
    [V] [TRT] Layer(Shuffle): (Unnamed Layer* 1) [Shuffle] + (Unnamed Layer* 2) [Shuffle], Tactic: 0, 3[Float(16,224,224)] -> 6[Float(8,2,224,224)]
    [V] [TRT] Layer(Shuffle): (Unnamed Layer* 3) [Shuffle], Tactic: 0, 6[Float(8,2,224,224)] -> 8[Float(16,224,224)]
    ```
    FP16:
    ```
    [V] [TRT] Engine Layer Information:
    [V] [TRT] Layer(Reformat): (Unnamed Layer* 0) [Convolution] input reformatter 0, Tactic: 0, input[Float(3,224,224)] -> (Unnamed Layer* 0) [Convolution] reformatted input 0[Half(3,224,224)]
    [V] [TRT] Layer(hcudnn): (Unnamed Layer* 0) [Convolution], Tactic: 4930470141256631146, (Unnamed Layer* 0) [Convolution] reformatted input 0[Half(3,224,224)] -> 3[Half(16,224,224)]
    [V] [TRT] Layer(Reformat): (Unnamed Layer* 3) [Shuffle] input reformatter 0, Tactic: 0, 6[Half(8,2,224,224)] -> (Unnamed Layer* 3) [Shuffle] reformatted input 0[Float(8,2,224,224)]
    [V] [TRT] Layer(Shuffle): (Unnamed Layer* 3) [Shuffle], Tactic: 0, (Unnamed Layer* 3) [Shuffle] reformatted input 0[Float(8,2,224,224)] -> 8[Float(16,224,224)]
    ```

    Shuffle layers 1 and 2 of shuffle.onnx are discarded directly in the FP16 build.

  4. FP32 and FP16 inference results are quite different.
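When checking whether two engine outputs are "quite different," it helps to report the maximum absolute and relative error rather than eyeballing the tensors. A minimal sketch of such a comparison, assuming the two outputs are already available as NumPy arrays (the `compare_outputs` helper and the tolerance values are illustrative, not part of TensorRT):

```python
import numpy as np

def compare_outputs(out_fp32, out_fp16, rtol=1e-3, atol=1e-3):
    """Report max absolute/relative error between FP32 and FP16 engine outputs."""
    a = np.asarray(out_fp32, dtype=np.float32)
    b = np.asarray(out_fp16, dtype=np.float32)
    diff = np.abs(a - b)
    rel = diff / (np.abs(a) + 1e-12)
    print("max abs err:", diff.max(), "max rel err:", rel.max())
    return np.allclose(a, b, rtol=rtol, atol=atol)

# Illustrative: a round-trip through float16 alone stays within ~1e-3 relative
# error, so a larger divergence points at the engine, not at FP16 rounding.
ref = np.random.rand(1, 16, 224, 224).astype(np.float32)
ok = compare_outputs(ref, ref.astype(np.float16))
```

A healthy FP16 build should land near half-precision rounding error; a discarded Shuffle layer would show up as a much larger, structured mismatch.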

This issue has been fixed, and the fix should be available in the next TRT release.
Please stay tuned for the TRT release announcement.

Thanks