Inconsistent results when batch size is larger than 1 on V100

Description

We have a CNN service that uses TensorRT 7.1 GA to serve models. On 1080Ti, P100, P4, and 2080Ti it works correctly and has been in use for several months. Recently we tested it on V100, and the results are inconsistent when the batch size is larger than 1. Are there any known issues on V100?
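
One way to narrow down a problem like the one described above is to run each sample both as part of a batch and alone with batch size 1, then compare the two outputs. The following is only a minimal sketch: `find_batch_mismatches` and `fake_infer` are hypothetical names, the tolerances are arbitrary, and `fake_infer` is a stand-in that would need to be replaced by a wrapper around the real engine call.

```python
import numpy as np

def find_batch_mismatches(infer, inputs, rtol=1e-3, atol=1e-5):
    """Return indices whose batched output differs from the batch-size-1 output."""
    inputs = np.asarray(inputs)
    batched = np.asarray(infer(inputs))  # one call with the full batch
    mismatched = []
    for i in range(len(inputs)):
        # Run the same sample again on its own (batch size 1).
        single = np.asarray(infer(inputs[i:i + 1]))[0]
        if not np.allclose(batched[i], single, rtol=rtol, atol=atol):
            mismatched.append(i)
    return mismatched

# Hypothetical stand-in for the real inference call; it takes an array of
# shape (N, ...) and returns an array with N outputs.
def fake_infer(batch):
    return batch * 2.0

samples = np.random.rand(4, 3, 8, 8).astype(np.float32)
print(find_batch_mismatches(fake_infer, samples))
```

An empty list means every sample produced the same output in both modes. A non-empty list gives the batch positions that differ, which may help show whether only some positions in the batch are affected.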

Environment

TensorRT Version: 7.1 GA
GPU Type: V100
Nvidia Driver Version: 440.82
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Ubuntu 16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

I tested it on TensorRT 7.2, and it works correctly there.