TensorRT batch Error

Hi, I am using the TensorRT library. When batchSize is 1, there is no problem. But when batchSize is greater than 1, I get the error "cudnnEngine.cpp (449) - Cuda Error in execute: 77". So I deleted my model and ran cuda-memcheck to find the problem. The cuda-memcheck log is:
========= Invalid global read of size 4
========= at 0x000004a8 in trt_maxwell_scudnn_128x64_relu_medium_nn_v1
========= by thread (63,0,0) in block (265,0,0)
========= Address 0x7fd75646107c is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x22b40d]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.9.0 [0x153dd]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.9.0 [0x15467]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.9.0 (cudaLaunchKernel + 0x1f0) [0x470f0]
========= Host Frame:/usr/local/TensorRT/lib/libnvinfer.so.4 (_ZN4cask18ImplicitGemmShaderINS_18ImplicitGemmParamsILi8ELi256EEEE3runERKNS_11ConvolutionEmPvmS7_iS7_PKvS9_P11CUstream_st + 0x28f) [0x72df2f]
========= Host Frame:/usr/local/TensorRT/lib/libnvinfer.so.4 (_ZN8nvinfer14task20caskConvolutionLayer7executeERKNS_5cudnn13CommonContextERKNS2_19ExecutionParametersE + 0x1c3) [0x3df703]
========= Host Frame:/usr/local/TensorRT/lib/libnvinfer.so.4 (_ZN8nvinfer15cudnn16ExecutionContext7executeEiPPv + 0x2d7) [0x3d0aa7]
========= Host Frame:/home/lucas/ksycode/PC/yidianzixun/bin/ksyAI.so [0x61a3]
========= Host Frame:/home/lucas/ksycode/PC/yidianzixun/bin/ksyAI.so (run_graph + 0x29) [0x68d9]
========= Host Frame:test [0x1b81]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21a87]
========= Host Frame:test [0x267a]

========= Invalid global read of size 4
========= at 0x000004a8 in trt_maxwell_scudnn_128x64_relu_medium_nn_v1
========= by thread (62,0,0) in block (265,0,0)
========= Address 0x7fd756461078 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x22b40d]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.9.0 [0x153dd]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.9.0 [0x15467]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.9.0 (cudaLaunchKernel + 0x1f0) [0x470f0]
========= Host Frame:/usr/local/TensorRT/lib/libnvinfer.so.4 (_ZN4cask18ImplicitGemmShaderINS_18ImplicitGemmParamsILi8ELi256EEEE3runERKNS_11ConvolutionEmPvmS7_iS7_PKvS9_P11CUstream_st + 0x28f) [0x72df2f]
========= Host Frame:/usr/local/TensorRT/lib/libnvinfer.so.4 (_ZN8nvinfer14task20caskConvolutionLayer7executeERKNS_5cudnn13CommonContextERKNS2_19ExecutionParametersE + 0x1c3) [0x3df703]
========= Host Frame:/usr/local/TensorRT/lib/libnvinfer.so.4 (_ZN8nvinfer15cudnn16ExecutionContext7executeEiPPv + 0x2d7) [0x3d0aa7]
========= Host Frame:/home/lucas/ksycode/PC/yidianzixun/bin/ksyAI.so [0x61a3]
========= Host Frame:/home/lucas/ksycode/PC/yidianzixun/bin/ksyAI.so (run_graph + 0x29) [0x68d9]
========= Host Frame:test [0x1b81]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21a87]
========= Host Frame:test [0x267a]

Can anyone help me? Thanks.

Hello, what version of TensorRT are you using? Please also share the following details about your environment:

Linux distro and version
GPU type
NVIDIA driver version
CUDA version
cuDNN version
Python version [if using Python]
TensorFlow version
TensorRT version
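
In the meantime, one thing that may be worth ruling out: an out-of-bounds read inside a convolution kernel that only appears when batchSize is greater than 1 is sometimes caused by device buffers that were sized for a single sample, or by a batch size larger than the one the engine was built for (the value given to `IBuilder::setMaxBatchSize` and reported by `ICudaEngine::getMaxBatchSize`). This is only a guess, since the thread does not show the buffer allocation code. The sketch below is a standalone sanity check that does not call TensorRT; `SampleDims`, `sampleVolume`, `requiredBytes`, and `checkBatch` are hypothetical helper names, and float (FP32) bindings are assumed.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical per-sample shape of one binding (channels, height, width).
// The batch dimension is deliberately excluded, as in the implicit-batch API.
struct SampleDims {
    int c;
    int h;
    int w;
};

// Number of elements in a single sample.
std::size_t sampleVolume(const SampleDims& d) {
    return static_cast<std::size_t>(d.c) * static_cast<std::size_t>(d.h) *
           static_cast<std::size_t>(d.w);
}

// Bytes a device buffer must hold for batchSize samples.
// Assumption: the binding holds 32-bit floats.
std::size_t requiredBytes(const SampleDims& d, int batchSize) {
    return sampleVolume(d) * static_cast<std::size_t>(batchSize) * sizeof(float);
}

// Throws if the requested batch size is outside the range the engine was
// built for, or if the allocated buffer is too small for that batch size.
void checkBatch(int batchSize, int maxBatchSize, const SampleDims& d,
                std::size_t allocatedBytes) {
    if (batchSize < 1 || batchSize > maxBatchSize) {
        throw std::invalid_argument("batchSize is outside [1, maxBatchSize]");
    }
    if (allocatedBytes < requiredBytes(d, batchSize)) {
        throw std::length_error("device buffer is too small for this batchSize");
    }
}
```

Calling something like `checkBatch(batchSize, engine->getMaxBatchSize(), dims, bytesPassedToCudaMalloc)` for every input and output binding just before `context->execute(batchSize, buffers)` would show whether any buffer was allocated for batch size 1 only.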