TensorRT INT8 optimization for caffe-yolov3 fails

The sampleINT8 demo from the TensorRT package works fine for caffe-yolov3 optimization in FP32 mode. However, INT8 calibration always fails, so the INT8 optimization cannot be completed. The reported error is as follows:
[W] [TRT] TensorRT was compiled against cuDNN 7.5.0 but is linked against cuDNN 7.3.1. This mismatch may potentially cause undefined behavior.
[I] Top1: 0, Top5: 0
[I] Processing 4 images averaged 19.0239 ms/image and 19.0239 ms/batch.
[I] FP16 run:4 batches of size 1 starting at 1
[I] Spcified precision is not natively support
[E] [TRT] engine.cpp (570) - Cuda Error in commonEmitTensor: 11 (invalid argument)
[E] [TRT] Failure while trying to emit debug blob.
engine.cpp (570) - Cuda Error in commonEmitTensor: 11 (invalid argument)
[E] [TRT] cuda/customWinogradConvActLayer.cpp (342) - Cuda Error in execute: 11 (invalid argument)
[E] [TRT] cuda/customWinogradConvActLayer.cpp (342) - Cuda Error in execute: 11 (invalid argument)

Could you provide me with a calibration table file for caffe_yolov3?
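If a calibration table for caffe_yolov3 does turn up, it may help to inspect it before loading it. The sketch below assumes the plain-text cache layout written by TensorRT's entropy calibrator: one header line (for example `TRT-<version>-EntropyCalibration2`) followed by `tensor_name: <hex>` lines, where `<hex>` is taken to be the big-endian IEEE-754 bit pattern of the float32 per-tensor scale. That layout is an assumption, not verified against every TensorRT release, and `parse_calibration_table` is a hypothetical helper name.

```python
import struct


def parse_calibration_table(text):
    """Parse a calibration cache into (header, {tensor_name: scale}).

    Assumed layout (hypothetical example values):
        TRT-<version>-EntropyCalibration2
        data: 3f800000
        conv1: 40000000
    Each hex value is read as the big-endian bits of a float32 scale.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header, scales = lines[0], {}
    for line in lines[1:]:
        # rpartition keeps tensor names that themselves contain ':' intact
        name, _, hex_bits = line.rpartition(":")
        scales[name.strip()] = struct.unpack(">f", bytes.fromhex(hex_bits.strip()))[0]
    return header, scales
```

For instance, a line `data: 3f800000` decodes to a scale of 1.0. A table whose tensor names do not match the layer/blob names of the caffe-yolov3 prototxt would be a sign it was produced for a different network.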
My system configuration is:
CUDA 10.0
cuDNN 7.3.1
TensorRT 5.1.2.2
GPU: P4
Driver: 410
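The first warning in the caffe-yolov3 log already points at this configuration: TensorRT reports it was compiled against cuDNN 7.5.0 while the system links cuDNN 7.3.1. A small sketch that extracts both versions from that warning, assuming the wording stays exactly as printed in the log; `cudnn_mismatch` is a hypothetical helper name, and it only detects the mismatch, it does not prove that the mismatch causes the INT8 failure.

```python
import re

# Matches the TensorRT warning text as it appears in the log above.
_PATTERN = re.compile(
    r"compiled against cuDNN (\d+)\.(\d+)\.(\d+) "
    r"but is linked against cuDNN (\d+)\.(\d+)\.(\d+)"
)


def cudnn_mismatch(log_line):
    """Return ((compiled), (linked)) version tuples, or None if the line does not match."""
    m = _PATTERN.search(log_line)
    if m is None:
        return None
    nums = tuple(int(g) for g in m.groups())
    return nums[:3], nums[3:]
```

Returning tuples means `linked < compiled` directly tests whether the installed cuDNN is older than the version TensorRT was built against, which is the case here ((7, 3, 1) < (7, 5, 0)).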

When I try INT8 optimization with GoogLeNet, it fails with the same errors:

@root0-W780-G20:~/software/TensorRT-5.0.2.6/bin$ ./sample_int8 googlenet

FP32 run:1 batches of size 1 starting at 20
pass one
jell

Top1: 0, Top5: 0
Processing 1 images averaged 1.8903 ms/image and 1.8903 ms/batch.

FP16 run:1 batches of size 1 starting at 20
Engine could not be created at this precision

INT8 run:1 batches of size 1 starting at 20
ERROR: engine.cpp (404) - Cuda Error in commonEmitTensor: 11
ERROR: Failure while trying to emit debug blob.
engine.cpp (404) - Cuda Error in commonEmitTensor: 11
ERROR: cuda/customWinogradConvActLayer.cpp (319) - Cuda Error in execute: 11
ERROR: cuda/customWinogradConvActLayer.cpp (319) - Cuda Error in execute: 11
Cuda failure: 77Aborted (core dumped)
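The numeric codes in both logs are CUDA runtime `cudaError_t` values. A lookup sketch, assuming the CUDA 10.0 numbering (the CUDA version listed in the configuration above); only the two codes seen in these logs are included, and `describe_cuda_errors` is a hypothetical helper name.

```python
import re

# cudaError_t values as numbered in the CUDA 10.0 runtime. Assumption: later
# CUDA releases may number these differently, so only use this on 10.0 logs.
CUDA_10_0_ERRORS = {
    11: ("cudaErrorInvalidValue", "invalid argument"),
    77: ("cudaErrorIllegalAddress", "an illegal memory access was encountered"),
}

# Matches both "Cuda Error in execute: 11" and "Cuda failure: 77".
_CODE = re.compile(r"Cuda (?:Error in \w+|failure): (\d+)")


def describe_cuda_errors(log_text):
    """Return distinct (code, enum name) pairs in order of first appearance."""
    seen = []
    for match in _CODE.finditer(log_text):
        code = int(match.group(1))
        entry = (code, CUDA_10_0_ERRORS.get(code, ("unknown", ""))[0])
        if entry not in seen:
            seen.append(entry)
    return seen
```

Read this way, the INT8 run first fails with an invalid-argument error (code 11) in the Winograd convolution layer, and the GoogLeNet sample then aborts on an illegal memory access (code 77).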

Moving this thread to the TensorRT forum.