Cuda Error in createFilterTextureFused

I am using TensorRT 4.0.0.3 with TensorFlow 1.8. My model is MobileNetV2, and my code is written with Tensorpack (a library built on TensorFlow). If I do not convert the model to a TensorRT graph, it runs successfully, but after converting to a TensorRT graph it fails with:

2018-06-26 15:26:54.048618: I tensorflow/contrib/tensorrt/convert/] starting build engine
2018-06-26 15:26:54.637062: I tensorflow/contrib/tensorrt/convert/] Built network
2018-06-26 15:26:54.637735: I tensorflow/contrib/tensorrt/convert/] Serialized engine
2018-06-26 15:26:54.640263: I tensorflow/contrib/tensorrt/convert/] finished engine bottleneck4/block3/depthconv/my_trt_op36 containing 7 nodes
2018-06-26 15:26:54.640542: I tensorflow/contrib/tensorrt/convert/] Finished op preparation
2018-06-26 15:26:54.640788: I tensorflow/contrib/tensorrt/convert/] OK finished op building for bottleneck4/block3/depthconv/my_trt_op36 on device
convert over!!!!!!!!!!!!!!!!!!!!!!!!
2018-06-26 15:26:54.660024: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:04:00.0
totalMemory: 10.91GiB freeMemory: 10.53GiB
2018-06-26 15:26:54.660740: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2018-06-26 15:26:54.660973: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-26 15:26:54.661213: I tensorflow/core/common_runtime/gpu/]      0
2018-06-26 15:26:54.661410: I tensorflow/core/common_runtime/gpu/] 0:   N
2018-06-26 15:26:54.661836: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3351 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:04:00.0, compute capability: 6.1)
2018-06-26 15:26:56.636093: E tensorflow/contrib/tensorrt/log/] DefaultLogger cudnnFusedConvActLayer.cpp (64) - Cuda Error in createFilterTextureFused: 11
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception
Aborted (core dumped)

The log shows that 37 subgraphs were converted to TensorRT successfully, but then I get a CUDA error, and I cannot find any report of the same problem.
Here is the code I use to apply TensorRT:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile

graph_def = tf.GraphDef()
with gfile.FastGFile("model.pb", 'rb') as f:
    graph_def.ParseFromString(f.read())  # load the frozen graph
trt_graph = trt.create_inference_graph(graph_def, OUTPUT_NAMES,
                                       precision_mode="FP32")  # get optimized graph
print("convert over!!!!!!!!!!!!!!!!!!!!!!!!")
g = tf.Graph()
with g.as_default():
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.30)
    with tf.Session(graph=g, config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
        datasets = tf.data.Dataset.from_generator(...,  # generator elided here
                                                  (tf.uint8),
                                                  (tf.TensorShape([1, cfg.img_size, cfg.img_size, 3])))
        # import the TensorRT-optimized graph and fetch the output tensors
        out = tf.import_graph_def(trt_graph, return_elements=OUTPUT_NAMES)
        for r in out:
            ...
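One detail I noticed while debugging (not claiming it is the cause of the error): the "3351 MB" the log reports for the created TensorFlow device is exactly the per_process_gpu_memory_fraction=0.30 cap from my GPUOptions applied to the card's 10.91 GiB, so the TensorRT engines only have about 3.3 GB to work with. A quick sanity check of that arithmetic:

```python
# Values taken from the log above and the script's GPUOptions.
total_mib = 10.91 * 1024          # totalMemory: 10.91GiB, in MiB
fraction = 0.30                   # per_process_gpu_memory_fraction
print(int(total_mib * fraction))  # matches the 3351 MB in the log
```

So the device-creation line is consistent with my config; whether 3.3 GB is enough for the fused-convolution filter textures is what I am unsure about.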

Then I run it with a sess.run call.