cudnnFusedConvActLayer.cpp (64) - Cuda Error in createFilterTextureFused: 11

I am using TensorRT 4.0.0.3 with TensorFlow 1.8. My model is MobileNetV2. When I run the TensorRT conversion, I get this error:

2018-06-26 15:26:54.048618: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2612] starting build engine
2018-06-26 15:26:54.637062: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2617] Built network
2018-06-26 15:26:54.637735: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2622] Serialized engine
2018-06-26 15:26:54.640263: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2630] finished engine bottleneck4/block3/depthconv/my_trt_op36 containing 7 nodes
2018-06-26 15:26:54.640542: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2637] Finished op preparation
2018-06-26 15:26:54.640788: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2646] OK finished op building for bottleneck4/block3/depthconv/my_trt_op36 on device
convert over!!!!!!!!!!!!!!!!!!!!!!!!
2018-06-26 15:26:54.660024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1378] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:04:00.0
totalMemory: 10.91GiB freeMemory: 10.53GiB
2018-06-26 15:26:54.660740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1457] Adding visible gpu devices: 0
2018-06-26 15:26:54.660973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-26 15:26:54.661213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:944]      0
2018-06-26 15:26:54.661410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:957] 0:   N
2018-06-26 15:26:54.661836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3351 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:04:00.0, compute capability: 6.1)
2018-06-26 15:26:56.636093: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger cudnnFusedConvActLayer.cpp (64) - Cuda Error in createFilterTextureFused: 11
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception
Aborted (core dumped)

Here is my code:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile

# Load the frozen graph.
graph_def = tf.GraphDef()
with gfile.FastGFile("model.pb", 'rb') as f:
    graph_def.ParseFromString(f.read())

print('Convert_to_trt')
# Get the TensorRT-optimized graph.
trt_graph = trt.create_inference_graph(graph_def, OUTPUT_NAMES,
                                       max_batch_size=1,
                                       max_workspace_size_bytes=7000000000,
                                       precision_mode="FP32")
print("convert over!!!!!!!!!!!!!!!!!!!!!!!!")

g = tf.Graph()
with g.as_default():
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.30)
    with tf.Session(graph=g, config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
        datasets = tf.data.Dataset.from_generator(
            det_getData, tf.uint8,
            tf.TensorShape([1, cfg.img_size, cfg.img_size, 3]))
        iterator = datasets.make_one_shot_iterator()
        next_element = iterator.get_next()
        # Import the optimized graph, mapping the dataset tensor
        # onto the original input placeholder.
        out = tf.import_graph_def(graph_def=trt_graph,
                                  input_map={'Placeholder': next_element},
                                  return_elements=OUTPUT_NAMES)
        outlist = [op.outputs[0] for op in out]
        results = sess.run(outlist)  # run inference on one batch

If I do not use TensorRT, this code runs and produces output.

Hi @thssljj, may I ask how you obtained the graph? I'm trying to run this as well. Could you please share the full source code, including how you create/import the graph, do the conversion to TRT, and run it?
Thanks.

I changed the max batch size to 4 and it ran successfully. I am confused why.
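For anyone hitting the same crash: the workaround described above is a one-parameter change to the conversion call. A minimal sketch, assuming the same `graph_def` and `OUTPUT_NAMES` as in the original code; why a batch size of 1 triggers the fused conv/activation kernel error is not explained anywhere in this thread, so treat this purely as the reported workaround, not a root-cause fix:

```python
import tensorflow.contrib.tensorrt as trt  # TensorFlow 1.8 contrib API

# Identical to the original call except max_batch_size=4 (was 1),
# which avoided the cudnnFusedConvActLayer crash in my run.
trt_graph = trt.create_inference_graph(graph_def, OUTPUT_NAMES,
                                       max_batch_size=4,
                                       max_workspace_size_bytes=7000000000,
                                       precision_mode="FP32")
```

Note that `max_batch_size` is an upper bound on the batch dimension the generated engine accepts, so feeding batches of 1 through an engine built with `max_batch_size=4` is still valid.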