Provide details on the platforms you are using:
OS : Ubuntu 18.04
GPU : GTX 1080ti
nvidia driver version : 390.87
CUDA version : 9.0
CUDNN version : 7.3.1
Python version : 2.7.15rc1
Tensorflow version : tf-gpu 1.10.0
TensorRT version : 5.x (only TRT 5 is installed; see below)
Describe the problem:
I have trained the DGCNN classifier model (https://github.com/WangYueFt/dgcnn) and verified that the TF graph produces correct inferences. I then froze the graph and saved it as a .pb binary. When I run trt.create_inference_graph, it warns that it was compiled against TRT 3.0.4 but loaded TRT 4.0.1 and may not work, even though only TRT 5 is installed, present on PATH and LD_LIBRARY_PATH, and pip-installed. After this, my machine locks up; eventually the IPython console blacks out and the Python kernel restarts. My code for freezing and converting to TRT is included below. Unfortunately I can't include the exact console output because of the hang described above, but the summary here captures it.
My first question is: what are the actual compatibility requirements across the CUDA toolkit, TensorFlow, TensorRT, and the OS? There is conflicting guidance out there. I am following the compatibility matrix in the "Accelerating Inference In TensorFlow With TensorRT User Guide", though I can't find a download for TRT 5.0.0rc in particular.
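For what it's worth, one way to confirm which TensorRT the Python process actually links at runtime (as opposed to what pip or PATH suggest) is to load libnvinfer directly and ask it for its version. This is only a sketch: the library names and the getInferLibVersion symbol are assumptions based on the TensorRT C API, and the function returns None if no libnvinfer can be found.

```python
import ctypes

def loaded_trt_version():
    """Return (major, minor, patch) of the libnvinfer the dynamic loader
    resolves, or None if no TensorRT runtime library is found."""
    for name in ("libnvinfer.so.5", "libnvinfer.so"):
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue
        # NV_TENSORRT_VERSION is encoded as major*1000 + minor*100 + patch
        ver = lib.getInferLibVersion()
        return (ver // 1000, (ver % 1000) // 100, ver % 100)
    return None

print(loaded_trt_version())
```

Running this inside the same interpreter (before importing TensorFlow) should reveal whether a stray TRT 4 library is shadowing the TRT 5 install despite LD_LIBRARY_PATH.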
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
# MODEL, MODEL_PATH, NUM_POINT and log_string come from the DGCNN scripts

G = tf.Graph()
with G.as_default():
    pointclouds_pl, labels_pl = MODEL.placeholder_inputs(1, NUM_POINT)
    is_training_pl = tf.placeholder(tf.bool, shape=())
    # simple model
    pred, end_points = MODEL.get_model(pointclouds_pl, is_training_pl)
    config = tf.ConfigProto(allow_soft_placement=True)
    # Create Session
    with tf.Session(graph=G, config=config) as sess:
        with G.device('gpu:0'):
            sess.run(tf.global_variables_initializer())
            # This one imports model from checkpoint metagraph
            saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')
            # Restore variables from checkpoint.
            saver.restore(sess, MODEL_PATH)
            # fix nodes
            gd = sess.graph.as_graph_def()
            for node in gd.node:
                if node.op == 'RefSwitch':
                    node.op = 'Switch'
                    #for index in xrange(len(node.input)):
                    #    node.input[index] = node.input[index] + '/read'
                elif node.op == 'AssignSub':
                    node.op = 'Sub'
                    if 'use_locking' in node.attr:
                        del node.attr['use_locking']
            log_string("\nModel restored.\n")
            frozen_graph = tf.graph_util.convert_variables_to_constants(
                sess, gd, output_node_names=['fc3/output'])
            trt_output_graph_def = trt.create_inference_graph(
                input_graph_def=frozen_graph,
                outputs=['fc3/output'],
                max_batch_size=1,
                max_workspace_size_bytes=1 << 30,
                precision_mode="FP32")