Hi,
I am trying to convert an ssd_mobilenet_v2_coco model to a TensorRT model on a Jetson Nano. I trained the model with just one class on my laptop, which has the following specification:
CPU: Intel i7-8750H @ 2.2GHz x12, 8GB RAM
GPU: NVIDIA Quadro P600, 4 GB
I can run inference on my laptop at around 13 Hz, but it takes around 70-80 seconds on the Jetson Nano. The Nano has JetPack 4.2, with tensorflow 1.14.0+nv19.10 installed as per the NVIDIA guidelines. Strangely, if I use the CPU only (by setting os.environ["CUDA_VISIBLE_DEVICES"] = '-1'), inference time is around 3 seconds.
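For reference, this is roughly how I disable the GPU for the CPU-only comparison (a minimal sketch; my actual script does more):

import os
# Hide all CUDA devices so TensorFlow falls back to the CPU;
# this needs to be set before the TensorFlow session is created.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf  # the rest of the detection script is unchanged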
I am now trying to convert the frozen graph to a TF-TRT model using TrtGraphConverter, as shown in Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation. However, if I set the session config option allow_soft_placement=True, the speed is very poor again, and with allow_soft_placement=False the code stops with the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation Preprocessor/map/TensorArray_2: Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='' supported_device_types_=[CPU, XLA_CPU, XLA_GPU] possible_devices_=
TensorArrayGatherV3: GPU CPU XLA_CPU XLA_GPU
Enter: GPU CPU XLA_CPU XLA_GPU
TensorArrayV3: CPU XLA_CPU XLA_GPU
TensorArrayWriteV3: CPU XLA_CPU XLA_GPU
TensorArraySizeV3: GPU CPU XLA_CPU XLA_GPU
Const: GPU CPU XLA_CPU XLA_GPU
Range: GPU CPU XLA_CPU XLA_GPU
Colocation members, user-requested devices, and framework assigned devices, if any:
Preprocessor/map/TensorArray_2 (TensorArrayV3) /device:GPU:0
Preprocessor/map/while/ResizeImage/stack_1 (Const) /device:GPU:0
Preprocessor/map/while/TensorArrayWrite_1/TensorArrayWriteV3/Enter (Enter) /device:GPU:0
Preprocessor/map/while/TensorArrayWrite_1/TensorArrayWriteV3 (TensorArrayWriteV3) /device:GPU:0
Preprocessor/map/TensorArrayStack_1/TensorArraySizeV3 (TensorArraySizeV3) /device:GPU:0
Preprocessor/map/TensorArrayStack_1/range/start (Const) /device:GPU:0
Preprocessor/map/TensorArrayStack_1/range/delta (Const) /device:GPU:0
Preprocessor/map/TensorArrayStack_1/range (Range) /device:GPU:0
Preprocessor/map/TensorArrayStack_1/TensorArrayGatherV3 (TensorArrayGatherV3) /device:GPU:0
Op: TensorArrayV3
Node attrs: element_shape=, dynamic_size=false, clear_after_read=true, identical_element_shapes=true, tensor_array_name="", dtype=DT_INT32
Registered kernels:
device='XLA_GPU'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT8, …, DT_QINT32, DT_BFLOAT16, DT_HALF, DT_UINT32, DT_UINT64]
device='XLA_CPU'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT8, …, DT_BFLOAT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
device='XLA_CPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT8, …, DT_BFLOAT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
device='XLA_GPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT8, …, DT_QINT32, DT_BFLOAT16, DT_HALF, DT_UINT32, DT_UINT64]
device='GPU'; dtype in [DT_BFLOAT16]
device='GPU'; dtype in [DT_INT64]
device='GPU'; dtype in [DT_COMPLEX128]
device='GPU'; dtype in [DT_COMPLEX64]
device='GPU'; dtype in [DT_DOUBLE]
device='GPU'; dtype in [DT_FLOAT]
device='GPU'; dtype in [DT_HALF]
device='CPU'
[[{{node Preprocessor/map/TensorArray_2}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "fulldetect_test_with_trt_R2.py", line 231, in <module>
solve.detect_quad()
File "fulldetect_test_with_trt_R2.py", line 214, in detect_quad
self.process_image_and_plot(img, category_index)
File "fulldetect_test_with_trt_R2.py", line 124, in process_image_and_plot
(boxes, scores, classes, num_detections) = self.session.run([boxes, scores, classes, num_detections], feed_dict={image_tensor: image_np_expanded})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
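For context, this is roughly how I load the converted graph and run inference on the Nano (a simplified sketch; the file name is a placeholder for the graph produced by the conversion step shown further below, the tensor names are the standard Object Detection API ones, and this is where allow_soft_placement is set):

import numpy as np
import tensorflow as tf

# Load the TF-TRT optimized graph saved by the conversion step.
trt_graph = tf.GraphDef()
with tf.gfile.GFile("trt_graph.pb", "rb") as f:
    trt_graph.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(trt_graph, name="")

# allow_soft_placement=False forces every op onto its requested device;
# this is the setting that triggers the InvalidArgumentError above.
config = tf.ConfigProto(allow_soft_placement=False)
session = tf.Session(graph=graph, config=config)

image_tensor = graph.get_tensor_by_name("image_tensor:0")
boxes = graph.get_tensor_by_name("detection_boxes:0")
scores = graph.get_tensor_by_name("detection_scores:0")
classes = graph.get_tensor_by_name("detection_classes:0")
num_detections = graph.get_tensor_by_name("num_detections:0")

# Dummy 300x300 frame just to show the call; the real script feeds camera images.
image_np_expanded = np.zeros((1, 300, 300, 3), dtype=np.uint8)
outputs = session.run([boxes, scores, classes, num_detections],
                      feed_dict={image_tensor: image_np_expanded})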
Probably I am making some error in converting the model. I have also tried using create_inference_graph instead of TrtGraphConverter, but it has the same outcome. I have noticed that with create_inference_graph, the optimized graph is bigger than the original trained frozen graph.
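For completeness, this is essentially the conversion step I am running (a simplified sketch following the TF-TRT user guide; the file names are placeholders, and the output node names are the standard Object Detection API ones):

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen graph trained on my laptop.
frozen_graph = tf.GraphDef()
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    frozen_graph.ParseFromString(f.read())

# Standard Object Detection API output nodes; TF-TRT should not fold these away.
output_nodes = ["detection_boxes", "detection_scores",
                "detection_classes", "num_detections"]

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=output_nodes,       # keep the output nodes intact
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,   # small workspace because of the Nano's memory
    precision_mode="FP16",
    is_dynamic_op=True)                 # build TensorRT engines at runtime

trt_graph = converter.convert()

# Save the optimized graph for the inference script.
with tf.gfile.GFile("trt_graph.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())

I picked is_dynamic_op=True and a small workspace size because of the Nano's limited memory, but I am not sure these are the right settings.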
Can you please let me know how I can overcome these issues, or point me towards the right places to look? I have tried a few random solutions that I found online, but none seem to be working. Do I need to do anything while training, or while converting the trained model to a frozen graph, in order to use the graph on the Jetson Nano?
Let me know if any other info is needed.
Thanks.