I’ve created a complete tutorial on how to train a custom model (a hand detector) and deploy it with ‘tf_trt_models’ onto JTX2. Refer to the following link for details.
I tried to make ‘tf_trt_models’ work for ‘faster_rcnn’ and ‘rfcn’ models. It was not straightforward (see the reference below); I had to put in quite a few hacks to be able to build TF-TRT optimized graphs for those models. For example, I reduced the number of region proposals in the ‘faster_rcnn’/‘rfcn’ models from 300 to 32 (otherwise I’d hit Out-Of-Memory errors on JTX2). I was finally able to get the models working to a certain degree (not completely working yet…), and I put my latest code in my GitHub repository.
https://github.com/jkjung-avt/tf_trt_models
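For reference, the proposal-count reduction mentioned above is done in the TF Object Detection API’s pipeline.config. A minimal sketch of the relevant fields (this is an excerpt, not a complete config; check your own config file for the exact surrounding structure):

```
# Excerpt from a Faster R-CNN / R-FCN pipeline.config.
# Reducing first_stage_max_proposals from the default 300 down to 32
# shrinks the second-stage workload enough to avoid OOM on JTX2.
model {
  faster_rcnn {
    first_stage_max_proposals: 32   # default is 300
    second_stage_post_processing {
      batch_non_max_suppression {
        max_detections_per_class: 32
        max_total_detections: 32
      }
    }
  }
}
```

The same reduced values should then be used when building the TF-TRT graph, so the exported graph and the config stay consistent.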
Then I measured and compared the inference time per frame of the following TF-TRT optimized models on JTX2 (in MAX-N mode).
ssd_mobilenet_v1_coco (90 classes): 43.5 ms
ssd_inception_v2_coco (90 classes): 45.9 ms
ssd_mobilenet_v1_egohands (1 class): 24.5 ms
ssd_mobilenet_v2_egohands (1 class): 28.7 ms
ssdlite_mobilenet_v2_egohands (1 class): 28.9 ms
ssd_inception_v2_egohands (1 class): 25.9 ms
rfcn_resnet101_egohands (1 class): 351 ms
faster_rcnn_resnet50_egohands (1 class): 226 ms
faster_rcnn_resnet101_egohands (1 class): 317 ms
faster_rcnn_inception_v2_egohands (1 class): 117 ms
Reference: https://github.com/NVIDIA-Jetson/tf_trt_models/issues/6#issuecomment-425857759
Hello,
I am working on a Jetson TX2 with JetPack 3.3, which has TensorFlow v1.9 and TensorRT v4. Currently I am testing an SSD Inception V2 model and getting an average of 50 ms per frame with no TensorRT optimization of my frozen model. I am stuck in the optimization process, as I want to further reduce the inference time with the trt.create_inference_graph function.
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
I get the following error:
2019-04-23 21:31:36.655950: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2019-04-23 21:31:41.939334: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:0 due to: "Invalid argument: Output node 'FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/concat-4-LayoutOptimizer' is weights not tensor" SKIPPING…( 844 nodes)
I saw this issue in one of the previous comments, was there any solution for this?
Thank you!!
I have a solution to the “extremely long model loading time problem” of TF-TRT now. Please check out my blog post for details: [url]https://jkjung-avt.github.io/tf-trt-revisited/[/url].
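For context, the fix in that post boils down to running the TF-TRT conversion once, serializing the optimized graph to disk, and deserializing it on subsequent runs instead of re-converting every time. A minimal sketch of the caching idea (the function name and `build_fn` are placeholders of mine; in practice `build_fn` would wrap `trt.create_inference_graph(...).SerializeToString()`):

```python
import os

def load_or_build_graph(cache_path, build_fn):
    """Return serialized GraphDef bytes, converting with TF-TRT only on a cache miss."""
    if os.path.exists(cache_path):
        # Fast path: deserialize the previously optimized graph.
        with open(cache_path, 'rb') as f:
            return f.read()
    # Slow path: run the (possibly minutes-long) TF-TRT conversion once, then cache it.
    graph_bytes = build_fn()
    with open(cache_path, 'wb') as f:
        f.write(graph_bytes)
    return graph_bytes
```

The returned bytes can then be parsed with `tf.GraphDef().ParseFromString(...)` and imported via `tf.import_graph_def`, so only the first run pays the conversion cost.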
Hi jkjung13
I have flashed JetPack 4.2 on my TX2 board.
I trained an object detection model using a Feature Pyramid Network (model size is 242 MB).
Whenever I run the inference code on the TX2, the process gets killed just before starting the session to process a frame (it is able to load the model, though).
I tried limiting/allocating the full memory usage using TensorFlow (version 1.13), but it didn’t work.
Any solution for this?
Thanks in advance.
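For what it’s worth, on TF 1.x the usual knobs for limiting memory usage live in the session config. A sketch of that setup (this only caps what TensorFlow grabs; if the model itself doesn’t fit in memory, the process will still be killed):

```python
import tensorflow as tf  # TF 1.x API

# Cap TensorFlow's GPU memory allocation. On a TX2 the GPU and CPU share
# physical RAM, so leaving headroom for the rest of the system matters.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or hard-cap at 50%

with tf.Session(config=config) as sess:
    pass  # load the frozen graph and run inference here
```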
@varun365 There is a very similar issue report on GitHub: [url]https://github.com/NVIDIA-AI-IOT/tf_trt_models/issues/6#issuecomment-498207648[/url]
The problem is clearly an out-of-memory issue, and I’m not able to solve it. Hopefully TF-TRT will be improved over time so that such larger models work on a future release.