I converted a MobileNet v1/v2 TensorFlow model with trt.create_inference_graph() and then ran it through TensorFlow-TensorRT.
However, the performance improvement isn't much: only ~10% improvement in inference time (104.72B FLOPs/second).
(For other networks such as Inception v2, the improvement is higher.)
Then I did some investigation:
Most of the computation in MobileNet v1/v2 is 1x1 convolution, and 1x1 convolution is memory-friendly with the NHWC data format, but TensorRT works with the NCHW data format.
In <<TensorRT-Developer-Guide 5.pdf>>, the supported ops include Conv2D and DepthwiseConv2dNative. I guess a 1x1 conv is treated as a common Conv2D and is not optimized specifically.
Since 1x1 convolution is widely used in current network models, could TensorRT add some optimization for it?
For example, it might benefit if the NHWC data format were supported.
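To illustrate why NHWC suits 1x1 convolution, here is a small NumPy sketch (the shapes are hypothetical, chosen only for illustration): in NHWC the channel axis is innermost and contiguous, so a 1x1 conv collapses into a single GEMM, while NCHW needs a layout transpose (a "reformat") first.

```python
import numpy as np

# Hypothetical shapes for illustration: batch 1, 14x14 feature map,
# 96 input channels, 576 output channels (a 1x1 "expansion" conv).
n, h, w, c_in, c_out = 1, 14, 14, 96, 576
x_nhwc = np.random.rand(n, h, w, c_in).astype(np.float32)
weights = np.random.rand(c_in, c_out).astype(np.float32)

# NHWC: channels are already contiguous, so a 1x1 conv is one plain
# matrix multiply over (N*H*W) rows -- no data movement needed.
y_nhwc = (x_nhwc.reshape(-1, c_in) @ weights).reshape(n, h, w, c_out)

# NCHW: the channel axis is strided; the same op needs a transpose
# back to channels-last before it can run as one contiguous GEMM.
x_nchw = x_nhwc.transpose(0, 3, 1, 2)
y_from_nchw = (x_nchw.transpose(0, 2, 3, 1).reshape(-1, c_in)
               @ weights).reshape(n, h, w, c_out)

assert np.allclose(y_nhwc, y_from_nchw)
print(y_nhwc.shape)  # → (1, 14, 14, 576)
```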
The engineering team is reviewing this enhancement request. I don't have additional information to share publicly. Please stay tuned for future release announcements.
I tried to analyze the profiling data and found that the depthwise convolutions in MobileNet cost much more time.
In the bottleneck structure of MobileNet, the 1x1 ops have much higher computational cost than the depthwise 3x3 convs, yet the depthwise 3x3 convs take much more time.
It seems the depthwise conv needs optimization in TensorRT.
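To make the imbalance concrete, here is a rough MAC count for one hypothetical MobileNetV2 inverted-residual block (14x14 feature map, 96 input channels, expansion factor 6; the shapes are illustrative, not taken from my actual profile):

```python
# Multiply-accumulate (MAC) counts for one hypothetical bottleneck block.
h = w = 14
c_in, expand = 96, 6
c_mid = c_in * expand  # 576 channels after 1x1 expansion

macs_1x1_expand = h * w * c_in * c_mid   # pointwise 1x1 expansion conv
macs_dw_3x3     = h * w * c_mid * 3 * 3  # depthwise 3x3 conv
ratio = macs_1x1_expand / macs_dw_3x3

print(macs_1x1_expand, macs_dw_3x3, round(ratio, 1))
# → 10838016 1016064 10.7
```

So in this block the 1x1 expansion does roughly 10x the arithmetic of the depthwise 3x3, which is why the depthwise layer dominating the measured time suggests it is memory/launch-bound rather than compute-bound.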
Another question: what does 'depthwise input reformatter' mean in the profiling output?