Input/output reformatters take a lot of time

Hello,

I am using TRT 4.0 RC with a Volta GPU.
The engine is created via uff_to_trt_engine with data type HALF. I see a lot of messages like:

[TensorRT] INFO: Adding reformat layer: in_conv/convolution reformatted input 0 (input_1) from Float(1,960,552960,2211840) to Half(1,960,1:8,552960)
[TensorRT] INFO: Adding reformat layer: in_conv/convolution output to be reformatted 0 (in_conv/BiasAdd) from Half(1,960,552960,17694720) to Half(4,3840,1:8,2211840)
[TensorRT] INFO: Adding reformat layer: conv2d_1/convolution reformatted input 0 (leaky_re_lu_1/LeakyRelu/Maximum) from Half(1,960,552960,47001600) to Half(11,10560,1:8,6082560)
...
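For context, the engine is built roughly as follows. This is a minimal sketch against the TRT 4 Python API; the input/output tensor names, input shape, batch size, and workspace size here are placeholders I made up, not values from my actual model.

```python
import tensorrt as trt
from tensorrt.parsers import uffparser

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)

# Parse the UFF graph exported from TensorFlow.
# "input_1" / "output" and the CHW shape are hypothetical placeholders.
parser = uffparser.create_uff_parser()
parser.register_input("input_1", (4, 576, 960), 0)
parser.register_output("output")

uff_model = open("model.uff", "rb").read()

# Build with HALF precision -- this is where the reformat layers appear.
engine = trt.utils.uff_to_trt_engine(
    G_LOGGER,
    uff_model,
    parser,
    1,            # max batch size (placeholder)
    1 << 30,      # max workspace size in bytes (placeholder)
    trt.infer.DataType.HALF)
```

With trt.infer.DataType.FLOAT in the last argument, no reformat layers are reported.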

It's not clear to me why these conversions happen back and forth for each conv2d layer rather than once at the beginning and once at the end.

These reformat layers add extra inference time regardless of whether the incoming data is float16 or float32.

Note: these input/output reformatters are not added when the TRT engine is created with data type FLOAT.

Is there any way to get rid of these reformatters, or to make them appear only at the start and end of the network?