With TensorRT 7.0 comes support for 3D convolution. But 8-bit integer (INT8) quantization still isn't available for 3D convolution, as shown in the "Layers and Precision" section here: Support Matrix :: NVIDIA Deep Learning TensorRT Documentation
However, INT8 is where a large part of TensorRT's performance gains comes from.
In fact, we should be able to expect roughly a 15-fold performance gain with TensorRT (based on what I obtain with 2D models on various hardware), whereas with what is available right now I could only obtain about a 1.5-fold speedup.
Are there plans to make the rest of TensorRT's optimizations available for 3D convolution?
There has been progress. Since TensorRT 7.2, Tensor Cores can be used to speed up INT8 inference of 3D convolution layers. This provides some speedup on select GPUs.
But not all optimizations are available yet. There is still no INT8 speedup on Pascal GPUs, or on any GPU without Tensor Cores, whereas all GPUs see a large speedup on 2D models when comparing INT8 to FP16.
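For reference, here is a minimal sketch of how INT8 can be requested when building an engine with the TensorRT 7.x Python API, so that 3D conv layers can pick up Tensor Core INT8 kernels on GPUs that support them. The file name "model_3d.onnx" and the calibrator object `my_calibrator` are placeholders, not part of the original posts; you would supply your own ONNX model and an IInt8EntropyCalibrator2 implementation.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Build a network from an ONNX model (placeholder file name).
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model_3d.onnx", "rb") as f:
    parser.parse(f.read())

# Request INT8 kernels; layers without INT8 support fall back to FP16/FP32.
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30            # 1 GiB workspace
config.set_flag(trt.BuilderFlag.INT8)          # enable INT8 where supported
config.set_flag(trt.BuilderFlag.FP16)          # allow FP16 fallback
config.int8_calibrator = my_calibrator         # your IInt8EntropyCalibrator2 (assumed)

engine = builder.build_engine(network, config)
```

Whether the 3D conv layers actually run in INT8 still depends on the GPU (Tensor Cores required) and the TensorRT version, as noted above.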