I have been profiling a network in Nsight Compute and noticed a lot of kernel launches named “nchwTonhwc”.
I assume this kernel is doing exactly what its name suggests: changing the order from NCHW to NHWC.
The input I give TensorRT is in NCHW format, and I was under the impression that TensorRT also uses this format internally.
Can anyone explain what is happening?
(PS: I use a lot of parametricReLU layers in the network, in case that is relevant.)
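To make clear what I mean by changing the order from NCHW to NHWC, here is a minimal pure-Python sketch of such a reorder on a flat buffer. The function name `nchw_to_nhwc` is hypothetical and only for illustration; this is my understanding of the layout change, not TensorRT's actual kernel.

```python
def nchw_to_nhwc(src, n, c, h, w):
    """Reorder a flat NCHW buffer into a flat NHWC buffer (illustration only)."""
    assert len(src) == n * c * h * w
    dst = [None] * len(src)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    # NCHW: width varies fastest, channels are separate planes
                    src_idx = ((ni * c + ci) * h + hi) * w + wi
                    # NHWC: channels vary fastest, i.e. interleaved per pixel
                    dst_idx = ((ni * h + hi) * w + wi) * c + ci
                    dst[dst_idx] = src[src_idx]
    return dst


# One image, two channels, 1x3 pixels: channel planes [1, 2, 3] and [10, 20, 30]
nchw = [1, 2, 3, 10, 20, 30]
print(nchw_to_nhwc(nchw, 1, 2, 1, 3))  # [1, 10, 2, 20, 3, 30]
```

The values are the same; only their order in memory changes, so that all channels of one pixel end up next to each other.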
TensorRT Version: 7.0
GPU Type: Quadro RTX 5000
Nvidia Driver Version: 440.40
CUDA Version: 10.2
Operating System + Version: Ubuntu 16.04