First of all, thanks for TensorRT; it’s a great tool.
I built a custom model using TensorFlow and converted it to TensorRT. Now when I profile my application, I see statistics like:
            Type  Time(%)      Time  Calls       Avg       Min       Max  Name
 GPU activities:   18.71%  355.70ms   1860  191.24us  101.19us  268.46us  volta_hcudnn_128x128_relu_small_nn_v1
                   18.69%  355.31ms   1500  236.87us  51.074us  861.37us  trt_volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1
                    6.60%  125.42ms    540  232.25us  57.026us  1.0239ms  trt_volta_h884cudnn_256x128_ldg8_relu_exp_small_nhwc_tn_v1
 ...
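To make the question concrete, here is a minimal sketch (not part of any NVIDIA tooling) of how summary rows like the ones above could be ranked by total time and their kernel names split into tokens for comparison. `parse_summary` and `name_tokens` are hypothetical helper names, and the regular expression assumes the column layout shown above. It only splits the names on underscores; it does not claim to know what each token means.

```python
import re

# Sample rows copied from the profiler summary above.
SUMMARY = """\
GPU activities: 18.71% 355.70ms 1860 191.24us 101.19us 268.46us volta_hcudnn_128x128_relu_small_nn_v1
18.69% 355.31ms 1500 236.87us 51.074us 861.37us trt_volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1
6.60% 125.42ms 540 232.25us 57.026us 1.0239ms trt_volta_h884cudnn_256x128_ldg8_relu_exp_small_nhwc_tn_v1
"""

# Assumed layout: Time(%)  Time  Calls  Avg  Min  Max  Name
ROW = re.compile(
    r"(?P<pct>[\d.]+)%\s+"
    r"(?P<time>[\d.]+)(?P<unit>ns|us|ms|s)\s+"
    r"(?P<calls>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+"      # Avg, Min, Max (not needed here)
    r"(?P<name>.+)"            # the rest of the line is the kernel name
)

UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def parse_summary(text):
    """Return one dict per kernel row, sorted by total time (largest first)."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m is None:
            continue  # header, separator, or "..." line
        rows.append({
            "name": m["name"].strip(),
            "pct": float(m["pct"]),
            "total_ms": float(m["time"]) * UNIT_TO_MS[m["unit"]],
            "calls": int(m["calls"]),
        })
    return sorted(rows, key=lambda r: r["total_ms"], reverse=True)


def name_tokens(kernel_name):
    """Split a kernel name on underscores so names can be compared token by token."""
    return kernel_name.split("_")


if __name__ == "__main__":
    for row in parse_summary(SUMMARY):
        print(f'{row["total_ms"]:9.2f} ms  {row["calls"]:5d} calls  {name_tokens(row["name"])}')
```

Ranking the rows this way shows which kernels dominate the runtime, but mapping a kernel back to a layer of the model still requires knowing what the names encode.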
But I’m confused about the kernel names. I guess volta_hcudnn_128x128_relu_small_nn_v1 is computing ReLU on a 128x128 input, but then what is trt_volta_h884cudnn_256x128_ldg8_relu_exp_medium_nhwc_tn_v1?
Is there documentation about these kernels and what they do? My end goal is to profile my model, identify the bottlenecks, and improve it, but that is hard without a clear picture of what each kernel is doing.