Efficient Max Pooling

What would be the most efficient bottleneck-free way to do max pooling on some tensor data from inside a custom plugin?

Do you mean inference?
TensorRT natively supports max pooling, as described in the TensorRT Developer Guide (NVIDIA Deep Learning TensorRT Documentation), so if you use TensorRT, you don’t need to implement it as a custom plugin.

No, post processing heatmaps after inference.

Is it exactly the same as the TRT max pooling layer?

I’m trying to emulate torch.nn.MaxPool2d, which seems to be the same as TRT max pooling. And I want to do it on the output tensors after inference, from inside a custom gst-plugin (like DS EXAMPLE).
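
For reference, the output size that torch.nn.MaxPool2d produces along each spatial axis can be sketched as below. This is only a sketch under assumptions: dilation is 1 and ceil_mode is False (the PyTorch defaults), and `pooledSize` is a hypothetical helper name, not a function from PyTorch, TensorRT, or DeepStream.

```cpp
#include <cassert>

// Hypothetical helper: output length along one spatial axis for max pooling
// with the given kernel size, stride, and padding.
// Assumes dilation == 1 and floor rounding (ceil_mode=False in PyTorch).
inline int pooledSize(int in, int kernel, int stride, int pad) {
    assert(in > 0 && kernel > 0 && stride > 0 && pad >= 0);
    assert(2 * pad <= kernel);        // PyTorch requires pad <= kernel / 2
    assert(in + 2 * pad >= kernel);   // at least one full window must fit
    return (in + 2 * pad - kernel) / stride + 1;
}
```

For example, a 3x3 kernel with stride 1 and padding 1 keeps the heatmap size unchanged, while a 2x2 kernel with stride 2 and no padding halves it.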

If you are using TRT for inference, you can call addPooling() before building the TRT engine to add this layer to the engine. Here are the APIs:

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_network_definition.html#af5bc6382c84b67d96b7010676e742e39

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_pooling_layer.html

I need to do some manual post-processing on the raw tensor output after inference, before pooling. Can I call max pool after inference from inside my custom gst plugin? Can I run max pooling on an arbitrary user-made array?

Certainly, you can put max pooling in the post-processing.

Using TRT outside of primary inference? As in inside my custom gst plugin?

Any of the solutions below is OK, but for the question you first asked in this post, “the most efficient bottleneck-free way”, option #1 is recommended.

  1. Use the TRT max pool layer in the TRT inference loop
  2. Add max pooling in post-processing (this way, I don’t think you can use the TRT max pool layer; you need your own max pooling implementation, or you could use the cuDNN max pooling API)
  3. Add TRT max pool as a TRT plugin (if going this way, option #1 is better and easier)
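
As a rough illustration of the "own max pooling implementation" mentioned in option #2, here is a minimal CPU reference sketch. It assumes the tensor has already been copied to host memory as contiguous CHW floats, uses dilation 1 and floor rounding, and treats padded positions as negative infinity (as torch.nn.MaxPool2d does). `maxPool2d` is a hypothetical helper name, not a TensorRT, cuDNN, or DeepStream API. It is not the fast path; a CUDA kernel or cuDNN would use the same indexing on device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: 2D max pooling over a host-side CHW float buffer.
// Padded positions are skipped, which is equivalent to padding with -inf.
std::vector<float> maxPool2d(const std::vector<float>& in,
                             int channels, int height, int width,
                             int kernel, int stride, int pad) {
    assert(in.size() == static_cast<std::size_t>(channels) * height * width);
    assert(kernel > 0 && stride > 0 && pad >= 0 && 2 * pad <= kernel);
    assert(height + 2 * pad >= kernel && width + 2 * pad >= kernel);

    const int outH = (height + 2 * pad - kernel) / stride + 1;
    const int outW = (width + 2 * pad - kernel) / stride + 1;
    std::vector<float> out(static_cast<std::size_t>(channels) * outH * outW);

    for (int c = 0; c < channels; ++c) {
        const float* plane = in.data() + static_cast<std::size_t>(c) * height * width;
        float* outPlane = out.data() + static_cast<std::size_t>(c) * outH * outW;
        for (int oy = 0; oy < outH; ++oy) {
            for (int ox = 0; ox < outW; ++ox) {
                float best = -std::numeric_limits<float>::infinity();
                for (int ky = 0; ky < kernel; ++ky) {
                    const int iy = oy * stride - pad + ky;
                    if (iy < 0 || iy >= height) continue;  // row lies in padding
                    for (int kx = 0; kx < kernel; ++kx) {
                        const int ix = ox * stride - pad + kx;
                        if (ix < 0 || ix >= width) continue;  // column lies in padding
                        best = std::max(best, plane[iy * width + ix]);
                    }
                }
                outPlane[oy * outW + ox] = best;
            }
        }
    }
    return out;
}
```

Calling `maxPool2d(heatmap, C, H, W, 3, 1, 1)` keeps the spatial size, so the result can be compared element by element with the input heatmap, for instance to keep only local maxima.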

Any examples of cuDNN API used inside of custom gst plugin?
Also can you point me to the cuDNN API docs?
Thank you very much.

I’m not aware of such a sample. It’s just a matter of integrating cuDNN code into a TRT plugin.
If you have questions about TRT plugins, you can refer to the TRT plugin sample in the TRT package and the TRT docs.
For cuDNN samples, look in the cuDNN package: https://developer.nvidia.com/cudnn

cuDNN sample: API Reference - NVIDIA Docs
