TensorRT Custom RoiAlign plugin is very slow

You can generate a tiny model that contains only the RoiAlign op, build a TensorRT engine from it, and profile that engine to see how much time the RoiAlign plugin itself takes.