Multimedia API OSD is too slow

I want to draw a rectangle with the Multimedia API (MMAPI), but it is very slow.
I found this:

typedef enum{
    MODE_CPU, /**< Selects CPU for OSD processing.
                Works with RGBA data only */
    MODE_GPU, /**< Selects GPU for OSD processing.
                Yet to be implemented */
    MODE_HW   /**< Selects NV HW engine for rectangle draw and mask.
                   This mode works with both YUV and RGB data.
                   It does not consider alpha parameter.
                   Not applicable for drawing text. */
} NvOSD_Mode;
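
Going only by the comments in the enum above, the mode choice could be sketched roughly like this. The helper `pick_osd_mode` and its parameters are hypothetical and not part of nvosd.h, and the enum is re-declared locally just so the sketch compiles on its own. It does not account for what any particular sample or release actually supports.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the enum quoted above, so this sketch builds without nvosd.h. */
typedef enum {
    MODE_CPU, /* RGBA data only */
    MODE_GPU, /* commented as "yet to be implemented" */
    MODE_HW   /* YUV and RGB, ignores alpha, no text */
} NvOSD_Mode;

/*
 * Hypothetical helper (not an nvosd API): choose a mode from the
 * constraints stated in the header comments.
 * Returns 0 and writes *out on success, -1 if no documented mode fits.
 */
static int pick_osd_mode(int is_rgba, int needs_alpha, int needs_text,
                         NvOSD_Mode *out)
{
    assert(out != NULL);

    /* MODE_HW handles YUV and RGB, but ignores alpha and cannot draw text. */
    if (!needs_alpha && !needs_text) {
        *out = MODE_HW;
        return 0;
    }

    /* MODE_CPU is documented as working with RGBA data only. */
    if (is_rgba) {
        *out = MODE_CPU;
        return 0;
    }

    /* MODE_GPU is never selected because the comment says it is not implemented. */
    return -1;
}
```

For plain rectangles on a YUV decoder output (no alpha, no text), this sketch picks MODE_HW, which matches the choice described next.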

So I use MODE_HW.
I looked at sample11, and it tells me that:

	-t,--num-thread <number>     Number of thread to process [Default = 1]
	-m --osd-mode                OSD process mode: 0 CPU/2 VIC(only support RGBA format), 1 GPU [Default = 0]
	--bl                         OSD process on NV12 block linear for GPU mode
	-p,--perf                    Calculate performance

I'm confused about whether I can use the GPU or the VIC, because I can't find any documentation about OSD at this link.
I'm on JetPack 5.1.1.
I need to draw rectangles on the video quickly.
Can I use hardware acceleration for this?
If so, can I use the VIC or the GPU?
I would prefer to keep the linear format, because the decoder capture fd is linear and I would not need to convert it.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.

Do you mean nvosd_draw_rectangles() is slow? Please specify which function you use and how it falls short of your target performance.
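
To answer that with numbers, one option is to time the draw call in isolation. Below is a minimal sketch, assuming the call can be wrapped in a function that returns 0 on success. The names `draw_fn`, `average_draw_ms` and `draw_stub` are made up for illustration, and the stub stands in for the real OSD call so the sketch builds without the Jetson headers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime under strict -std modes */
#endif

#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical wrapper signature for the draw call being measured. */
typedef int (*draw_fn)(void *user);

/* Current monotonic time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/*
 * Calls draw() `iterations` times and returns the average wall-clock
 * time per call in milliseconds, or -1.0 if any call reports failure.
 */
static double average_draw_ms(draw_fn draw, void *user, int iterations)
{
    assert(draw != NULL && iterations > 0);

    double start = now_ms();
    for (int i = 0; i < iterations; ++i) {
        if (draw(user) != 0)
            return -1.0;
    }
    return (now_ms() - start) / iterations;
}

/* Placeholder for the real draw call: it only counts invocations. */
static int draw_stub(void *user)
{
    int *calls = user;
    ++*calls;
    return 0;
}
```

Replacing `draw_stub` with a wrapper around the actual call (for example nvosd_draw_rectangles() with the chosen mode) and reporting the average per mode, together with the frame resolution and number of rectangles, would show where the time goes.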