Inconsistent results from template matching with a mask

Description

Template matching with a mask gives different results on the CPU and CUDA backends. When the mask is set to null, both backends produce the same results; when a mask is supplied, the results differ.
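One way to narrow down which backend is off is to compare both score maps against a slow brute-force reference. The sketch below is only an illustration: the post does not say which matching formula the library uses, so masked sum of squared differences is an assumption, and `masked_ssd_reference` and `compare_score_maps` are hypothetical helper names, not part of any library. It assumes single-channel inputs and that the CPU and CUDA outputs have already been copied into NumPy arrays.

```python
import numpy as np


def masked_ssd_reference(image, templ, mask=None):
    """Brute-force masked sum of squared differences (assumed formula)."""
    image = np.asarray(image, dtype=np.float64)
    templ = np.asarray(templ, dtype=np.float64)
    th, tw = templ.shape
    if mask is None:
        # No mask: every template pixel contributes.
        weights = np.ones((th, tw), dtype=np.float64)
    else:
        # Treat any non-zero mask value as "use this pixel".
        weights = (np.asarray(mask) != 0).astype(np.float64)
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.empty((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            diff = image[y:y + th, x:x + tw] - templ
            scores[y, x] = np.sum(weights * diff * diff)
    return scores


def compare_score_maps(a, b, atol=1e-3):
    """Summarize how far apart two score maps are (lower score = better match)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    best_a = np.unravel_index(np.argmin(a), a.shape)
    best_b = np.unravel_index(np.argmin(b), b.shape)
    return {
        "max_abs_diff": float(np.max(np.abs(a - b))),
        "allclose": bool(np.allclose(a, b, atol=atol)),
        "best_a": tuple(int(i) for i in best_a),
        "best_b": tuple(int(i) for i in best_b),
    }
```

Calling `compare_score_maps(cpu_scores, reference)` and `compare_score_maps(cuda_scores, reference)` on the same image, template, and mask would show whether one backend diverges from the reference or whether both use a different formula than the one assumed here.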

Environment

GPU Type: NVIDIA GeForce RTX 3090
NVIDIA Driver Version: 535.183.06
CUDA Version: 12.2
cuDNN Version: 8500
Operating System + Version: Ubuntu 22.04.5 LTS
Python Version (if applicable): 3.10

Hi @prathameshdinkar19,
This forum covers issues related to TensorRT.
I recommend raising this in the relevant forum.

Thanks