GPU vs CPU performance test gave the same speeds

Hello. I have tested the performance of the CPU and GPU on a TX2 using these simple OpenCV tests.
For the GPU:

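// Create the CUDA Canny detector once, outside the timed loop.
auto canny_edg = cv::cuda::createCannyEdgeDetector(50, 150, 3);
for (int j = 0; j < 100; j++)
{
  timer.start();
  // frame is assumed to be a cv::cuda::GpuMat that was already uploaded.
  cv::cuda::GpuMat canny_output;
  canny_edg->detect(frame, canny_output);
  qDebug() << timer.elapsed();
}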

A sample of the measured times (ms):

13
13
12
14
14
12
12
13

And for the CPU:

for (int j = 0; j < 100; j++)
{
  timer.start();
  cv::Mat canny_output;
  cv::Canny(flipped, canny_output, 50, 150, 3);
  qDebug() << timer.elapsed();
}

A sample of the measured times (ms):

8
10
11
12
19
7
11
12

As we can see, the results are about the same on average. Is this expected? I expected better performance on the GPU. Have I perhaps missed some GPU options?
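
To put numbers on "the same on average", here is a minimal sketch of a timing helper in plain C++ (standard library only). The names TimingStats, summarize and time_runs are hypothetical and not part of OpenCV or Qt. It assumes that a few untimed warm-up iterations are worth discarding, since the first CUDA calls usually include one-time initialization, and that mean/min/max is a fair summary of the samples. For the eight samples posted above it gives a mean of 12.875 ms for the GPU and 11.25 ms for the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of per-iteration timings in milliseconds (hypothetical helper type).
struct TimingStats {
    double mean_ms;
    double min_ms;
    double max_ms;
};

// Computes mean, min and max of a non-empty list of timings.
inline TimingStats summarize(const std::vector<double>& samples_ms) {
    assert(!samples_ms.empty());
    const double sum = std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0);
    const auto mm = std::minmax_element(samples_ms.begin(), samples_ms.end());
    return {sum / static_cast<double>(samples_ms.size()), *mm.first, *mm.second};
}

// Calls `work` `warmup` times without timing, then `runs` times with timing,
// and returns the timed durations in milliseconds.
// Assumption: `work` blocks until its result is ready, so wall-clock time
// around the call reflects the real cost.
template <typename Work>
std::vector<double> time_runs(Work&& work, int warmup, int runs) {
    for (int i = 0; i < warmup; ++i) {
        work();  // discarded: absorbs one-time setup cost
    }
    std::vector<double> samples_ms;
    samples_ms.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples_ms.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return samples_ms;
}
```

To use it with the loops above, the body of each loop (the canny_edg->detect(...) call or the cv::Canny(...) call) would go into a lambda passed to time_runs, with the output matrix declared outside the lambda so that its allocation is not repeated in every timed iteration. The result can then be passed to summarize.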

My output of sudo ./jetson_clocks.sh --show:

SOC family:tegra186  Machine:quill
Online CPUs: 0,3-5
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu1: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200
cpu2: Gonvernor=schedutil MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200
cpu3: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu4: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
cpu5: Gonvernor=schedutil MinFreq=2035200 MaxFreq=2035200 CurrentFreq=2035200
GPU MinFreq=1122000000 MaxFreq=1122000000 CurrentFreq=1122000000
EMC MinFreq=40800000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: speed=255

Hi,

We recommend checking the details with the OpenCV team directly.
OpenCV is a third-party library, so it is hard for us to comment on it.

By the way, we also have a Vision library optimized for the Jetson platform.
You can also give it a try: https://developer.nvidia.com/embedded/visionworks

Thanks.