OpenCV CUDA Canny is slower than cv::Canny?

I’ve written a small program to test whether the CPU or the GPU is faster.
This is the code:

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/cudaimgproc.hpp>

using namespace std;
using namespace cv;

void canny_cv()
{
    Mat mat=imread("/home//1.jpg",0);

    Mat edge;

    double t1=getTickCount();
    for(int i=0;i<10;i++)
        Canny(mat,edge,50,100);
    double t2=getTickCount();
    cout<<"cv time:"<<(t2-t1)/getTickFrequency()/10<<endl;
}

void canny_cuda()
{
    Mat mat=imread("/home/1.jpg",0);

    Ptr<cv::cuda::CannyEdgeDetector> canny=cv::cuda::createCannyEdgeDetector(50,100);
    cv::cuda::GpuMat edge;
    cv::cuda::GpuMat src(mat);

    double t1=getTickCount();
    for(int i=0;i<10;i++)
        canny->detect(src,edge);
    double t2=getTickCount();
    cout<<"cuda time:"<<(t2-t1)/getTickFrequency()/10<<endl;
}

int main(int argc, char *argv[])
{
    canny_cv();
    canny_cuda();
    return 0;
}

And I get the results:
cv time: 0.048594s
cuda time: 0.125343s

Is the GPU really slower than the CPU?
Are these results normal, or am I doing something wrong?


This is related to the OpenCV implementation.
We'd recommend contacting the OpenCV engineers directly for help.

Here are some initial suggestions:

1. Please build OpenCV with the correct GPU architecture (sm_53 for the Jetson Nano).
You can ignore this if you are using our default OpenCV package.
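For reference, a sketch of the CMake configure step that selects the Nano's sm_53 architecture when building OpenCV from source (the directory layout and the opencv_contrib path are assumptions; adjust them to your tree):

```shell
# Run from an empty build directory inside the OpenCV source tree.
# CUDA_ARCH_BIN=5.3 generates binary code for the Nano's Maxwell GPU;
# an empty CUDA_ARCH_PTX skips JIT-compiled PTX for other architectures.
cmake \
  -D CMAKE_BUILD_TYPE=Release \
  -D WITH_CUDA=ON \
  -D CUDA_ARCH_BIN=5.3 \
  -D CUDA_ARCH_PTX="" \
  -D OPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules \
  ..
```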

2. Please maximize the device performance first:

sudo jetson_clocks

By the way, we also have a library (VisionWorks) that includes a Canny detector.
You can also give it a try.


Hi AastaLLL, Thanks for your reply!

I have rebuilt OpenCV with the GPU architecture sm_53 and maximized the device performance.

I have tested VisionWorks' Canny too.

But the results still confuse me!

Canny - CPU OpenCV cost time: 0.06526 sec.
Canny - CUDA OpenCV cost time: 0.09753 sec.
Canny - VisionWorks cost time: 0.20656 sec.

What’s going on here?


Are you measuring latency of a single operation, or throughput of many operations?

VisionWorks is quite interesting. Is it possible to use it with Python?