Visionworks : Test about energy efficiency and performance

Hi, I’m new to VisionWorks, and I’m studying the advantages of VisionWorks compared to other image processing libraries, such as OpenCV or low-level CUDA.

When our lab tested VisionWorks from a power-usage perspective, it did not show better power efficiency than the OpenCV GPU module,

and in image processing it showed good performance in only one case, the Canny edge function.

After that, we realized that comparing the performance of only one function is not meaningful, because we suspected that VisionWorks’ high performance comes from its stream processing and its graph structure.

Today, when I tested stream processing, it showed a performance difference from the OpenCV GPU module.
We ran a normal Canny edge filtering test:

  1. convert the color image to grayscale
  2. blur the image (we used a box filter, because VisionWorks has no bilateral filter and OpenCV 2.x has no GPU version of the median filter)
  3. Canny edge detection

In this test, VisionWorks performed better than OpenCV, but I don’t know why this result happened.

So my questions are:

  1. Why didn’t VisionWorks show higher energy efficiency than OpenCV when we tested image processing?

  2. Why does VisionWorks’ streaming (video) processing perform better than OpenCV’s?
    (In my case, on 2K video, the actual computation time is 3 times better than OpenCV’s.)

  3. Is there any problem in my OpenCV code, and could such a problem cause that performance difference?

#include "opencv2/opencv.hpp"
#include <iostream>
#include "opencv2/gpu/gpu.hpp"
using namespace cv;
using namespace std;

int main(int, char**)
{
    VideoCapture cap("NORWAY 2K.mp4"); // open video
    if (!cap.isOpened())               // check if we succeeded
        return -1;

    double fps = cap.get(CV_CAP_PROP_FPS);
    // For OpenCV 3, you can also use the following:
    // double fps = cap.get(CAP_PROP_FPS);
    cout << "Frames per second using cap.get(CV_CAP_PROP_FPS) : " << fps << endl;

    Mat frame;
    gpu::GpuMat src, gray, blurred, edges;
    Size ksize(3, 3);

    for (;;)
    {
        const int64 startWhole = getTickCount();

        cap >> frame; // get a new frame from the video
        if (frame.empty())
            break;

        src.upload(frame); // host -> device copy (this was missing)
        gpu::cvtColor(src, gray, CV_BGR2GRAY);
        gpu::boxFilter(gray, blurred, -1, ksize);
        gpu::Canny(blurred, edges, 120, 240, 3, false);

        Mat edges_host;
        edges.download(edges_host); // device -> host copy (this was missing)

        double timeSec = (getTickCount() - startWhole) / getTickFrequency();
        cout << "Whole Time : " << timeSec << " sec" << endl;

        imshow("edges", edges_host);
        if (waitKey(30) >= 0)
            break;
    }
    return 0;
}
  4. I think the OpenCV GPU module represents a typical CUDA kernel workflow. So can I assume that VisionWorks’ advantage in today’s test is a special characteristic of VisionWorks compared to a normal CUDA kernel workflow, or is it just a weakness of the OpenCV library? Is my assumption right?


Thanks for your question.

Do you use OpenCV4Tegra? And could you share the power data you measured so we can check it?

The main advantage of VisionWorks is that, besides computation, we also optimize data transfer.

For example,

cap >> frame; // get a new frame from camera

This step lowers performance, since the data flow is: frame -> CPU -> GPU.
In VisionWorks, we read the frame directly into GPU-accessible memory, that is: frame -> GPU.

This video can give you some hints:

Thank you AastalLL,
After I received your response, I modified my code and realized I made some mistakes when I tested VisionWorks image processing. So I think I should retest both image processing and video processing.

Before I upload the power data, I’m curious how VisionWorks can read frames directly into GPU memory.

I think VisionWorks uses CUDA unified memory for this, is that right?
And if I write a VisionWorks user kernel, can I implement it in CUDA code?


Thanks for your feedback.

Actually, for capture VisionWorks is a wrapper over low-level camera frameworks, e.g. v4l2, GStreamer…
On Tegra, the CPU and GPU share the same physical memory.
Usually, the camera framework allocates a DMA buffer and then registers that buffer via EGL to make it GPU-accessible.
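As a hedged outline of that zero-copy path (pseudocode, not verbatim API usage; the CUDA EGL interop entry points named below exist in the CUDA driver API, but the exact wiring depends on the camera framework):

```
1. camera/decoder fills a DMA buffer (owned by v4l2 / GStreamer)
2. wrap that buffer as an EGLImage (eglCreateImageKHR)
3. register the EGLImage with CUDA:
       cuGraphicsEGLRegisterImage(&resource, eglImage,
                                  CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE)
4. get a device-accessible view of the frame:
       cuGraphicsResourceGetMappedEglFrame(&eglFrame, resource, 0, 0)
5. launch CUDA kernels on eglFrame's pointers directly, with no host copy
```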