VisionWorks OpenVX vs OpenCV

I’m developing a vision program using VisionWorks.

So I’ve made a simple program using OpenVX and OpenCV.
It can run in two modes, OpenVX or OpenCV.
The OpenCV mode takes 80 ms while the OpenVX mode takes 170 ms.

I don’t know why.
There is also a custom node I’ve made.
But the nodes in the graph do not execute in parallel.

So I’ve logged the performance counters.
Even the standard OpenVX nodes are slow.
For example, the vxGaussian3x3 node takes 6 ms and the vxAbsDiff node takes 10 ms.

Can anyone explain this problem?


First of all, have you maximized the GPU clock?

sudo ./

If your image is stored in CPU memory, the memory copy between processes will reduce the performance of OpenVX.
I recommend profiling your application first to find the bottleneck in the pipeline.

sudo ./nvprof -o data.nvvp [your program]
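
Alongside the profiler, it can help to check that the 80 ms and 170 ms figures are measured the same way. vxProcessGraph verifies a graph that has not been verified yet, so a single timed run can include one-time setup cost. Below is a minimal sketch of a timing harness with warm-up runs, assuming each mode can be wrapped in a callable that blocks until the work is finished (vxProcessGraph does). The name averageMs is hypothetical and not part of VisionWorks or OpenCV.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not a VisionWorks or OpenCV API): runs `work`
// `warmup` times untimed, then returns the mean wall-clock time in
// milliseconds over `iterations` timed runs.
double averageMs(const std::function<void()>& work, int warmup, int iterations)
{
    assert(warmup >= 0 && iterations > 0);

    // Untimed runs absorb one-time costs such as graph verification.
    for (int i = 0; i < warmup; ++i)
        work();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

For the OpenVX mode, the callable would be something like [&] { vxProcessGraph(graph); }, and for the OpenCV mode the equivalent chain of OpenCV calls, both measured with the same warm-up and iteration counts.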


Thanks for your reply.
After I ran that, the program was clearly faster.
But the nodes in the graph still don’t run in parallel.

When I ran /usr/local/cuda/bin/nvprof -o data.nvvp ./test, the data.nvvp file was generated.
But the Visual Profiler shows that the graphs don’t run in parallel.
What’s the problem?


What kind of parallelism do you expect?

1. Separate CUDA streams are required to launch tasks simultaneously.
2. A vision pipeline is put into a single CUDA stream since we want it to be executed in order.

Here is an example of parallel execution:

vx_graph graph1 = vxCreateGraph(context);
vx_graph graph2 = vxCreateGraph(context);
vx_node cvtNode1 = vxColorConvertNode(graph1, frame, gray1);
vx_node cvtNode2 = vxColorConvertNode(graph2, frame, gray2);
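
Creating two graphs is only half of it: they also have to be launched without blocking. vxProcessGraph returns only after the graph has finished, while OpenVX also offers the non-blocking pair vxScheduleGraph and vxWaitGraph. A minimal sketch of the "schedule everything, then wait for everything" pattern follows. The helper runOverlapped is hypothetical, and it takes the schedule and wait functions as parameters so it compiles without the OpenVX headers. Whether two scheduled graphs really overlap depends on the implementation and the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: submits every graph with a non-blocking `schedule`
// call first, then blocks on each one with `wait`.
// Returns the number of graphs that were scheduled.
template <typename Graph, typename ScheduleFn, typename WaitFn>
std::size_t runOverlapped(const std::vector<Graph>& graphs,
                          ScheduleFn schedule, WaitFn wait)
{
    assert(!graphs.empty());

    // Phase 1: submit all graphs before waiting on any of them.
    for (const Graph& g : graphs)
        schedule(g);

    // Phase 2: wait for each graph to complete.
    for (const Graph& g : graphs)
        wait(g);

    return graphs.size();
}
```

With the graphs above, the call would look like runOverlapped(std::vector<vx_graph>{graph1, graph2}, vxScheduleGraph, vxWaitGraph). Calling vxProcessGraph(graph1) followed by vxProcessGraph(graph2) instead runs them one after the other by construction.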



Here is the program I tested.

const int width = 1280;
const int height = 1024;

vx_graph graph1 = vxCreateGraph(context);
vx_graph graph2 = vxCreateGraph(context);

vx_image frame = vxCreateImage(context, width, height, VX_DF_IMAGE_RGB);
vx_image img1 = vxCreateImage(context, width, height, VX_DF_IMAGE_U8);
vx_image img2 = vxCreateImage(context, width, height, VX_DF_IMAGE_U8);

vx_node node1 = vxColorConvertNode(graph1, frame, img1);
vx_node node2 = vxColorConvertNode(graph2, frame, img2);


nvx::Timer procTimer;


std::cout << procTimer.toc() << std::endl;


The results of this test program are as follows.

Graph1 : 2.8 ms
Graph2 : 1.5 ms
Total : 4.4 ms

If the graphs executed in parallel, I would expect the total time to be the maximum of the Graph1 and Graph2 times.
But as you can see above, the total time is Graph1 + Graph2.
So I think the parallel execution doesn’t work.
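
To make that comparison concrete: with the numbers above, serial execution predicts 2.8 + 1.5 = 4.3 ms and fully overlapped execution predicts max(2.8, 1.5) = 2.8 ms, while the measured total is 4.4 ms. A small sketch of that check follows. The function looksSerial is a hypothetical helper, and it assumes the only two candidates are perfectly serial and perfectly overlapped execution.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: returns true when a measured total is closer to the
// serial estimate (t1 + t2) than to the overlapped estimate (max(t1, t2)).
bool looksSerial(double t1Ms, double t2Ms, double totalMs)
{
    assert(t1Ms >= 0.0 && t2Ms >= 0.0);

    const double serial = t1Ms + t2Ms;               // one graph after the other
    const double overlapped = std::max(t1Ms, t2Ms);  // both graphs at the same time

    return std::fabs(totalMs - serial) < std::fabs(totalMs - overlapped);
}
```

Here looksSerial(2.8, 1.5, 4.4) returns true, because 4.4 ms is 0.1 ms away from the serial estimate and 1.6 ms away from the overlapped one.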

What am I doing wrong?
Could you show me your result for this program, or a Visual Profiler screenshot?


Here is an example for your reference: