VisionWorks OpenVX vs OpenCV

stephenkosh106 · July 25, 2018, 2:57pm

Hello.
I’m developing a vision program using VisionWorks.

So I’ve made a simple program using OpenVX and OpenCV.
It can run in two modes, OpenVX or OpenCV.
The OpenCV mode takes 80 ms, while the OpenVX mode takes 170 ms.

I don’t know why.
I’ve also made a custom node,
but the nodes in the graph do not execute in parallel.

So I’ve logged the performance counters.
The standard OpenVX nodes are slow.
For example, the vxGaussian3x3 node takes 6 ms and the vxAbsDiff node takes 10 ms.

Can anyone explain this problem?
Thanks.

AastaLLL · July 26, 2018, 6:27am

Hi,

First of all, have you maximized the GPU clock?

sudo ./jetson_clocks.sh

If your image is stored in CPU memory, the memory copies between the CPU and GPU will decrease OpenVX performance.
We recommend profiling your application first to find the bottleneck in the pipeline.

sudo ./nvprof -o data.nvvp [your program]

Thanks.

stephenkosh106 · July 26, 2018, 9:06am

Thanks for your reply.
After I ran jetson_clocks.sh, it was clearly faster.
But the nodes in the graph still don’t run in parallel.

When I ran /usr/local/cuda/bin/nvprof -o data.nvvp ./test, the data.nvvp file was generated.
But the Visual Profiler shows that the graphs don’t run in parallel.
What’s the problem?
Thanks.

AastaLLL · July 30, 2018, 6:35am

Hi,

What kind of parallelism do you expect?

1. Different CUDA streams are required to launch tasks simultaneously.
2. A vision pipeline is put into a single CUDA stream, since we want it to be executed in order.

Here is an example of parallel execution:

...
vx_graph graph1 = vxCreateGraph(context);
vx_graph graph2 = vxCreateGraph(context);
vx_node cvtNode1 = vxColorConvertNode(graph1, frame, gray1);
vx_node cvtNode2 = vxColorConvertNode(graph2, frame, gray2);
...

Thanks.

stephenkosh106 · July 30, 2018, 2:07pm

Hi.

Here is the program I tested.

const int width = 1280;
const int height = 1024;

// Two independent graphs, one color conversion each
vx_graph graph1 = vxCreateGraph(context);
vx_graph graph2 = vxCreateGraph(context);

vx_image frame = vxCreateImage(context, width, height, VX_DF_IMAGE_RGB);
vx_image img1 = vxCreateImage(context, width, height, VX_DF_IMAGE_U8);
vx_image img2 = vxCreateImage(context, width, height, VX_DF_IMAGE_U8);

// Both graphs read the same RGB input frame
vx_node node1 = vxColorConvertNode(graph1, frame, img1);
vx_node node2 = vxColorConvertNode(graph2, frame, img2);

vxVerifyGraph(graph1);
vxVerifyGraph(graph2);

nvx::Timer procTimer;
procTimer.tic();

// Schedule both graphs asynchronously, then block until each one completes
vxScheduleGraph(graph1);
vxScheduleGraph(graph2);
vxWaitGraph(graph1);
vxWaitGraph(graph2);

std::cout << procTimer.toc() << std::endl;

...

The results of this test program are as follows:

Graph1 : 2.8 ms
Graph2 : 1.5 ms
Total : 4.4 ms

If the graphs were executed in parallel, I think the total time would be the maximum of the Graph1 and Graph2 times.
But as you can see above, the total time is Graph1 + Graph2.
So I think parallel execution doesn’t work.

What am I doing wrong?
Can you show me the result of this program, or a Visual Profiler screenshot?
Thanks.

AastaLLL · August 3, 2018, 7:55am

Hi,

Here is an example for your reference:
https://devtalk.nvidia.com/default/topic/1006304/jetson-tx1/visionworks-how-can-i-execute-parallel-node-process-in-graph-/post/5139436/#5139436

Thanks.