User custom node with OpenCV kernel does not run on GPU

Hi everybody,

I am converting some of my OpenCV vision functions into OpenVX nodes to accelerate my code on the GPU of the Jetson TX2 kit. According to the VisionWorks tutorial, we can implement the kernels of OpenVX nodes with OpenCV functions thanks to the interoperability between OpenCV and OpenVX. I followed that approach to write a simple OpenVX node for sharpening an image. However, when I ran it and checked GPU usage with the “sudo ./tegrastats” command, I only saw an increase in the Jetson’s CPU load, not the GPU. That means the code did not run on the GPU, which is why it was terribly slow.

I want to ask whether OpenVX can optimize kernels implemented with OpenCV and run them on the GPU. If so, is there any special configuration needed?

Thanks in advance,


To execute on the GPU, the implementation needs to be written in CUDA.
For OpenCV, you can switch to the corresponding cv::cuda::xxx function to update your implementation:
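For example, a sharpening step can be kept entirely on the GPU with the cv::cuda modules. This is only a minimal sketch (it assumes OpenCV was built with CUDA support); the blur size and the unsharp-mask weights are illustrative assumptions:

```cpp
#include <opencv2/core.hpp>
#include <opencv2/cudaarithm.hpp>
#include <opencv2/cudafilters.hpp>

// Unsharp-mask sharpening done on the device via cv::cuda.
cv::Mat sharpenOnGpu(const cv::Mat& input)
{
    cv::cuda::GpuMat d_src, d_blur, d_dst;
    d_src.upload(input);                          // host -> device copy

    // Gaussian blur on the GPU (kernel size/sigma are assumptions)
    cv::Ptr<cv::cuda::Filter> gauss = cv::cuda::createGaussianFilter(
        d_src.type(), d_src.type(), cv::Size(5, 5), 1.5);
    gauss->apply(d_src, d_blur);

    // dst = 1.5*src - 0.5*blur  (unsharp mask)
    cv::cuda::addWeighted(d_src, 1.5, d_blur, -0.5, 0.0, d_dst);

    cv::Mat output;
    d_dst.download(output);                       // device -> host copy
    return output;
}
```

If this function is used as the kernel of a user node, tegrastats should then show GPU load while the graph runs.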


Thanks for your answer,

So you mean that if I do not implement the kernel in CUDA, OpenVX will treat it as host code and run it entirely on the CPU with no optimization, right?

By the way, could you explain how OpenVX optimizes a user kernel (assuming it is written in CUDA)?

Thank you,


We have many CUDA implementations for OpenVX.
Check our VisionWorks library for details:

If you want to implement a custom function, please remember to write it in CUDA.
If you are using standard OpenVX functions directly, try switching to our optimized VisionWorks functions to get GPU acceleration.

Here is a tutorial for your reference:
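To illustrate what “write it in CUDA” means for a custom sharpening node, here is a hypothetical CUDA kernel that the user node’s processing callback could launch; the 3x3 sharpen weights, the launch configuration, and the function names are illustrative assumptions, not VisionWorks APIs:

```cpp
#include <cuda_runtime.h>

// 3x3 sharpen on an 8-bit image: 5*center minus the four direct neighbours.
__global__ void sharpenKernel(const unsigned char* src, unsigned char* dst,
                              int width, int height, int pitch)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1)
        return;

    int v = 5 * src[y * pitch + x]
          - src[(y - 1) * pitch + x] - src[(y + 1) * pitch + x]
          - src[y * pitch + (x - 1)] - src[y * pitch + (x + 1)];
    dst[y * pitch + x] = (unsigned char)min(max(v, 0), 255);
}

// Host-side launcher, to be called from the OpenVX user node callback
// with device pointers to the input/output image patches.
void launchSharpen(const unsigned char* d_src, unsigned char* d_dst,
                   int width, int height, int pitch)
{
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    sharpenKernel<<<grid, block>>>(d_src, d_dst, width, height, pitch);
    cudaDeviceSynchronize();
}
```

With this structure, the heavy per-pixel work runs on the GPU and the node callback only handles parameter validation and the kernel launch.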


Thanks AastaLLL,

I am already using VisionWorks and have read those documents.
It is very helpful to know that I need to write CUDA code to have my custom node accelerated, and that the VisionWorks APIs are faster than their standard OpenVX counterparts.

Thanks again,

Hi AastaLLL,

I am still unclear on how an OpenVX graph could speed up a custom node written with the OpenCV GPU APIs. Could you explain it to me in more detail, please?
I really need to understand this to further optimize my algorithm.



VisionWorks (OpenVX) is good at speeding up low-level operations,
e.g. camera capture, video decode, rendering, …

But it does not cover all the vision algorithms available in OpenCV.
For this, we provide a simple wrapper to switch between VisionWorks and OpenCV.

A user can easily leverage VisionWorks’ acceleration for decoding, and also call the vision functions they need from OpenCV.
Moreover, if you want to run a vision function on the GPU, please implement it in CUDA.
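As a rough illustration of that interop, a cv::Mat buffer can be wrapped as a vx_image without copying via the standard OpenVX 1.1 call vxCreateImageFromHandle. This is a minimal sketch assuming an 8-bit single-channel image; the helper name and the surrounding setup are assumptions:

```cpp
#include <VX/vx.h>
#include <opencv2/core.hpp>

// Wrap the pixel buffer of a CV_8UC1 cv::Mat in a vx_image (no copy).
// The cv::Mat must stay alive as long as the vx_image is in use.
vx_image wrapCvMat(vx_context context, cv::Mat& mat)
{
    vx_imagepatch_addressing_t addr = {};
    addr.dim_x    = mat.cols;
    addr.dim_y    = mat.rows;
    addr.stride_x = 1;                    // one byte per pixel (VX_DF_IMAGE_U8)
    addr.stride_y = (vx_int32)mat.step;   // row pitch of the cv::Mat

    void* ptrs[] = { mat.data };
    return vxCreateImageFromHandle(context, VX_DF_IMAGE_U8,
                                   &addr, ptrs, VX_MEMORY_TYPE_HOST);
}
```

The resulting vx_image can then be fed into a VisionWorks graph, while the same buffer remains accessible to OpenCV on the host side.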