Hello everybody,
I am trying to get VisionWorks to execute multiple nodes at the same time. I am developing on a Jetson TX2 board updated with the latest JetPack, 3.2, so I am using VisionWorks 1.6.
In the VisionWorks reference documentation, under “OpenVX Design Overview”, I read the following statement:
“The vxMagnitudeNode and vxPhaseNode are independently computed, in that each does not depend on the output of the other. OpenVX does not mandate that they are run simultaneously or in parallel, but it could be implemented this way by the OpenVX vendor.”
I’m trying to use both the CPU and the GPU simultaneously in my vision application to maximize performance. In my tests, I found that the nodes in an OpenVX graph are always executed serially, even when they could be executed in parallel.
To test this behaviour, I use the simple graph below (it is not a real application, I just want to test independent node execution). It assumes that a “big_image.jpg” file is present; otherwise you can launch it with the -s parameter to supply your own image.
First, it runs a single graph with every possible CPU/GPU combination for the Gaussian and Median filters. After that, a copy of the first graph is made and the two graphs are executed asynchronously. However, neither approach used the CPU and the GPU at the same time.
I attach the code:
#include <cstdio>
#include <cstdlib>
#include <string>
#include <NVX/Application.hpp>
#include <NVX/Utility.hpp>
#include <OVX/UtilityOVX.hpp>
#include <OVX/FrameSourceOVX.hpp>
#include <OVX/RenderOVX.hpp>
int main(int argc, char *argv[])
{
vx_context context = vxCreateContext();
nvxio::Application &app = nvxio::Application::get();
std::string sourceUri = "big_image.jpg";
app.setDescription("Example application node independence");
app.addOption('s', "source", "Source URI", nvxio::OptionHandler::string(&sourceUri));
app.init(argc, argv);
vx_image source = ovxio::loadImageFromFile(context, sourceUri, VX_DF_IMAGE_RGBX);
vx_node gauss_node, median_node;
vx_graph graph1 = vxCreateGraph(context);
vx_image tmp_image1 = vxCreateVirtualImage(graph1, 0, 0, VX_DF_IMAGE_U8);
vx_image out_gauss1 = vxCreateVirtualImage(graph1, 0, 0, VX_DF_IMAGE_VIRT);
vx_image out_filter1 = vxCreateVirtualImage(graph1, 0, 0, VX_DF_IMAGE_VIRT);
vx_node extract_node1 = vxColorConvertNode(graph1, source, tmp_image1);
vx_node gauss_node1 = vxGaussian3x3Node(graph1, tmp_image1, out_gauss1);
vx_node median_node1 = vxMedian3x3Node(graph1, tmp_image1, out_filter1);
for(int i = 0; i < 4; i++)
{
// all CPU/GPU combinations
switch(i)
{
case 0: case 1: vxSetNodeTarget(gauss_node1, NVX_TARGET_CPU, NULL); break;
case 2: case 3: vxSetNodeTarget(gauss_node1, NVX_TARGET_GPU, NULL); break;
}
switch(i)
{
case 0: case 2: vxSetNodeTarget(median_node1, NVX_TARGET_CPU, NULL); break;
case 1: case 3: vxSetNodeTarget(median_node1, NVX_TARGET_GPU, NULL); break;
}
if (vxVerifyGraph(graph1) != VX_SUCCESS)
{
printf("Graph verification failed, see [NVX LOG] for details\n");
fflush(stdout);
exit(1);
}
vxProcessGraph(graph1);
}
// create the SAME graph but with different references!
vx_graph graph2 = vxCreateGraph(context);
vx_image tmp_image2 = vxCreateVirtualImage(graph2, 0, 0, VX_DF_IMAGE_U8);
vx_image out_gauss2 = vxCreateVirtualImage(graph2, 0, 0, VX_DF_IMAGE_VIRT);
vx_image out_filter2 = vxCreateVirtualImage(graph2, 0, 0, VX_DF_IMAGE_VIRT);
vx_node extract_node2 = vxColorConvertNode(graph2, source, tmp_image2);
vx_node gauss_node2 = vxGaussian3x3Node(graph2, tmp_image2, out_gauss2);
vx_node median_node2 = vxMedian3x3Node(graph2, tmp_image2, out_filter2);
vxSetNodeTarget(gauss_node1, NVX_TARGET_GPU, NULL);
vxSetNodeTarget(median_node1, NVX_TARGET_CPU, NULL);
vxSetNodeTarget(gauss_node2, NVX_TARGET_CPU, NULL);
vxSetNodeTarget(median_node2, NVX_TARGET_GPU, NULL);
if (vxVerifyGraph(graph1) != VX_SUCCESS)
{
printf("Graph verification failed, see [NVX LOG] for details\n");
fflush(stdout);
exit(1);
}
if (vxVerifyGraph(graph2) != VX_SUCCESS)
{
printf("Graph verification failed, see [NVX LOG] for details\n");
fflush(stdout);
exit(1);
}
vxScheduleGraph(graph1);
vxScheduleGraph(graph2);
vxWaitGraph(graph1);
vxWaitGraph(graph2);
vxReleaseContext(&context);
}
In a standard Jetson TX2 configuration, this compiles with the following command:
nvcc -std=c++11 test_node.cpp -I /usr/share/visionworks/sources/nvxio/include/ -L/usr/share/visionworks/sources/libs/aarch64/linux/release/ -L /usr/local/cuda-8.0/targets/aarch64-linux/lib/ -lcudart -lnvx -lovx -lvisionworks `pkg-config --libs gstreamer-base-1.0 gstreamer-pbutils-1.0 gstreamer-app-1.0 glfw3` -o test_node
The graph looks like this:
So it should be possible to execute “Gaussian Filter” and “Median Filter” in parallel, because they are independent of each other. To get a file that can be imported into the NVIDIA Visual Profiler, I run:
export NVX_PROF=nvtx
nvprof --out-profile profile.log ./test_node
The results confirm that nodes are executed serially within a graph
and that different graphs are executed serially with respect to each other.
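To put a number on “serially”, one option is to read the start and end timestamps of two nodes (or two graphs) off the profiler timeline and compute how long they overlap. The following is only a minimal sketch under that assumption: Interval, overlapMs and ranSerially are hypothetical helper names, not part of VisionWorks or OpenVX, and the timestamps have to be filled in by hand from the profile.

```cpp
#include <algorithm>

// Hypothetical helper type (not part of VisionWorks or OpenVX):
// the time span during which one node or graph was running.
struct Interval {
    double beginMs;  // start time in milliseconds
    double endMs;    // end time in milliseconds
};

// Returns how long the two intervals overlap, in milliseconds.
// The result is 0 when they do not overlap at all.
double overlapMs(const Interval &a, const Interval &b)
{
    const double begin = std::max(a.beginMs, b.beginMs);
    const double end = std::min(a.endMs, b.endMs);
    return std::max(0.0, end - begin);
}

// True when the two intervals never run at the same time.
bool ranSerially(const Interval &a, const Interval &b)
{
    return overlapMs(a, b) == 0.0;
}
```

For example, with the Gaussian node's interval as a and the Median node's interval as b, ranSerially(a, b) returning true would match what I see in the timeline, while any positive overlapMs(a, b) would mean the two nodes really did run concurrently for that long.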
So here is the question: is it possible to combine CPU and GPU processing power, for example by running the “Median node” on the CPU at the same time as the “Gaussian node” is processed on the GPU (or vice versa)?
Thank you in advance for the answers!