Profiling a user-defined custom CUDA node with Nsight

Hi,

I have a custom user node, implemented in CUDA, for depth estimation. I call this node after nvxSemiGlobalMatchingNode() in my application. The code works fine on TX1 (both the native build and the build cross-compiled on x86_64). When I try to profile it with the Nsight tool on x86_64, I get a popup saying that the data collected while profiling is invalid, and my CUDA node does not appear in the profiler data.

I found a presentation at http://on-demand.gputechconf.com/gtc/2016/presentation/s6739-thierry-lepley-visionworks-toolkit-programming.pdf.
It discusses the performance of the immediate and graph-based approaches to the OpenVX implementation (for nvx_sample_object_tracker, page 67). In the profiler data shown there, the Driver API row and the Markers and Ranges row are also displayed for graph mode. My code also uses graph mode along with a custom CUDA node, but for me the Driver API row is blank…

Could you please give some pointers on what could be going wrong? I would like to profile my code the way it is done in the presentation mentioned above.

BR,
njs

Hi,

Thanks for your question.
Could you check whether you can run the program via ssh?

Nsight uses ssh for remote launching.
But if your code includes Render, it may need to be executed directly on the device.

Hi,

Thanks for the reply.
I am able to run my code through Nsight with remote launch. It uploads the executable to the TX1 (/tmp/nsight-debug) and works fine.
Also, my application does not use the nvxio renderer; I use OpenCV for I/O instead.

BR,
njs

Hi,

Sorry for the late reply.
Could you check whether you can profile the stereo_matching sample from the VisionWorks samples, which also uses nvxSemiGlobalMatchingNode?

Thanks

Hi,

I was able to profile the stereo_matching sample code from the Nsight tool without any popup warnings.
But the Driver API calls are not captured in the profiling result.

BR,
njs

Hi,

Thanks for your testing.

It’s really helpful that we can reproduce your issue on our sample code.
We are looking into this and will let you know when we can offer further advice.

Thanks.

Hi,

Sorry for keeping you waiting.

Could you try the following sample, which also uses nvxSemiGlobalMatchingNode?
I can successfully get all the profiling data (including the Driver API) via nvvp.

Thanks.

#include <cmath>
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
#include <memory>
#include <NVX/nvx.h>
#include <NVXIO/FrameSource.hpp>
#include <NVXIO/Render.hpp>

int main(int argc, char** argv)
{
    std::string sourceUri = "/home/ubuntu/VisionWorks-1.5-Samples/data/signs.avi";
    nvxio::Application &app = nvxio::Application::get();
    app.addOption('s', "source", "Input URI", nvxio::OptionHandler::string(&sourceUri));
    app.init(argc, argv);


    nvxio::ContextGuard context;
    std::unique_ptr<nvxio::FrameSource> source(nvxio::createDefaultFrameSource(context, sourceUri));
    source->open();

    nvxio::FrameSource::Parameters frameConfig = source->getConfiguration();
    vx_image frame = vxCreateImage(context, frameConfig.frameWidth, frameConfig.frameHeight, VX_DF_IMAGE_RGBX);
    vx_image disp = vxCreateImage(context, frameConfig.frameWidth, frameConfig.frameHeight, VX_DF_IMAGE_S16);


    vx_graph graph = vxCreateGraph(context);
    vx_image virt_leftU8 = vxCreateVirtualImage(graph, 0, 0, VX_DF_IMAGE_U8);
    vx_image virt_rightU8 = vxCreateVirtualImage(graph, 0, 0, VX_DF_IMAGE_U8);
    vx_node cvtLeftNode = vxColorConvertNode(graph, frame, virt_leftU8);
    vx_node cvtRightNode = vxColorConvertNode(graph, frame, virt_rightU8);
    vx_node matchNode = nvxSemiGlobalMatchingNode(graph, virt_leftU8, virt_rightU8, disp, 0, 64, 8, 109, 5, 0, 1, 31, 32000, 0, 85, 2);

    vxReleaseImage(&virt_leftU8);
    vxReleaseImage(&virt_rightU8);
    while (true)
    {
        nvxio::FrameSource::FrameStatus status = source->fetch(frame);
        if (status != nvxio::FrameSource::OK) break;

        NVXIO_SAFE_CALL( vxProcessGraph(graph) );
    }

    vxReleaseNode(&cvtLeftNode);
    vxReleaseNode(&cvtRightNode);
    vxReleaseNode(&matchNode);
    vxReleaseGraph(&graph);
    vxReleaseImage(&frame);
    vxReleaseImage(&disp);
    return nvxio::Application::APP_EXIT_CODE_SUCCESS;
}

Hi,

Thanks for the reply.

I was able to profile the code you provided. NVVP also displays the Driver API.
I can see Driver API -> cuEGLStreamConsumerAcquireFrame.

But the nodes (nvxSemiGlobalMatchingNode, vxColorConvertNode, etc.) are not displayed in the profiler view.
Why?

BR,
njs

Hi,

Sorry for my late reply.

I moved the semi-global matching call (nvxuSemiGlobalMatching) into a custom node and profiled it again.
The Driver API results are displayed in NVVP.

Could you give it a try?

#include <cmath>
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
#include <memory>
#include <NVX/nvx.h>
#include <NVXIO/FrameSource.hpp>
#include <NVXIO/Render.hpp>

enum {
    USER_LIBRARY = 0x1,
    USER_KERNEL_SIMPLE = VX_KERNEL_BASE(VX_ID_DEFAULT, USER_LIBRARY) + 0x0,
};

vx_status simple_kernel(vx_node node, const vx_reference *parameters, vx_uint32 num)
{
    vx_image leftU8 = (vx_image)parameters[0];
    vx_image rightU8 = (vx_image)parameters[1];
    vx_image disp = (vx_image)parameters[2];

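    // Reuse the context that owns this node instead of creating a new one on every call.
    vx_context context = vxGetContext((vx_reference)node);
    vx_status status = nvxuSemiGlobalMatching(context, leftU8, rightU8, disp, 0, 64, 8, 109, 5, 0, 1, 31, 32000, 0, 85, 2);

    // Propagate the result of the immediate-mode call instead of always reporting success.
    return status;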
}

vx_status simple_validate(vx_node node, const vx_reference parameters[], vx_uint32 num, vx_meta_format metas[])
{
    return VX_SUCCESS;
}

vx_status registerSimpleKernel(vx_context context)
{
    vx_status status = VX_SUCCESS;
    vx_kernel kernel = vxAddUserKernel(context, "simple_kernel", USER_KERNEL_SIMPLE, simple_kernel, 3, simple_validate, NULL, NULL);
    status = vxGetStatus((vx_reference)kernel);
    if (status != VX_SUCCESS)
    {
        vxAddLogEntry((vx_reference)context, status, "Failed to create Simple Kernel");
        return status;
    }
    status |= vxAddParameterToKernel(kernel, 0, VX_INPUT , VX_TYPE_IMAGE , VX_PARAMETER_STATE_REQUIRED);
    status |= vxAddParameterToKernel(kernel, 1, VX_INPUT , VX_TYPE_IMAGE , VX_PARAMETER_STATE_REQUIRED);
    status |= vxAddParameterToKernel(kernel, 2, VX_BIDIRECTIONAL , VX_TYPE_IMAGE , VX_PARAMETER_STATE_REQUIRED);
    if (status != VX_SUCCESS)
    {
        vxReleaseKernel(&kernel);
        vxAddLogEntry((vx_reference)context, status, "Failed to initialize Simple Kernel parameters");
        return VX_FAILURE;
    }
    status = vxFinalizeKernel(kernel);
    if (status != VX_SUCCESS)
    {
        vxReleaseKernel(&kernel);
        vxAddLogEntry((vx_reference)context, status, "Failed to finalize Simple Kernel");
        return VX_FAILURE;
    }
    return status;
}

vx_node SimpleNode(vx_graph graph, vx_image leftU8, vx_image rightU8, vx_image disp)
{
    vx_node node = NULL;
    vx_context context = vxGetContext((vx_reference)graph);
    vx_kernel kernel = vxGetKernelByEnum(context, USER_KERNEL_SIMPLE);
    if (vxGetStatus((vx_reference)kernel) == VX_SUCCESS)
    {
        node = vxCreateGenericNode(graph, kernel);
        vxReleaseKernel(&kernel);
        if (vxGetStatus((vx_reference)node) == VX_SUCCESS)
        {
            vxSetParameterByIndex(node, 0, (vx_reference)leftU8);
            vxSetParameterByIndex(node, 1, (vx_reference)rightU8);
            vxSetParameterByIndex(node, 2, (vx_reference)disp);
        }
    }
    return node;
}

int main(int argc, char** argv)
{
    std::string sourceUri = "/home/ubuntu/VisionWorks-1.5-Samples/data/signs.avi";
    nvxio::Application &app = nvxio::Application::get();
    app.addOption('s', "source", "Input URI", nvxio::OptionHandler::string(&sourceUri));
    app.init(argc, argv);


    nvxio::ContextGuard context;
    NVXIO_SAFE_CALL( registerSimpleKernel(context) );

    std::unique_ptr<nvxio::FrameSource> source(nvxio::createDefaultFrameSource(context, sourceUri));
    source->open();

    nvxio::FrameSource::Parameters frameConfig = source->getConfiguration();
    vx_image frame = vxCreateImage(context, frameConfig.frameWidth, frameConfig.frameHeight, VX_DF_IMAGE_RGBX);
    vx_image leftU8 = vxCreateImage(context, frameConfig.frameWidth, frameConfig.frameHeight, VX_DF_IMAGE_U8);
    vx_image rightU8 = vxCreateImage(context, frameConfig.frameWidth, frameConfig.frameHeight, VX_DF_IMAGE_U8);
    vx_image disp = vxCreateImage(context, frameConfig.frameWidth, frameConfig.frameHeight, VX_DF_IMAGE_S16);


    vx_graph graph = vxCreateGraph(context);
    vx_node cvtLeftNode = vxColorConvertNode(graph, frame, leftU8);
    vx_node cvtRightNode = vxColorConvertNode(graph, frame, rightU8);
    vx_node customNode = SimpleNode(graph, leftU8, rightU8, disp);

    while (true)
    {
        nvxio::FrameSource::FrameStatus status = source->fetch(frame);
        if (status != nvxio::FrameSource::OK) break;

        NVXIO_SAFE_CALL( vxProcessGraph(graph) );
    }

    vxReleaseNode(&cvtLeftNode);
    vxReleaseNode(&cvtRightNode);
    vxReleaseNode(&customNode);
    vxReleaseGraph(&graph);
    vxReleaseImage(&frame);
    vxReleaseImage(&leftU8);
    vxReleaseImage(&rightU8);
    vxReleaseImage(&disp);
    return nvxio::Application::APP_EXIT_CODE_SUCCESS;
}

Maybe this issue is code-related. Could you share your code so that we can debug it?

Hi njs,

Have you resolved this issue?
Do you have any further results to share?

Thanks

Hello Sir,

Sorry for the late reply.
I ran the code you provided. I am looking for APIs such as vxProcessGraph under Driver API, and for the NVX and VX nodes to be shown under Markers and Ranges in NVVP. But I am not able to find these in NVVP (I am using NVVP version 8.0).

BR,
njs

Hi,

Thanks for your feedback.

Could you attach the NVVP results so that we can debug this?

Hi,

Please find the NVVP results attached.

Regards,
njs
profile_15_may.zip (1.09 MB)

Hi,

Sorry for the late reply.
Please find the Driver API data under Thread 2802954240.

Thanks.

Hello,

Yes, the Driver API is shown under Thread 2802954240.
But I am looking for the vxProcessGraph API under Driver API.
Could you please check?

BR,
njs

Hi,

The Driver API row only shows functions implemented in the CUDA driver.
vxProcessGraph is a general CPU function, so it does not appear there.
(But its back-end implementation, e.g. RGBX_to_GRAY, is a GPU function and is included in the profile.)

You can enable “Profile execution on the CPU” to get more CPU profiling data.
But if you are looking for a complete CPU function profiler, please use Tegra System Profiler.
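
If the goal is mainly to see where each vxProcessGraph call sits on the NVVP timeline, one possible approach is to wrap the call in an NVTX range yourself, so that it shows up under Markers and Ranges rather than under Driver API. The following is only a minimal sketch under some assumptions: ScopedRange, processFrame, runFrames and g_rangeLog are hypothetical names made up for illustration, the USE_NVTX macro is a build switch invented here, and the NVTX path assumes nvToolsExt.h from the CUDA toolkit is available and the program is linked with -lnvToolsExt. Without USE_NVTX the helper only records the push/pop events in memory, so the sketch can be compiled and checked on any machine. Whether VisionWorks emits its own per-node ranges in your setup is not something this sketch addresses.

```cpp
#include <cassert>
#include <string>
#include <vector>

#ifdef USE_NVTX
#include <nvToolsExt.h>  // NVTX header from the CUDA toolkit (assumed available)
#endif

// Fallback event log, filled only when NVTX is not compiled in.
std::vector<std::string> g_rangeLog;

// Hypothetical RAII helper: opens a named range on construction
// and closes it when the object goes out of scope.
class ScopedRange
{
public:
    explicit ScopedRange(const char* name)
    {
#ifdef USE_NVTX
        nvtxRangePushA(name);  // starts a nested range on the calling thread
#else
        g_rangeLog.push_back(std::string("push:") + name);
#endif
    }

    ~ScopedRange()
    {
#ifdef USE_NVTX
        nvtxRangePop();        // ends the most recently pushed range
#else
        g_rangeLog.push_back("pop");
#endif
    }

    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};

// Stand-in for one loop iteration of the samples in this thread.
bool processFrame()
{
    ScopedRange range("vxProcessGraph");
    // In the real application the graph call would go here, e.g.
    // NVXIO_SAFE_CALL( vxProcessGraph(graph) );
    return true;
}

// Processes `count` frames and returns how many succeeded.
int runFrames(int count)
{
    assert(count >= 0);
    int processed = 0;
    for (int i = 0; i < count; ++i)
    {
        if (processFrame())
            ++processed;
    }
    return processed;
}
```

In the samples posted earlier in this thread, the equivalent change would be to declare a ScopedRange right before NVXIO_SAFE_CALL( vxProcessGraph(graph) ) inside the while loop. The RAII form is used so that the range is closed even if the safe-call macro throws.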

Thanks.