Profiling OpenCV CUDA code on Jetson TX2

Hi, I have written a program that makes a number of OpenCV CUDA calls. I am trying to profile its GPU performance using nvprof, but I get the following output:

Warning: Unified Memory Profiling is not supported on the underlying platform. System requirements for unified memory can be found at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements
^C==3414== Profiling application: ./video_reader
==3414== Profiling result:
No kernels were profiled.

==3414== API calls:
No API activities were profiled.
==3414== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.

How can I get around this? Or are there other profiling tools I can use on the Jetson TX2 (locally, not on another host PC) to measure the performance of the GPU?

Thanks

You may try Nsight.
You may also use tegrastats for a global view of resource usage.

Nsight needs another machine (host). Is there a way to install it locally on the TX2?

Nsight does not run directly on the Jetson (it is not supported on the arm64 architecture).

Hi,

Judging from the error message, there is no CUDA kernel code in your program.
Could you share your source so that we can check?

Thanks.

Apologies for the late reply. Here is my code:

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <opencv2/core/core.hpp>
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <opencv2/opencv.hpp>

int main(int argc, char * argv[])
{
    cv::VideoCapture cap(0);

    for (;;)
    {
        cv::Mat img, y;
        cap >> img;
        cv::cuda::GpuMat img_g, x;
        img_g.upload(img);
        cv::cuda::cvtColor(img_g, x, cv::COLOR_RGB2BGR);
        x.download(y);
        imshow("y", y);
        cv::waitKey(1);
    }

    return 0;
}

You can also suppress this warning with something like:

/usr/local/cuda/bin/nvprof --unified-memory-profiling off ./your_cvt

I am not sure how much OpenCV does or can use unified memory by default. Other forum users may share their knowledge about this.

Furthermore, you may try using the CUDA runtime API to start and stop profiling explicitly:

#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/highgui.hpp>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

int main(int argc, char * argv[])
{
	cv::VideoCapture input(0);
	if(!input.isOpened()) {
		std::cout<<"Failed to open camera."<<std::endl;
		return -1;
	}

	cv::Mat          img,   y;
	cv::cuda::GpuMat img_g, x;
	cudaProfilerStart();
	for (int loop = 0; loop < 100; ++loop)
	{ 
		if (!input.read(img))
			break;

		img_g.upload(img);
		cv::cuda::cvtColor(img_g, x, cv::COLOR_RGB2BGR);
		x.download(y);
		imshow("y", y);
		cv::waitKey(1);
	}
	cudaProfilerStop();

	return 0;
}

Compile with something like the following (I’m using opencv-3.3.0 installed in /usr/local/opencv-3.3.0):

g++ -Wall -I/usr/local/opencv-3.3.0/include your_cvt.cpp -I/usr/local/cuda/targets/aarch64-linux/include -L/usr/local/opencv-3.3.0/lib -lopencv_core -lopencv_videoio -lopencv_highgui -lopencv_cudaimgproc -L/usr/local/cuda/targets/aarch64-linux/lib -lcudart -o your_cvt

and run:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/opencv-3.3.0/lib   # /usr/local/cuda/targets/aarch64-linux/lib should be a default path, but if not found you may add it
/usr/local/cuda/bin/nvprof --print-gpu-trace --unified-memory-profiling off ./your_cvt

It worked. Thank you very much.