Zero-Copy Memory vs. Unified Memory for CUDA Processing

If you replace that line with

Mat img = imread("image.jpg");
ptr = img.data;

It will still give the same behavior. How can you make sure the buffer is accessible by the GPU?

Hi,

cv::Mat stores its data in CPU memory.
Please allocate GPU memory to make sure CUDA can access the buffer pointer.

For example:

...
Mat img = imread("image.jpg");
size_t size = img.total() * img.elemSize();  // number of bytes in the image

uchar* d_img;
cudaMalloc(&d_img, size);                    // device-side buffer
cudaMemcpy(d_img, img.data, size, cudaMemcpyHostToDevice);
...
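
To verify at run time whether a pointer is actually accessible from the GPU, one option is cudaPointerGetAttributes (a minimal sketch; the attr.type field assumes CUDA 10 or later):

#include <cuda_runtime.h>

// Returns true if CUDA kernels can dereference p directly
// (device or managed memory); a plain host pointer such as
// img.data fails the check.
bool gpuAccessible(const void* p) {
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, p) != cudaSuccess) {
        cudaGetLastError();  // clear the sticky error on older CUDA versions
        return false;
    }
    return attr.type == cudaMemoryTypeDevice ||
           attr.type == cudaMemoryTypeManaged;
}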

Thanks.

Thanks for your answer again. I believe there is a misunderstanding here: the whole idea is to avoid the memory copy between CPU and GPU, which is why I used cudaMallocManaged(&ptr, sizeof(uchar)*rows*cols), and as we discussed earlier it does not work in the for loop.

Hi,

Although the pointer is declared as unified memory at the beginning, it is then replaced by this line:

ptr = img.data;

You can try reading the image into the original buffer rather than replacing the pointer.
https://docs.opencv.org/3.0-alpha/modules/imgcodecs/doc/reading_and_writing_images.html
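
For example, a minimal sketch (my own illustration, assuming a grayscale image with dimensions rows x cols known in advance): construct a cv::Mat header around the managed buffer, so OpenCV writes the pixels into unified memory and the pointer is never replaced.

...
int rows = 480, cols = 640;    // assumed known in advance
uchar* ptr;
cudaMallocManaged(&ptr, rows * cols * sizeof(uchar));

// Wraps the managed buffer; this constructor does NOT copy or allocate.
Mat wrapped(rows, cols, CV_8UC1, ptr);

Mat img = imread("image.jpg", IMREAD_GRAYSCALE);
img.copyTo(wrapped);  // copies the pixels into the managed buffer, provided
                      // the sizes match; ptr itself is never reassigned
...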

Thanks again for your reply. But since I am reading from video, with a new frame arriving on every loop iteration, I have to replace the data in the pointer. Is there any other way around it?

Hi,

Do you need to read the image with OpenCV?

OpenCV can't read an image into a pre-allocated buffer (pinned memory), and it can't read an image directly into a GPU buffer (GpuMat).
So it always requires a memory copy from a general CPU buffer to a GPU buffer.
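
If you must go through OpenCV, one possible workaround (a sketch of my own, not something stated in this reply: it uses the classic CUDA mapped pinned-memory recipe, which on Jetson's shared DRAM behaves as zero-copy) is to allocate a mapped, page-locked buffer with cudaHostAlloc, wrap it in a cv::Mat, and let kernels read it through the matching device pointer, so the only remaining copy is the CPU-side one out of VideoCapture:

#include <opencv2/opencv.hpp>
#include <cuda_runtime.h>

int main() {
    const int WIDTH = 640, HEIGHT = 480;
    const size_t bytes = WIDTH * HEIGHT * 3;        // one BGR frame

    cudaSetDeviceFlags(cudaDeviceMapHost);          // enable mapped memory

    // Page-locked allocation the GPU can address directly.
    uchar* h_frame;
    cudaHostAlloc(&h_frame, bytes, cudaHostAllocMapped);
    uchar* d_frame;
    cudaHostGetDevicePointer(&d_frame, h_frame, 0); // GPU alias of h_frame

    // Wrap the pinned buffer so OpenCV copies straight into it.
    cv::Mat frame(HEIGHT, WIDTH, CV_8UC3, h_frame);

    cv::VideoCapture cap(0);                        // any source, assumed open
    cv::Mat grabbed;
    while (cap.read(grabbed)) {
        grabbed.copyTo(frame);   // CPU-side copy into the mapped buffer
                                 // (sizes/types must match, or copyTo
                                 // reallocates and detaches the wrap)
        // ... launch kernels on d_frame; no cudaMemcpy required ...
        cudaDeviceSynchronize();
    }

    cudaFreeHost(h_frame);
    return 0;
}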

Thanks.

Hi folks,
I'm here because I'm studying a similar problem. I would like to reduce the glass-to-glass lag (from the frame being acquired by the camera to the frame being shown on the display). I know there are a lot of ways to do that, but I don't understand how to merge them all together. How can I reduce it? How can I use zero-copy (or something else) here?
My current code is below.

#include <opencv2/opencv.hpp>
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>   // cv::cuda::GpuMat
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/video.hpp>
#include <opencv2/videoio.hpp>
#include <iostream>
#include <sstream>                 // std::ostringstream used by to_string
#include <string>



template <typename T>
std::string to_string(T value)
{
	std::ostringstream os;
	os << value;
	return os.str();
}


std::string get_tegra_pipeline(int width, int height, int fps) {
    return "nvcamerasrc ! video/x-raw(memory:NVMM), width=(int)" + to_string(width) + ", height=(int)" +
    		to_string(height) + ", format=(string)I420, framerate=(fraction)" + to_string(fps) +
    		"/1 ! nvvidconv flip-method=4 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink";

}




int main() {
    // Options

	int WIDTH = 640;
	int HEIGHT = 480;
	int FPS = 60;

    // Define the GStreamer pipeline
    std::string pipeline = get_tegra_pipeline(WIDTH, HEIGHT, FPS);
    std::cout << "Using pipeline: \n\t" << pipeline << "\n";

    // Create OpenCV capture object, ensure it works.
    cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);

    if (!cap.isOpened()) {
        std::cout << "Connection failed";
        return -1;
    }



    // View video
    cv::Mat frame;
    cv::Mat host;
    cv::cuda::GpuMat frame_gpu; // using the OpenCV 3 CUDA API


    while (1) {
        cap >> frame;             // Get a new frame from the camera
        frame_gpu.upload(frame);  // Copy the frame to GPU memory
        frame_gpu.download(host); // Copy it back to CPU memory
        cv::imshow("Display GPU", host);
        cv::imshow("Display CPU", frame);
        cv::waitKey(1);           // Needed to show the frame
    }

}

Hi,

Please remember to use GPU-based camera and display elements to get better performance.
GStreamer samples are available in our document:
http://developer2.download.nvidia.com/embedded/L4T/r28_Release_v1.0/Docs/Jetson_TX2_Accelerated_GStreamer_User_Guide.pdf
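
For instance, a sketch (my own, based on the guide above; element names and caps may vary with the L4T release) of a preview pipeline that stays in NVMM memory from camera to display, with no appsink/OpenCV round trip, reusing the to_string helper from the post above:

std::string get_preview_pipeline(int width, int height, int fps) {
    return "nvcamerasrc ! video/x-raw(memory:NVMM), width=(int)" + to_string(width) +
           ", height=(int)" + to_string(height) + ", format=(string)I420, framerate=(fraction)" +
           to_string(fps) + "/1 ! nvoverlaysink";
}

You can test the same pipeline from a terminal with gst-launch-1.0 before wiring it into code.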

Thanks.