Reading CSI camera input directly to GPU memory

febagroth88 · November 24, 2022, 12:02pm

Hello,
I have an application that runs on a Jetson Nano. I’m using a GStreamer pipeline to read from a CSI camera (IMX219) through OpenCV, and I do all the image processing in CUDA. Everything works: I use appsink to read frames in an OpenCV capture and then upload each frame to the GPU.
So my pipeline is basically:
GStreamer → OpenCV → CUDA
However, I cannot reach the desired frame rate (~28 fps), because my CUDA kernels are time consuming and copying from host to device also takes a lot of time. (I’m using CUDA streams as well.)

I was wondering whether it is possible to put the image directly into GPU memory, without uploading it from CPU to GPU. I don’t know the underlying structure well, so this might be a silly question.
I don’t need the frame on the CPU at first, but later I need to download the image to stream it.

Is this possible? Also with this solution can I increase the fps rate of my program?
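
For context, the CPU route described in this post is commonly built from a pipeline description like the one below. This is a minimal sketch, not the poster's actual pipeline: the function name make_appsink_pipeline and the caps values are assumptions chosen for illustration. It shows where the frame leaves NVMM memory, which is why a separate host-to-device upload is needed afterwards.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds a GStreamer description for the CPU route.
// nvarguscamerasrc produces NVMM buffers, nvvidconv copies them into system
// memory, and appsink hands BGR frames to the application.
std::string make_appsink_pipeline(int sensor_id, int width, int height, int fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    std::ostringstream p;
    p << "nvarguscamerasrc sensor-id=" << sensor_id
      << " ! video/x-raw(memory:NVMM), width=" << width
      << ", height=" << height
      << ", format=NV12, framerate=" << fps << "/1"
      << " ! nvvidconv ! video/x-raw, format=BGRx"    // frame leaves NVMM here
      << " ! videoconvert ! video/x-raw, format=BGR"  // CPU colour conversion
      << " ! appsink drop=true max-buffers=1";
    return p.str();
}
```

The resulting string would typically be passed to cv::VideoCapture with cv::CAP_GSTREAMER, and each cv::Mat read from it then has to be copied to the device with cv::cuda::GpuMat::upload.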

ShaneCCC · November 27, 2022, 7:12am

I would suggest using the Jetson Multimedia API (MMAPI) for your case; see samples such as cudaBayerDemosaic and cudaHistogram.

Honey_Patouceul · November 27, 2022, 7:20pm

If you capture from nvarguscamerasrc with GStreamer, the buffers are in NVMM memory, ready for GPU processing. If your processing doesn’t change the resolution, you may use nvivafilter, which can perform CUDA processing on NVMM buffers (RGBA or NV12; the latter may have stride constraints).
As an example of using opencv/cuda from nvivafilter, see:
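
Independently of the linked example, the general shape of an nvivafilter pipeline can be sketched as follows. This is only an illustration under stated assumptions: make_nvivafilter_pipeline is a hypothetical helper, the library name passed in is a placeholder for a user-built CUDA processing library, and fakesink stands in for whatever sink the application really needs.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds a pipeline description in which frames stay in
// NVMM memory from capture through the nvivafilter CUDA processing step.
// lib_name is a placeholder for the user's own processing library.
std::string make_nvivafilter_pipeline(const std::string& lib_name,
                                      int width, int height, int fps)
{
    assert(!lib_name.empty());
    assert(width > 0 && height > 0 && fps > 0);
    std::ostringstream p;
    p << "nvarguscamerasrc"
      << " ! video/x-raw(memory:NVMM), width=" << width
      << ", height=" << height
      << ", format=NV12, framerate=" << fps << "/1"
      << " ! nvivafilter customer-lib-name=" << lib_name
      << " cuda-process=true"
      << " ! video/x-raw(memory:NVMM), format=RGBA"  // output caps of the filter
      << " ! fakesink";                               // placeholder sink
    return p.str();
}
```

Because the resolution is fixed by the caps before the filter, this layout only suits processing that keeps the frame size unchanged, as noted above.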

febagroth88 · November 28, 2022, 6:54am

Thank you for your responses. This is exactly what I was looking for: the buffers in NVMM memory, ready for GPU processing.
Now it makes sense for me to use the CSI camera. My processing involves OpenCV CUDA (cv::cuda) functions as well, so I will check the proposed solutions and update this thread.

febagroth88 · December 4, 2022, 9:01pm

Okay, I found what I was looking for. @Honey_Patouceul, the answer was in one of your old posts:
gpu-acceleration-support-for-opencv-gstreamer-pipeline

In case someone encounters the same problem, I wanted to contribute a simple example as well. @dusty-nv’s jetson-utils library provides an easy way to capture an image and handle it in NVMM memory. (Just compile it with -DNVMM_ENABLE=1)

I also wrote a simple test function to check the actual result; this example is for my CSI camera. Using gstCamera from the jetson-utils library lets you handle the frame in CUDA, or even as a cv::cuda::GpuMat.

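#include <chrono>
#include <cstdlib>
#include <iostream>

#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/imgcodecs.hpp>

#include <jetson-utils/gstCamera.h>  // path assumes a system-wide jetson-utils install

void trying_jetsonutils(){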

	std::cout << "Trying NVMM read" << std::endl;

	// create input stream
	videoOptions opt;
	opt.width  = 3264;
	opt.height = 1848;
	opt.frameRate = 28;
	opt.zeroCopy = false; // GPU access only for better speed
	opt.resource = "csi://0";
	// videoSource * input = videoSource::Create("csi://0", opt);
	gstCamera * input = gstCamera::Create(opt);
	if (!input) {
		std::cerr << "Error: Failed to create input stream" << std::endl;
		exit(-1);
	}

	// Read one frame to get resolution
	uchar3* image = NULL;
	if( !input->Capture(&image, 1000) )
	{
		std::cerr << "Error: failed to capture first video frame" << std::endl;
		delete input;
		exit(3);
	}

	
	// Single channel: this buffer receives the grayscale output of cvtColor
	cv::cuda::GpuMat dummy_frame(input->GetHeight(), input->GetWidth(), CV_8UC1);
	
	int i = 0;

	std::chrono::high_resolution_clock::time_point start_time;
	std::chrono::high_resolution_clock::time_point end_time;
	std::chrono::microseconds duration;

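	while( 1 ){
		// capture next image (the buffer stays in GPU-accessible memory)
		if( !input->Capture(&image, 1000) ){
			std::cerr << "Error: failed to capture video frame" << std::endl;
			if( !input->IsStreaming() )
				break; // stream ended, so do not retry forever
			continue;
		}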
		

		// Wrap the captured device pointer in a GpuMat header (no copy is made)
		start_time = std::chrono::high_resolution_clock::now();
		cv::cuda::GpuMat frame_in(input->GetHeight(), input->GetWidth(), CV_8UC3, image);
		end_time = std::chrono::high_resolution_clock::now();
		duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
		std::cout << "Upload duration: " << duration.count() << " us "<< std::endl;
		
		start_time = std::chrono::high_resolution_clock::now();
		cv::cuda::cvtColor(frame_in, dummy_frame, cv::COLOR_RGB2GRAY);
		end_time = std::chrono::high_resolution_clock::now();
		duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
		std::cout << "CvtColor duration: " << duration.count() << " us "<< std::endl;

		cv::Mat cpu_frame;
		start_time = std::chrono::high_resolution_clock::now();
		dummy_frame.download(cpu_frame);
		end_time = std::chrono::high_resolution_clock::now();
		duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
		std::cout << "Download duration: " << duration.count() << " us "<< std::endl;

		cv::imwrite("gpu_frame.png", cpu_frame);
		if( !input->IsStreaming() )
			break;
		if (i > 10)
			break;
		i++;
	}

	delete input;
}
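
The three timing sections in the function above repeat the same start/stop pattern. As a possible simplification, the pattern could be factored into a small helper such as the sketch below. The names time_us and StageStats are hypothetical and not part of jetson-utils or OpenCV; the sketch uses std::chrono::steady_clock, which is generally preferred for measuring intervals.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in microseconds.
template <typename Callable>
std::int64_t time_us(Callable&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Callable>(work)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

// Hypothetical accumulator: averages a stage over several frames, since a
// single-frame reading can be noisy.
struct StageStats {
    std::int64_t total_us = 0;
    int samples = 0;

    void add(std::int64_t us)
    {
        assert(us >= 0);
        total_us += us;
        ++samples;
    }

    double mean_us() const
    {
        assert(samples > 0);
        return static_cast<double>(total_us) / samples;
    }
};
```

A stage would then be measured as, for example, stats.add(time_us([&]{ dummy_frame.download(cpu_frame); })). If a non-default cv::cuda::Stream is passed to the OpenCV calls, they may return before the GPU work has finished, so the stream's waitForCompletion() would need to be called inside the timed callable for the numbers to be meaningful.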

The "upload" step now takes around 2 us, compared with about 25000 us in my old test. This indicates that the image is no longer copied: the captured pointer already refers to memory the GPU can access, so constructing the GpuMat only wraps it.
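
To put these numbers in perspective, the arithmetic can be sketched as below. This is a rough estimate under assumptions: it takes the 3264 x 1848 resolution, the 3-byte uchar3 pixel format and the 28 fps target from the example above, and assumes the old 25000 us measurement covered a frame of the same size. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one packed frame with 8-bit channels (uchar3 -> 3 channels).
std::int64_t frame_bytes(int width, int height, int channels)
{
    assert(width > 0 && height > 0 && channels > 0);
    return static_cast<std::int64_t>(width) * height * channels;
}

// Time available per frame, in microseconds, at a given frame rate.
double frame_budget_us(double fps)
{
    assert(fps > 0.0);
    return 1e6 / fps;
}

// Share of the per-frame budget consumed by a step that takes step_us.
double budget_fraction(double step_us, double fps)
{
    return step_us / frame_budget_us(fps);
}

// Effective copy rate in MB/s (1 MB = 1e6 bytes), i.e. bytes per microsecond.
double copy_rate_mb_s(std::int64_t bytes, double copy_us)
{
    assert(copy_us > 0.0);
    return static_cast<double>(bytes) / copy_us;
}
```

With these inputs, one frame is 18,095,616 bytes, the budget at 28 fps is roughly 35714 us per frame, and a 25000 us copy would consume about 70% of it (an effective rate of about 724 MB/s). Removing that copy therefore frees most of the frame budget for the CUDA kernels and the later download.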

system · December 28, 2022, 3:30am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
