I have an application that runs on a Jetson Nano. I'm using a GStreamer pipeline to read from a CSI camera (IMX219) with OpenCV, and all the image processing is done in CUDA. Everything works: I use an appsink to read frames into an OpenCV capture and upload each frame to the GPU.
So my pipeline is basically:
GStreamer → OpenCV → CUDA
However, I cannot reach the desired frame rate (~28 fps), since my CUDA kernels are time-consuming and the host-to-device copy also takes a lot of time. (I'm already using CUDA streams.)
I was curious whether it is possible to get the image directly into GPU memory without uploading it from CPU to GPU. I don't really know the underlying architecture, so this might be a silly question.
I don't need the frame on the CPU at first, but later I do need to download it in order to stream it.
Is this possible? And would it increase my program's frame rate?
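For context, here is a minimal sketch of the capture path described above. The pipeline string and the 1280×720 mode are illustrative assumptions, not the poster's exact settings; note where the host-to-device copy happens.

```python
# Sketch of the CPU-copy path: appsink delivers the frame in system memory,
# then OpenCV uploads it to the GPU (the copy the question is asking to avoid).

def csi_pipeline(width=1280, height=720, fps=28):
    """Build a GStreamer pipeline string that decodes the CSI stream to
    BGR in system memory so OpenCV's appsink can read it."""
    return (
        f"nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM),width={width},height={height},"
        f"framerate={fps}/1,format=NV12 ! "
        f"nvvidconv ! video/x-raw,format=BGRx ! "
        f"videoconvert ! video/x-raw,format=BGR ! "
        f"appsink"
    )

if __name__ == "__main__":
    import cv2  # requires OpenCV built with GStreamer and CUDA support
    cap = cv2.VideoCapture(csi_pipeline(), cv2.CAP_GSTREAMER)
    gpu_frame = cv2.cuda_GpuMat()
    ok, frame = cap.read()       # frame lands in CPU memory first...
    if ok:
        gpu_frame.upload(frame)  # ...then pays a host-to-device copy
```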
Capturing from nvarguscamerasrc with GStreamer, you get the buffers in NVMM memory, ready for GPU processing. If your processing doesn't change the resolution, you can use nvivafilter, which performs CUDA processing on NVMM buffers (RGBA or NV12; the latter may have stride constraints).
As an example of using OpenCV/CUDA from nvivafilter, see:
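A typical nvivafilter pipeline looks roughly like the sketch below. The library name and caps are the stock NVIDIA sample defaults, used here as assumptions; adjust them for your sensor mode and custom CUDA library.

```python
# Builds a gst-launch-1.0 style pipeline that keeps frames in NVMM memory
# and runs a custom CUDA library on them via nvivafilter.

def nvivafilter_pipeline(lib="libnvsample_cudaprocess.so",
                         width=1920, height=1080):
    """Frames stay in NVMM end to end: camera -> CUDA processing -> display.
    'cuda-process' and 'customer-lib-name' are nvivafilter properties."""
    return (
        f"nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM),width={width},height={height},format=NV12 ! "
        f"nvivafilter cuda-process=true customer-lib-name={lib} ! "
        f"video/x-raw(memory:NVMM),format=RGBA ! "
        f"nvoverlaysink"
    )
```

Passing the returned string to `gst-launch-1.0` (or embedding it in an application pipeline) keeps the buffers on the GPU, so no cv2 upload is needed.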
Thank you for your responses. This is exactly what I was looking for: the buffers in NVMM memory, ready for GPU processing.
Now using a CSI camera makes sense to me. My processing also involves cv::cuda functions; I will check the proposed solutions and update this thread.
In case someone else runs into the same problem, I want to contribute a simple example as well. @dusty-nv's jetson-utils library provides an easy way to capture an image and handle it in NVMM memory (just compile it with -DNVMM_ENABLE=1).
I also wrote a simple test script to verify the result; this example is for my CSI camera. Using the gstCamera class of the jetson-utils library lets you handle the frame in CUDA, or even as a cv::cuda::GpuMat.
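Along those lines, here is a minimal sketch assuming the Python bindings of jetson-utils, using the videoSource interface that wraps gstCamera. The resolution, frame rate, and the RTP output address are illustrative placeholders, not values from the original script.

```python
# Sketch of capturing CSI frames that stay in GPU-accessible memory
# via jetson-utils (built with NVMM support, as described above).

def csi_source_uri(sensor_id=0):
    """URI jetson-utils uses to select a CSI camera by sensor index."""
    return f"csi://{sensor_id}"

if __name__ == "__main__":
    from jetson_utils import videoSource, videoOutput

    camera = videoSource(csi_source_uri(0),
                         options={"width": 1280,
                                  "height": 720,
                                  "framerate": 28})
    # Later stage: download/encode happens only here, when streaming out.
    output = videoOutput("rtp://<host>:1234")  # placeholder address

    while output.IsStreaming():
        img = camera.Capture()  # cudaImage: pixels already live in
        if img is None:         # GPU-accessible memory, no cv2 upload
            continue
        # ... run CUDA kernels / cv::cuda processing on the image here ...
        output.Render(img)
```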