Overview of memory - driver, kernel, DMA, userspace, CUDA, zero-copy

Is there a conceptual description of Jetson Nano memory management, including driver/kernel, DMA memory, user space, CUDA memory access, zero-copy etc.?

I would like a good overview of where memory lives, who allocates and deallocates it, who has access under what conditions, and what the constraints are for zero-copy access. I haven’t found one by searching yet.

Our specific C/C++ application would read YUYV frames from a USB camera via V4L2, likely do CUDA processing on each frame, forward it to the H.264 encoder, and send the resulting bitstream via our custom network code. So we’d like to understand the most efficient way to use Jetson Nano memory for the handoffs.
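For a sense of scale, here is a minimal sketch of the per-frame sizes involved in those handoffs. The function names are hypothetical, and it assumes tightly packed rows (no stride padding) and an encoder fed a 4:2:0 format such as I420 or NV12; neither assumption has been checked against the Nano.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes in one packed YUYV (4:2:2) frame: 2 bytes per pixel.
// Assumption: rows are tightly packed; a real driver may add stride padding.
constexpr std::size_t yuyv_frame_bytes(std::uint32_t width, std::uint32_t height) {
    return static_cast<std::size_t>(width) * height * 2;
}

// Bytes in one 4:2:0 frame (I420 or NV12): 12 bits per pixel.
constexpr std::size_t yuv420_frame_bytes(std::uint32_t width, std::uint32_t height) {
    return static_cast<std::size_t>(width) * height * 3 / 2;
}

// Memory traffic in bytes per second caused by ONE extra full-frame copy.
// Each copy reads the frame once and writes it once, hence the factor of 2.
constexpr std::size_t copy_traffic_per_second(std::size_t frame_bytes, std::uint32_t fps) {
    return frame_bytes * 2 * fps;
}
```

As an illustration only, a 1280x720 YUYV stream at 30 fps works out to 1,843,200 bytes per frame, so every avoidable copy between capture, CUDA, and the encoder costs roughly 110 MB/s of memory traffic under these assumptions.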

Thanks in advance for your help!

For this use case, we suggest running the 12_camera_v4l2_cuda sample and checking memory usage with top or sudo tegrastats. The sample demonstrates capturing frame data directly into an NvBuffer (DMA buffer).

The sample can show a camera preview. For video encoding, please apply this patch and give it a try:
TX2 Camera convert/encode using Multimedia API issue - #17 by DaneLLL

Thank you for your reply!

I agree re 12_camera_v4l2_cuda, and have already been looking at that sample.

Re “DMA buffer” etc, is there any conceptual overview on the different types of memory, how things get pinned and by whom, what sorts of memory are accessible from which levels (kernel, userspace, GPU), memory lifecycle (who allocates, deallocates)?

I’ve seen forum posts where zero-copy memory (which we’d expect to be more efficient because there’s no copying) results in lower performance because zero-copy disables CPU and GPU caching of that memory. So, for designing our solution (which will include code similar to 12_camera_v4l2_cuda), we need a thorough understanding of the Nvidia memory options available and how to use them properly.

Is there any chance there’s an overview writeup in an introduction-to-Nvidia-architecture somewhere?

I did find this document, which offers some distant clues, though I’m not sure whether the described architecture applies to the Nano: GPUDirect RDMA :: CUDA Toolkit Documentation

That document is for desktop GPUs and may not apply to Jetson Nano. The general use case is a USB camera, which uses the USB Video Class (UVC) driver to capture frames through V4L2. 12_camera_v4l2_cuda is the optimal solution on Jetson platforms. Please give it a try.

Thank you very much!

As a partial answer to my original question about an overview of V4L2 memory architecture and concepts, the link below from the Linux kernel folks may be a useful starting point. This is generic V4L2 without NVIDIA refinements.
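As a rough sketch of what that generic layer looks like from user space: in V4L2 streaming I/O, the application tells the driver who owns the frame memory when it requests buffers. V4L2_MEMORY_MMAP means the driver allocates and the application maps the buffers, V4L2_MEMORY_USERPTR means the application allocates, and V4L2_MEMORY_DMABUF means the buffers are imported as dma-buf file descriptors. The helper name below is hypothetical, and which memory types a particular camera driver accepts is something to verify on the device.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <linux/videodev2.h>
#include <sys/ioctl.h>

// Ask the driver behind `fd` for `count` capture buffers of the given memory type.
// Returns the number of buffers the driver actually granted (it may differ from
// the request), or -errno if the ioctl fails.
int request_capture_buffers(int fd, std::uint32_t count, v4l2_memory memory) {
    // A count of 0 asks the driver to free its buffers, so treat it as misuse here.
    assert(count > 0);

    v4l2_requestbuffers req{};
    req.count = count;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = memory;  // MMAP, USERPTR, or DMABUF

    if (ioctl(fd, VIDIOC_REQBUFS, &req) == -1) {
        return -errno;  // e.g. -EINVAL if the driver does not support this memory type
    }
    return static_cast<int>(req.count);
}
```

A caller would open the device (for example /dev/video0, an assumed path), then try request_capture_buffers(fd, 4, V4L2_MEMORY_DMABUF) and fall back to V4L2_MEMORY_MMAP if the driver rejects it.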

