PCIe SMMU issues on PCIe-C0 with an NVMe SSD connected to the M.2 Key M slot


I installed a Samsung V-NAND SSD 970 PRO in the M.2 Key M slot of my AGX Xavier (Developer Kit, L4T release R32.2.1).

I want to move data between the SSD and GPU memory through DMA, so I found this project https://github.com/enfiskutensykkel/ssd-gpu-dma, which provides an API for building userspace NVMe drivers.

I tried to run it, but it seems it cannot work with the SMMU enabled. So I disabled the SMMU for PCIe controller-0 following the instructions in comment #4 of https://devtalk.nvidia.com/default/topic/1043746/jetson-agx-xavier/pcie-smmu-issue/. After reflashing the Xavier board with the new device tree, I verified that the SMMU is disabled for PCIe controller-0 by extracting the current device tree.
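For anyone hitting the same issue, the change amounts to detaching the controller from the SMMU in the PCIe .dtsi. The fragment below is only an illustrative sketch — the node path and property names are my assumptions, and the authoritative two lines to comment out are the ones shown in comment #4 of the linked thread:

```
/* Illustrative sketch only -- node path and property names are
 * assumptions; the exact lines to comment out are those shown in
 * comment #4 of the linked thread. */
pcie@14180000 {
        ...
        /* Commenting out the iommus entry detaches the controller from
         * the SMMU, so its DMA addresses reach memory untranslated. */
        /* iommus = <&smmu TEGRA_SID_PCIE0>; */
        ...
};
```

After reflashing, the live tree can be extracted with `dtc -I fs -O dts -o extracted.dts /proc/device-tree` to confirm the property is gone from the controller node.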

However, when I tried to run one of the project's examples (https://github.com/enfiskutensykkel/ssd-gpu-dma), I got an unhandled context fault on smmu1 plus new errors from the memory controller. Below is the output of ‘dmesg -w’:

[   867.404231] mc-err: (255) csr_pcie0r: EMEM address decode error
[   867.404418] mc-err:   status = 0x200640d8; addr = 0xffffffff00; hi_adr_reg=ff08
[   867.404574] mc-err:   secure: yes, access-type: read
[   867.404675] mc-err: mc-err: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
[   867.404702] t19x-arm-smmu: Unhandled context fault: smmu1, iova=0x398325000, fsynr=0xd0001, cb=0, sid=86(0x56 - PCIE0), pgd=0, pud=0, pmd=0, pte=0

How can I completely disable the SMMU for PCIe controller-0? And how can I fix the mc-err errors?


Please, can you help me solve my mc-err errors?

The project looks specific to x86 with desktop GPUs and may not work on Xavier. Is it that the default GPU memory is not big enough for your usecase and you need extra memory for CUDA processing on Xavier?

Yes, the default GPU memory is not big enough for my usecase and I need extra memory for CUDA processing on Xavier.

In fact, my use case consists of acquiring very high-resolution images at high speed, performing some CUDA processing, and then saving both the input images and the results.

The default GPU memory of the AGX Xavier could be enough to receive a sequence of input images and process them. But at the end of the processing, the input images and the results must be saved elsewhere to free up GPU memory for the following sequence of images, and the overall memory of the Xavier is really not enough to hold all this data.

So we decided to install a 1 TB NVMe SSD in the M.2 Key M slot. Given our real-time constraints and the speed at which the images arrive on the Xavier, we want to be able to move data between GPU memory and the SSD using DMA.

Hi @DaneLLL,

Yes, the default GPU memory is not big enough for my usecase and I need extra memory for CUDA processing on Xavier. My project will be part of an embedded system, so I can't use an x86 desktop with GPUs; I need an embedded board, and the AGX Xavier seems to be a good choice. Any idea how I can move data between GPU memory and the NVMe SSD using DMA for my CUDA processing, or how to solve the mc-err errors?


This method is not supported in existing L4T releases. We will check if we can support/implement it in future releases.

Thanks for your feedback @DaneLLL.

But how can we completely disable the SMMU for PCIe controller-0? I modified my device tree by commenting out the two lines (shown in comment #4 of https://devtalk.nvidia.com/default/topic/1043746/jetson-agx-xavier/pcie-smmu-issue/) in /public_sources/hardware/nvidia/soc/t19x/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi and built the kernel from source. Next, I copied the necessary files to my host and then reflashed my Xavier board.

However, as can be seen in my comment #1, I still get an “Unhandled context fault”, now on smmu1. Note that before I modified the device tree and reflashed, the “Unhandled context fault” occurred on smmu0.

The modification in the device tree is good. You should be able to boot up successfully.

After further investigation, we have confirmed that the project is not supported on Xavier. This is because the PCIe P2P protocol is not supported on Xavier, and the API in nv-p2p.h differs between desktop GPUs and Jetson platforms:

/* Desktop GPUs */
int nvidia_p2p_dma_map_pages(struct pci_dev *peer,
        struct nvidia_p2p_page_table *page_table,
        struct nvidia_p2p_dma_mapping **dma_mapping);

/* Jetson platforms: takes a struct device and an explicit DMA direction */
int nvidia_p2p_dma_map_pages(struct device *dev,
        struct nvidia_p2p_page_table *page_table,
        struct nvidia_p2p_dma_mapping **map,
        enum dma_data_direction direction);

We have Xavier 16GB/8GB modules on the market, which should be enough for most DL models. If your model requires more than 16GB, you may want to look at desktop GPUs.


Thanks for your feedback @DaneLLL.

Indeed, I had seen that the kernel API in nv-p2p.h requires some modifications for porting to Jetson (as specified in section 4.4 of the GPUDirect RDMA documentation at https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#porting-to-jetson).

I had made the necessary modifications in the project. But if the PCIe P2P protocol is not supported on Xavier, the project will not work on the AGX Xavier.

Could you explain why Xavier does not support the PCIe P2P protocol?

In comment #2 of https://devtalk.nvidia.com/default/topic/987076/jetson-tx1/gpudirect-rdma-on-jetson-tx1-/ it is said that on Jetson the GPU is not connected via the PCIe bus, but rather is directly connected to the memory controller. Is this also the case on Xavier?

Please check the Xavier TRM: it is a hardware limitation.

Hi @DaneLLL,

I understand that direct access to the GPU memory is not going to be possible in my case.

By inspecting the output of the “lspci -v” command, we can see that:

0000:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981 (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd Device a801
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Memory at 1b40000000 (64-bit, non-prefetchable) 
	Capabilities: <access denied>
	Kernel driver in use: nvme

The line with “Flags: bus master …” shows that my SSD can act as a bus master and access system memory (unless I am mistaken).

So, would it be possible to perform DMA transfers between system memory and my SSD using the default nvme driver? If yes, could you please tell me how to do it? I have spent the past few days searching the internet for resources/documentation/posts on how userspace data can be copied to an NVMe SSD through the default Linux nvme driver.
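For illustration, here is the kind of userspace sketch I have in mind — the file path, buffer size, and alignment values are my assumptions, and the data goes through system memory (not GPU memory). Opening the file with O_DIRECT bypasses the page cache, so the kernel nvme driver sets up DMA directly over the user buffer:

```c
/* Minimal sketch: write one buffer to a file with O_DIRECT so that the
 * kernel nvme driver DMAs the data between this user buffer and the
 * drive, bypassing the page cache.  The path should point at a file on
 * the SSD's mounted filesystem (an assumption -- mount point is up to
 * you).  With O_DIRECT, the buffer address, file offset, and transfer
 * size must all be aligned to the logical block size. */
#define _GNU_SOURCE             /* for O_DIRECT on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 on success (or if this filesystem rejects O_DIRECT),
 * -1 on any other error. */
int write_odirect(const char *path)
{
    const size_t len = 1 << 20;             /* 1 MiB, 512-byte multiple */
    void *buf = NULL;

    if (posix_memalign(&buf, 4096, len) != 0)   /* page-aligned buffer */
        return -1;
    memset(buf, 0xAB, len);                 /* stand-in for image data */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        free(buf);
        if (errno == EINVAL) {      /* e.g. tmpfs: no O_DIRECT support */
            fprintf(stderr, "O_DIRECT not supported on this fs\n");
            return 0;
        }
        return -1;
    }

    ssize_t n = write(fd, buf, len);        /* driver DMAs from buf */
    int saved = errno;
    close(fd);
    free(buf);
    if (n == (ssize_t)len)
        return 0;
    if (n < 0 && saved == EINVAL) {         /* direct I/O refused late */
        fprintf(stderr, "O_DIRECT not supported on this fs\n");
        return 0;
    }
    return -1;
}
```

In my pipeline I imagine a cudaMemcpy from device memory into a pinned, aligned host buffer followed by a call like `write_odirect("/mnt/nvme/frame_0001.raw")` — two hops through system RAM, but no page-cache copy.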

Thanks again

We have Xavier 16GB, 8GB, TX2 8GB, and 4GB modules. These platforms are mainly designed for embedded usecases, and Xavier 16GB is the module with the maximum memory size. From the discussion, it seems desktop GPUs are better for this usecase, and an existing implementation is available there.

Due to the limitation that PCIe P2P is not supported on Xavier, we would still suggest you consider desktop GPUs.