PCIe SMMU issues on PCIe-C0 with an NVMe SSD connected to the M.2 Key M slot


I installed a Samsung V-NAND SSD 970 PRO in the M.2 Key M slot of my AGX Xavier (Developer Kit, L4T release R32.2.1).

I want to move data between the SSD and GPU memory through DMA, so I found this project https://github.com/enfiskutensykkel/ssd-gpu-dma, which provides an API for building userspace NVMe drivers.

I tried to run it, but it seems it cannot work with the SMMU enabled. So I disabled the SMMU for PCIe controller-0 following the instructions in comment #4 of https://devtalk.nvidia.com/default/topic/1043746/jetson-agx-xavier/pcie-smmu-issue/. After reflashing the Xavier board with the new device tree, I verified that the SMMU is disabled for PCIe controller-0 by extracting the current device tree.
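For anyone hitting the same issue, the change amounts to detaching the controller from the SMMU in the PCIe .dtsi. The fragment below is only an illustrative sketch — the node path and property names are my assumptions, and the authoritative two lines to comment out are the ones shown in comment #4 of the linked thread:

```
/* Illustrative sketch only -- node path and property names are
 * assumptions; the exact lines to comment out are those shown in
 * comment #4 of the linked thread. */
pcie@14180000 {
        ...
        /* Commenting out the iommus entry detaches the controller from
         * the SMMU, so its DMA addresses reach memory untranslated. */
        /* iommus = <&smmu TEGRA_SID_PCIE0>; */
        ...
};
```

After reflashing, the live tree can be extracted with `dtc -I fs -O dts -o extracted.dts /proc/device-tree` to confirm the property is gone from the controller node.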

However, when I tried to run one of the project's examples (https://github.com/enfiskutensykkel/ssd-gpu-dma), I got an unhandled context fault on smmu1 plus new errors from the memory controller. Below is the output of ‘dmesg -w’:

[   867.404231] mc-err: (255) csr_pcie0r: EMEM address decode error
[   867.404418] mc-err:   status = 0x200640d8; addr = 0xffffffff00; hi_adr_reg=ff08
[   867.404574] mc-err:   secure: yes, access-type: read
[   867.404675] mc-err: mc-err: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
[   867.404702] t19x-arm-smmu: Unhandled context fault: smmu1, iova=0x398325000, fsynr=0xd0001, cb=0, sid=86(0x56 - PCIE0), pgd=0, pud=0, pmd=0, pte=0

How can I completely disable the SMMU for PCIe controller-0? And how can I fix the mc-err errors?


Please, can you help me solve my mc-err errors?

The project looks specific to x86 with desktop GPUs and may not work on Xavier. Is it that the default GPU memory is not big enough for your usecase and you need extra memory for CUDA processing on Xavier?

Yes, the default GPU memory is not big enough for my usecase and I need extra memory for CUDA processing on Xavier.

In fact, my use case consists of acquiring very high-resolution images at high speed, performing some CUDA processing, and then saving both the input images and the results.

The default GPU memory of the AGX Xavier could be enough to receive a sequence of input images and process them. But at the end of the processing, the input images and the results must be saved elsewhere to free up GPU memory for the following sequence of images, and the overall memory of the Xavier is really not enough to hold all this data.

So we decided to install a 1 TB NVMe SSD in the M.2 Key M slot. Given our real-time constraints and the speed at which the images arrive on the Xavier, we want to be able to move data between GPU memory and the SSD using DMA.

Hi @DaneLLL,

Yes, the default GPU memory is not big enough for my usecase and I need extra memory for CUDA processing on Xavier. My project will be part of an embedded system, so I can't use an x86 desktop with GPUs; I need an embedded board, and the AGX Xavier seems to be a good choice. Any idea how I can move data between GPU memory and the NVMe SSD using DMA for my CUDA processing, or how to solve the mc-err errors?


This method is not supported in existing L4T releases. We will check if we can support/implement it in future releases.

Thanks for your feedback @DaneLLL.

But how can we completely disable the SMMU for PCIe controller-0? I modified my device tree by commenting out the two lines (shown in comment #4 of https://devtalk.nvidia.com/default/topic/1043746/jetson-agx-xavier/pcie-smmu-issue/) in /public_sources/hardware/nvidia/soc/t19x/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi and built the kernel from source. Next, I copied the necessary files to my host and then reflashed my Xavier board.

However, as can be seen in my comment #1, I still get an “Unhandled context fault”, now on smmu1. Note that before I modified the device tree and reflashed, the “Unhandled context fault” occurred on smmu0.

The modification in the device tree is good. You should be able to boot up successfully.

After further investigation, we have confirmed that the project is not supported on Xavier. This is because the PCIe P2P protocol is not supported on Xavier, and the API in nv-p2p.h differs between desktop GPUs and Jetson platforms:

/* Desktop GPUs */
int nvidia_p2p_dma_map_pages(struct pci_dev *peer,
        struct nvidia_p2p_page_table *page_table,
        struct nvidia_p2p_dma_mapping **dma_mapping);

/* Jetson platforms: takes a struct device and an explicit DMA direction */
int nvidia_p2p_dma_map_pages(struct device *dev,
        struct nvidia_p2p_page_table *page_table,
        struct nvidia_p2p_dma_mapping **map,
        enum dma_data_direction direction);

We have Xavier 16GB/8GB modules on the market, which should be enough for most DL models. If your model requires more than 16GB, you may want to look at desktop GPUs.


Thanks for your feedback @DaneLLL.

Indeed, I had seen that the kernel API in nv-p2p.h requires some modifications for porting to Jetson (as specified in section 4.4 of the GPUDirect RDMA documentation at https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#porting-to-jetson).

I had made the necessary modifications in the project. But if the PCIe P2P protocol is not supported on Xavier, the project will not work on the AGX Xavier.

Could you explain why Xavier does not support the PCIe P2P protocol?

In comment #2 of https://devtalk.nvidia.com/default/topic/987076/jetson-tx1/gpudirect-rdma-on-jetson-tx1-/ it is said that on Jetson the GPU is not connected via the PCIe bus, but rather is directly connected to the memory controller. Is this also the case on Xavier?

Please check the Xavier TRM: it is a hardware limitation.

Hi @DaneLLL,

I understand that direct access to the GPU memory is not going to be possible in my case.

By inspecting the output of the “lspci -v” command, we can see that:

0000:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981 (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd Device a801
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Memory at 1b40000000 (64-bit, non-prefetchable) 
	Capabilities: <access denied>
	Kernel driver in use: nvme

The line with “Flags: bus master …” shows that my SSD can act as a bus master and access system memory (unless I am mistaken).

So, would it be possible to perform DMA transfers between system memory and my SSD using the default nvme driver? If yes, could you please tell me how to do it? I have spent the past few days searching the internet for resources/documentation/posts on how userspace data can be copied to an NVMe SSD through the default Linux nvme driver.
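For illustration, here is the kind of userspace sketch I have in mind — the file path, buffer size, and alignment values are my assumptions, and the data goes through system memory (not GPU memory). Opening the file with O_DIRECT bypasses the page cache, so the kernel nvme driver sets up DMA directly over the user buffer:

```c
/* Minimal sketch: write one buffer to a file with O_DIRECT so that the
 * kernel nvme driver DMAs the data between this user buffer and the
 * drive, bypassing the page cache.  The path should point at a file on
 * the SSD's mounted filesystem (an assumption -- mount point is up to
 * you).  With O_DIRECT, the buffer address, file offset, and transfer
 * size must all be aligned to the logical block size. */
#define _GNU_SOURCE             /* for O_DIRECT on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 on success (or if this filesystem rejects O_DIRECT),
 * -1 on any other error. */
int write_odirect(const char *path)
{
    const size_t len = 1 << 20;             /* 1 MiB, 512-byte multiple */
    void *buf = NULL;

    if (posix_memalign(&buf, 4096, len) != 0)   /* page-aligned buffer */
        return -1;
    memset(buf, 0xAB, len);                 /* stand-in for image data */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        free(buf);
        if (errno == EINVAL) {      /* e.g. tmpfs: no O_DIRECT support */
            fprintf(stderr, "O_DIRECT not supported on this fs\n");
            return 0;
        }
        return -1;
    }

    ssize_t n = write(fd, buf, len);        /* driver DMAs from buf */
    int saved = errno;
    close(fd);
    free(buf);
    if (n == (ssize_t)len)
        return 0;
    if (n < 0 && saved == EINVAL) {         /* direct I/O refused late */
        fprintf(stderr, "O_DIRECT not supported on this fs\n");
        return 0;
    }
    return -1;
}
```

In my pipeline I imagine a cudaMemcpy from device memory into a pinned, aligned host buffer followed by a call like `write_odirect("/mnt/nvme/frame_0001.raw")` — two hops through system RAM, but no page-cache copy.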

Thanks again

We have Xavier 16GB, 8GB, TX2 8GB, and 4GB modules. These platforms are mainly designed for embedded usecases, and Xavier 16GB is the module with the maximum memory size. From the discussion, it seems desktop GPUs are better for this usecase, and an existing implementation is available there.

Due to the limitation that PCIe P2P is not supported on Xavier, we would still suggest you consider desktop GPUs.