Reserving physical memory, then directing PCIe packets there by address translation

Hey everyone,

I’m trying to achieve something with the Xavier NX and have failed over and over, miserably. In this question, I’ll describe what I’m trying to achieve and, step by step, how I failed.

My goal
I have a PCIe device that needs to access the NX’s memory. (This device does not have a specific driver; I’ll need to develop one for it.) This device and the NX together will form one compact device. Since our configuration is fixed and will not change, we decided to build our communication on BAR-matching PCIe packets.

To achieve this, on the NX we need to reserve a specific physical memory range and use the inbound address translation unit (iATU) of the PCIe controller to redirect inbound PCIe memory writes to that reserved range. (I know this is achievable; we previously did the same with another chip.)

Step-1: Reserving Physical Memory
So I added:

my_reserved: my_pciemem@ac200000 {
	reg = <0x0 0xac200000 0x0 0x08000000>;
};

to the device tree (tegra194-soc-base.dtsi). To choose this specific address, I looked at the “memory” node of the device tree with:
xxd /proc/device-tree/memory/reg
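For reference, the raw bytes that xxd prints can also be decoded into (base, size) pairs, which makes it easier to see whether a candidate range sits inside DRAM. This is only a minimal sketch: it assumes two address cells and two size cells (as in the reg line above), and the sample bytes are hypothetical, not taken from a real NX.

```python
import struct

def decode_memory_reg(raw, address_cells=2, size_cells=2):
    """Decode a device-tree 'reg' property into (base, size) pairs.

    Cells are 32-bit big-endian. The cell counts are assumptions; the real
    values come from the parent's #address-cells / #size-cells.
    """
    n_cells = address_cells + size_cells
    entry_len = 4 * n_cells
    if len(raw) % entry_len:
        raise ValueError("reg length is not a multiple of one entry")
    ranges = []
    for off in range(0, len(raw), entry_len):
        cells = struct.unpack_from(f">{n_cells}I", raw, off)
        base = 0
        for cell in cells[:address_cells]:
            base = (base << 32) | cell
        size = 0
        for cell in cells[address_cells:]:
            size = (size << 32) | cell
        ranges.append((base, size))
    return ranges

def contains(ranges, base, size):
    """True if [base, base + size) lies fully inside one decoded range."""
    return any(b <= base and base + size <= b + s for b, s in ranges)

# Hypothetical sample: one DRAM bank at 0x80000000, 0x70000000 bytes long.
sample = struct.pack(">4I", 0x0, 0x80000000, 0x0, 0x70000000)
banks = decode_memory_reg(sample)
inside = contains(banks, 0xAC200000, 0x08000000)
```

On a target, the bytes would come from opening /proc/device-tree/memory/reg in binary mode instead of the sample.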

Question about step 1:
When I checked /proc/iomem, there was no region with the name I chose, and the memory I tried to reserve was still part of System RAM. Is there a way to check whether this operation was successful?
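To make the /proc/iomem check repeatable, something like the following could report which top-level entries overlap the range in question. It is a sketch only: the sample text is hypothetical, and what a successful reservation looks like depends on the kernel version and on how the region is declared. Note that /proc/iomem shows zeroed addresses unless it is read as root.

```python
import re

# One /proc/iomem line: optional indent, "start-end : name".
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")

def parse_iomem(text):
    """Return (depth, start, end, name) tuples; end is inclusive."""
    entries = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if m:
            indent, start, end, name = m.groups()
            entries.append((len(indent), int(start, 16), int(end, 16), name.strip()))
    return entries

def owners(entries, base, size):
    """Names of top-level entries that overlap [base, base + size)."""
    last = base + size - 1
    return [name for depth, start, end, name in entries
            if depth == 0 and start <= last and base <= end]

# Hypothetical excerpts, not real NX output.
still_ram = "80000000-efffffff : System RAM\n  80080000-813fffff : Kernel code\n"
carved_out = (
    "80000000-ac1fffff : System RAM\n"
    "ac200000-b41fffff : reserved\n"
    "b4200000-efffffff : System RAM\n"
)

before = owners(parse_iomem(still_ram), 0xAC200000, 0x08000000)
after = owners(parse_iomem(carved_out), 0xAC200000, 0x08000000)
```

If the range still resolves only to System RAM, as in the first sample, the kernel is presumably still free to hand those pages out.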

Step-2: Programming the PCIe iATU

Assuming the first step was successful, I moved on to programming the iATU. I checked the return value of every call for errors and got none. After programming the PCIe controller, I read back the registers I had programmed and convinced myself that the iATU programming was successful.
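To spell out what the iATU is expected to do here, this is a small model of address-match translation as it is commonly described for DesignWare-style controllers: an inbound address between base and limit is rebased onto the target. It is a sketch under assumptions, not the controller's actual register interface. BUS_BASE is a hypothetical bus address; the target and size are the ones from the device-tree node above.

```python
def iatu_translate(addr, base, limit, target):
    """Model one inbound region in address-match mode.

    Returns the translated address, or None when addr is outside
    [base, limit] and the region therefore does not match.
    """
    if base <= addr <= limit:
        return target + (addr - base)
    return None

RESERVED_BASE = 0xAC200000   # from the reg property in Step-1
RESERVED_SIZE = 0x08000000   # from the reg property in Step-1
BUS_BASE = 0x40000000        # hypothetical PCIe bus address of the window
BUS_LIMIT = BUS_BASE + RESERVED_SIZE - 1

hit = iatu_translate(BUS_BASE + 0x400, BUS_BASE, BUS_LIMIT, RESERVED_BASE)
miss = iatu_translate(BUS_LIMIT + 1, BUS_BASE, BUS_LIMIT, RESERVED_BASE)
```

With a region programmed like this, a write to the start of the window plus 0x400 should land at 0xac200400; any address outside the window passes through without being translated by this region.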

However, when I let the other device start writing to my PCIe memory, I got lots of SMMU errors. Here is just one; the others are the same except for the iova:

t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x1f40000000, fsynr=0x20011, cb=1, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
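Since the faults differ only in the iova, it can help to pull the fields out of the kernel log and look at the range of addresses the device is actually hitting. The sketch below parses lines in the format shown above; the regular expression is written against that single sample line, so other kernel versions may format the message differently.

```python
import re

FAULT = re.compile(
    r"Unhandled context fault: (?P<smmu>\w+), "
    r"iova=(?P<iova>0x[0-9a-fA-F]+), "
    r"fsynr=(?P<fsynr>0x[0-9a-fA-F]+), "
    r"cb=(?P<cb>\d+), "
    r"sid=(?P<sid>\d+)\(0x[0-9a-fA-F]+ - (?P<client>\w+)\)"
)

def parse_faults(log_text):
    """Return one dict per SMMU context fault found in log_text."""
    faults = []
    for m in FAULT.finditer(log_text):
        faults.append({
            "smmu": m.group("smmu"),
            "iova": int(m.group("iova"), 16),
            "fsynr": int(m.group("fsynr"), 16),
            "cb": int(m.group("cb")),
            "sid": int(m.group("sid")),
            "client": m.group("client"),
        })
    return faults

line = ("t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, "
        "iova=0x1f40000000, fsynr=0x20011, cb=1, sid=91(0x5b - PCIE5), "
        "pgd=0, pud=0, pmd=0, pte=0")
faults = parse_faults(line)
```

Feeding it the full dmesg output and taking the minimum and maximum iova would show which bus addresses the writes arrive on, which can then be compared with the window the iATU was programmed for.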

Step-3: Disabling SMMU

When I searched the forum for the SMMU error message, the most popular answer I found was to disable the SMMU. So I did that, or at least I hope I did. To disable it, I did exactly the same as:

However, the errors persist, although I believe the fsynr= value has changed.

Question about step 3:
Can I check if I disabled SMMU correctly for PCIE5?

So my question is: what have I done wrong, or what am I missing? Any corrections to any step are appreciated.

Hi,
XavierNX’s SMMU is enabled by default. You have to enable both the driver and the device-tree after making the changes. Is that done correctly?

Coming to your goal, I think the best way is to use the standard mechanisms, i.e. have a device driver for your endpoint on the host, let the driver allocate memory using standard APIs such as dma_alloc_coherent(), pass that memory location to the endpoint, and let the endpoint dump the data there. This is the easiest and best way to achieve your goal.

Hey, nice to see such a quick and considerate answer here. Thanks in advance.

About the first point: after making changes to the device tree and the code, I recompiled everything and copied the new files to the filesystem that Linux boots from. Then I changed the boot configuration (extlinux.conf) to pick up the new device tree, and these are also the files that appear in the cboot logs. The cboot logs also report that both files’ signatures do not match, but they are still allowed because the security fuse is not burned yet. These logs make me think that the right files are in place for booting. Is there anything I may be missing here?

As for the second point, I believe this is not possible on our side, since the device plugged into the NX’s PCIe slot is used with lots of other devices in the exact same configuration, and people want to keep any design that already works to avoid extra work. However, I understand from your message that our approach is also doable on the NX, just not the easiest way, right? I believe that once the NX allows transactions to this memory, my problem can be solved easily.

This seems to suggest that you have made sure that the DT is properly updated. What about the kernel binary? Did you confirm that as well?

Yes. This is rather a long route, but if you want to unify your architectures, then, I think that should be fine.

Again, from the cboot logs, I am pretty sure that the kernel binary the NX is booting is the one I compiled. Just to be sure, I’m copy-pasting the related cboot logs here.

[0004.191] I> rootfs path: /sd/boot/extlinux/extlinux.conf
[0004.219] I> L4T boot options
[0004.220] I> [1]: "primary kernel"
[0004.220] I> Enter choice: 
[0007.221] I> Continuing with default option: 1
[0007.221] I> Loading kernel sig file from rootfs ...
[0007.221] I> rootfs path: /sd/boot/Image.sig
[0007.235] I> Loading kernel binary from rootfs ...
[0007.235] I> rootfs path: /sd/boot/Image
[0010.053] I> Validate kernel ...
[0010.053] I> T19x: Authenticate kernel (bin_type: 37), max size 0x5000000
[0010.365] E> digest on binary did not match!!
[0010.366] C> OEM authentication of kernel payload failed!
[0010.366] W> Failed to validate kernel binary (err=1077936152, fail=0)
[0010.367] W> Security fuse not burned, ignore validation failure
[0010.374] I> Loading kernel-dtb sig file from rootfs ...
[0010.374] I> rootfs path: /sd/boot/dtb/tegra194-p3668-all-p3509-0000.dtb.sig
[0010.392] I> Loading kernel-dtb binary from rootfs ...
[0010.392] I> rootfs path: /sd/boot/dtb/tegra194-p3668-all-p3509-0000.dtb
[0010.427] I> Validate kernel-dtb ...
[0010.428] I> T19x: Authenticate kernel-dtb (bin_type: 38), max size 0x400000
[0010.428] E> Stage2Signature validation failed with SHA2!!
[0010.429] C> OEM authentication of kernel-dtb header failed!
[0010.429] W> Failed to validate kernel-dtb binary (err=1077936152, fail=0)
[0010.430] W> Security fuse not burned, ignore validation failure

I’m certain these are the files I replaced, and confirmed it by checking their access and modification dates.

What is the L4T BSP version being used here? Is it 32.4.3 / 32.4.4 / 32.5 ??

I was using 32.4.4, but I have just upgraded to 32.5, and I have an update. The errors I was complaining about have turned into different errors. Here they are:

mc-err: (255) csw_pcie5w: EMEM address decode error
mc-err:   status = 0x200100e3; addr = 0x1f40000400; hi_adr_reg=1f08
mc-err:   secure: no, access-type: write

This probably means the SMMU is now disabled successfully. It also seems that the PCIe module is not directing our packets to the proper memory address. If this is the case, I’m open to suggestions about that problem.
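A quick arithmetic check on the numbers already in this thread seems to support that reading. The address in the mc-err line is the earlier SMMU iova plus 0x400, and it is nowhere near the range reserved in Step-1, which would be consistent with the inbound writes reaching the memory controller untranslated. This is only a sketch using the values quoted above.

```python
RESERVED_BASE = 0xAC200000   # reg base from Step-1
RESERVED_SIZE = 0x08000000   # reg size from Step-1
SMMU_IOVA = 0x1F40000000     # iova from the earlier SMMU fault
MC_ADDR = 0x1F40000400       # addr from the mc-err line

def in_window(addr, base, size):
    """True if addr falls inside [base, base + size)."""
    return base <= addr < base + size

lands_in_reserved = in_window(MC_ADDR, RESERVED_BASE, RESERVED_SIZE)
offset_from_iova = MC_ADDR - SMMU_IOVA
```

Here lands_in_reserved is False and offset_from_iova is 0x400, so the address the memory controller rejects looks like the device's original bus address rather than a translated one.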

Looking at the error, I think your understanding is correct.
Regarding ‘PCIe module not being able to direct packets to proper memory address’, please check the iATU programming (I guess this is what you are using, right?).