We have a problem with the TX2 NX module and PCIe devices.
An NVMe SSD (M.2 Key M) is connected to the x2 port of the TX2 NX. On the other (x1) PCIe port, three PCIe switches are daisy-chained, with further PCIe devices connected to them. Sometimes after a reboot, the NVMe drive is not initialized correctly and the device is lost with the following message:
nvme nvme0: Minimum device page size 1324217728 too large for host (4096)
nvme nvme0: Removing after probe failure status: -19
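For reference, the Linux nvme driver derives this number from the MPSMIN field (bits 51:48) of the controller's 64-bit CAP register, which sits at offset 0x00 of BAR0. It computes a minimum page size of 2^(12 + MPSMIN) bytes and aborts the probe with -ENODEV (-19) if that exceeds the host page size. Because MPSMIN is only 4 bits wide, the largest possible value is 2^27 = 134217728, which is exactly what a CAP register read as all ones produces. On many platforms, an MMIO read that the device does not answer returns all ones. The value in the log above is not a power of two, so it may be a transcription of 134217728. The sketch below reproduces the arithmetic; the function names are hypothetical helpers, not kernel symbols.

```python
# Decode the NVMe Controller Capabilities (CAP) register the way the
# Linux nvme driver does before enabling the controller (illustrative sketch).

def cap_mpsmin(cap: int) -> int:
    """CAP.MPSMIN: bits 51:48 of the 64-bit CAP register."""
    return (cap >> 48) & 0xF


def min_device_page_size(cap: int) -> int:
    """Minimum memory page size the controller supports: 2^(12 + MPSMIN) bytes."""
    return 1 << (12 + cap_mpsmin(cap))


def host_page_size_ok(cap: int, host_page_shift: int = 12) -> bool:
    """False means the driver would fail the probe with -ENODEV (-19)."""
    return host_page_shift >= 12 + cap_mpsmin(cap)


if __name__ == "__main__":
    all_ones = 0xFFFF_FFFF_FFFF_FFFF       # typical result of an unanswered MMIO read
    print(min_device_page_size(all_ones))  # 134217728
    print(host_page_size_ok(all_ones))     # False
```

A healthy controller usually reports MPSMIN = 0, which gives 4096 and passes the check.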
Additionally, we see an error message from the pcie-controller:
When we use exactly the same setup with a Xavier NX module, we never see this error, so we think it is specific to the TX2 NX module.
Is there a limit on the number of devices that can be attached to the PCIe ports? When we reduce the devices attached to the x1 port (connecting only the first PCIe switch), the NVMe is always initialized without error.
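One way to see why cascading switches can use up a root port's memory space is to estimate the non-prefetchable MMIO window a chain of switches claims. The sketch relies on one hardware fact: a PCI-to-PCI bridge memory window is decoded with 1 MiB granularity, so every switch port with something behind it costs at least 1 MiB. Everything else is an assumption: the topology (4-port switches with two endpoints, each exposing one hypothetical 16 KiB BAR, and the next switch on the third downstream port) is hypothetical, and the largest-alignment-first packing is a simplification of the kernel's allocator that ignores prefetchable and I/O windows and hotplug reservations. The real windows appear in the "Memory behind bridge" lines of `lspci -vv`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

KIB = 1 << 10
MIB = 1 << 20  # PCI-to-PCI bridge memory windows have 1 MiB granularity


@dataclass
class Bar:
    size: int  # power-of-two BAR size in bytes (BARs are naturally aligned)


@dataclass
class Bridge:
    name: str
    children: List[Union["Bridge", Bar]] = field(default_factory=list)


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def requirement(node: Union[Bridge, Bar]) -> Tuple[int, int]:
    """Return (size, alignment) of the MMIO space a node claims from its parent."""
    if isinstance(node, Bar):
        return node.size, node.size
    # Place children with the strictest alignment first to limit padding.
    reqs = sorted((requirement(c) for c in node.children),
                  key=lambda r: r[1], reverse=True)
    offset = 0
    for size, align in reqs:
        offset = align_up(offset, align) + size
    window_align = max([MIB] + [align for _, align in reqs])
    return align_up(offset, MIB), window_align


def switch_chain(depth: int) -> Bridge:
    """Hypothetical chain of 4-port switches (1 upstream + 3 downstream ports)."""
    ports = [Bridge("dsp0", [Bar(16 * KIB)]), Bridge("dsp1", [Bar(16 * KIB)])]
    if depth > 1:
        ports.append(Bridge("dsp2", [switch_chain(depth - 1)]))
    else:
        ports.append(Bridge("dsp2"))  # unused downstream port: empty window
    return Bridge("usp", ports)


if __name__ == "__main__":
    for depth in (1, 2, 3):
        size, _ = requirement(switch_chain(depth))
        print(f"{depth} switch(es): {size // MIB} MiB of MMIO window")
```

Under these assumptions, three switches with six endpoints and 96 KiB of registers in total claim 6 MiB of address space, which is why the number of cascaded bridges can matter more than the devices themselves.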
Let us know if you need any additional information.
I can’t answer, but I will add that power delivery is often an issue. If you have a way to power any or all devices externally, try that and see if anything changes. Another common issue is signal quality and/or signal timing. Look at any power delivery changes you might be able to make which, at least for testing, might improve power isolation and regulation among the devices.
Thank you for your answer.
We do not think power is the issue, as all connected devices are powered externally. We think it has to do with the memory allocation for the PCIe devices, similar to the following topic:
The strange thing is that when we program the configuration EEPROM of the PI7C9X2G404SV PCIe switch, the NVMe is no longer recognized and we get the error message from our first post. When we erase the EEPROM (default switch configuration), the NVMe is always recognized. Since the NVMe is not behind the PCIe switch and the two are connected to different PCIe ports of the module, we would expect them not to influence each other. Do you have an idea why they are correlated?
How can we check why such a large page size is read from the NVMe BAR?
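A minimal sketch for checking where the value comes from, assuming root access and the standard Linux sysfs PCI files `config` and `resource0` under `/sys/bus/pci/devices/<BDF>/`. It first reads the vendor/device ID from config space (0xffff means the function is not answering at all) and then maps BAR0 and reads the CAP register at offset 0x00. If config space answers but CAP reads as all ones, that may point to the BAR assignment or the path to it rather than the SSD itself. Python does not guarantee the width of the individual memory accesses, so treat this as a quick check; a small C program doing `volatile` 32-bit loads on the same mapping is more rigorous.

```python
import mmap
import os
import struct
import sys


def read_ids(config_path: str) -> tuple:
    """Vendor and device ID from the first 4 bytes of PCI config space."""
    with open(config_path, "rb") as f:
        return struct.unpack("<HH", f.read(4))


def read_nvme_cap(resource_path: str) -> int:
    """Read the 64-bit CAP register at offset 0x00 of BAR0 as two 32-bit halves."""
    fd = os.open(resource_path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 8, mmap.MAP_SHARED, mmap.PROT_READ) as bar:
            lo, hi = struct.unpack_from("<II", bar, 0)
    finally:
        os.close(fd)
    return (hi << 32) | lo


def report(bdf: str) -> None:
    base = f"/sys/bus/pci/devices/{bdf}"
    vendor, device = read_ids(f"{base}/config")
    print(f"vendor=0x{vendor:04x} device=0x{device:04x}")  # 0xffff: not answering
    cap = read_nvme_cap(f"{base}/resource0")
    print(f"CAP=0x{cap:016x} MPSMIN={(cap >> 48) & 0xF}")


if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])  # BDF of the NVMe controller, as listed by lspci
```

Run it as root with the NVMe controller's address from `lspci -D` as the only argument (for example a placeholder such as `0000:01:00.0`). Comparing its output, and the BAR addresses reported by `lspci -vv`, with the switch EEPROM programmed and erased may show whether the NVMe BAR ends up somewhere different in the two cases.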