Integration between the Jetson AGX Orin and ConnectX-6 DX 100GbE card

We bought a ConnectX-6 DX 100GbE card (also an NVIDIA product) and are trying to connect it to our SDKs through the PCIe slot, but the connection does not seem to be stable.
Most of the time, the SDK does not detect the card after I connect it, but when I connect the same card to another machine that we have (not an SDK) it is detected right away.
We can’t figure out whether the problem is with the SDKs or with the NIC cards: a few of the NIC cards are detected consistently by most of our SDKs, but most of the cards are not. We could not find any difference between them, except maybe the country they were made in.

Is this issue known?

Thanks in advance,
Dekel

Hi,

Please share the dmesg and lspci results from a boot with this card connected.

Also, have you tried other PCIe cards? Do they work fine?
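
For reference, the requested logs can be captured to text files with a small helper like the one below. This is only a sketch: the function name `save_output` and the output file names are made up for illustration, and `lspci -vvv` is simply one verbose form of the listing.

```shell
#!/bin/sh
# Run a command and save its stdout and stderr to a text file.
# save_output is a hypothetical helper name, not an NVIDIA tool.
save_output() {
    out_file=$1
    shift
    "$@" > "$out_file" 2>&1
}

# Example invocation on the Jetson (file names are arbitrary):
#   save_output dmesg.txt sudo dmesg
#   save_output lspci.txt sudo lspci -vvv
```

Running it once with the card detected and once without gives two sets of files that can be compared side by side.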

Hi, I have shared 3 files:

  1. dmesg and lspci result for a ConnectX-6 DX 100GbE card that was detected
  2. dmesg and lspci result for a ConnectX-6 DX 100GbE card that was not detected
  3. lspci result for 2 other PCIe cards that were detected (these are the only other PCIe cards I have right now besides the ConnectX-6 cards, and they seem to work fine)

NIC card not detected (72.3 KB)
NIC card successfully detected (76.5 KB)
Other devices that successfully detected (878 Bytes)

Please add disable-power-down to your device tree under the C1 controller node, pcie@14100000.

After boot, if the device is not detected, rebind the driver and it should show up.

cd /sys/bus/platform/drivers/tegra-pcie
echo 14100000.pcie > unbind
echo 14100000.pcie > bind
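
The same rebind can be wrapped in a small function that uses absolute paths, so it does not depend on the current directory. This is a sketch under a few assumptions: the `DRV_DIR` override and the short pause between unbind and bind are additions for illustration, not something the thread specifies. Note that a redirection is performed by the calling shell, so `sudo echo ... > unbind` does not write as root; the function is meant to be called from a root shell (for example after `sudo -s`).

```shell
#!/bin/sh
# Unbind and rebind a platform device from its driver through sysfs.
# DRV_DIR can be overridden, e.g. for a dry run against a scratch directory.
DRV_DIR=${DRV_DIR:-/sys/bus/platform/drivers/tegra-pcie}

rebind_pcie() {
    dev=${1:-14100000.pcie}
    # Absolute paths: the write goes to sysfs regardless of the current directory.
    printf '%s\n' "$dev" > "$DRV_DIR/unbind" || return 1
    # Arbitrary pause before rebinding; an assumption, not taken from the thread.
    sleep "${REBIND_DELAY:-1}"
    printf '%s\n' "$dev" > "$DRV_DIR/bind"
}

# Example invocation from a root shell:
#   rebind_pcie 14100000.pcie
```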

Could you please elaborate on how to add “disable-power-down” to the device tree under the C1 controller pcie@14100000?

Do you have experience building the kernel/device tree and updating them on a Jetson?

I have no experience with that.

Then try the easiest way. Use the dtc tool to convert the kernel dtb file under your Linux_for_Tegra/kernel/dtb directory into a dts source file.

Locate the dtb your Jetson AGX Orin is using by reading your flash log.

Add the property to pcie@14100000 and use the dtc tool to convert the dts back to a binary dtb.

Reflash the whole board.
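
A minimal sketch of that round trip is shown below. The helper names (`add_dts_property`, `patch_dtb`) and the intermediate file names are hypothetical, and the property name follows the `nvidia,disable-power-down` spelling confirmed later in this thread. The awk step inserts the property right after the node's opening line, because DTS syntax requires properties to come before child nodes.

```shell
#!/bin/sh
# Print a .dts file with "<prop>;" inserted right after the opening line
# of <node>. Only the first matching node is patched.
add_dts_property() {
    node=$1
    prop=$2
    dts=$3
    awk -v node="$node" -v prop="$prop" '
        { print }
        !done && index($0, node " {") { print "\t\t" prop ";"; done = 1 }
    ' "$dts"
}

# Decompile a DTB, add the property, and compile it back. Requires dtc.
# work.dts and patched.dts are arbitrary scratch file names.
patch_dtb() {
    in_dtb=$1
    out_dtb=$2
    dtc -I dtb -O dts -o work.dts "$in_dtb" || return 1
    add_dts_property "pcie@14100000" "nvidia,disable-power-down" work.dts \
        > patched.dts || return 1
    dtc -I dts -O dtb -o "$out_dtb" patched.dts
}

# Example invocation (substitute the dtb named in your flash log):
#   patch_dtb original.dtb patched.dtb
```

It is worth keeping a copy of the original dtb before replacing it, so the board can be restored if the patched file does not boot.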

Ok - I found the path to the directory on the Host machine (attaching the directory content).
I would appreciate it if you could guide me step by step on how to proceed from here.

Also, I have more than one Jetson AGX Orin SDK. On the first one that I flashed using SDK Manager, everything was downloaded fine and it detects all the NIC cards that I have. Then I tried to flash 3 more SDKs, but the installation of “Computer Vision” failed. I don’t know if it’s related, but the other 3 SDKs do not detect most of the NIC cards.
SDKM_logs_JetPack_5.1.2_Linux_for_Jetson_AGX_Orin_modules_2023-09-04_14-39-38.zip (22.2 KB)

Please file a different topic if what you are asking is not related to the original problem.

What I’m asking is related to the original problem - The problem is that the Jetson AGX Orin machines that I have don’t detect some NIC cards (ConnectX-6 DX 100GbE).

I mean that if you have an issue with flashing, file a new topic.

The flashing issue is not related to the NIC card detection…

As for the step-by-step guide, search for “device tree compiler” and you will find out how to use it. That is the dtc tool I meant in my previous comment.

It is not an NVIDIA tool but a very common third-party Linux tool.

Hi WayneWW,

I’m trying to use your first suggestion (add the ‘disable-power-down’ flag and then unbind and bind). Do I need to add it with the ‘nvidia’ prefix (like this: “nvidia, disable-power-down;”)?

Thanks,
Dekel

Yes, it should be “nvidia,disable-power-down”, as this document mentions.

https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html?highlight=pcie#debug-pcie-link-up-failure

Hi, after consulting with linuxdev, this is what I have done:

  1. Create the ‘extracted.dts’ file:
    $ dtc -I fs -O dts -o extracted.dts /proc/device-tree
  2. Find and open the ‘extracted.dts’ file in a text editor
  3. Add the flag “nvidia,disable-power-down;” to the end of the pcie@14100000 flag list
  4. Switch to root privileges and enter the following commands:
    $ echo 14100000.pcie > unbind
    $ echo 14100000.pcie > bind
  5. Reboot

After all these steps I turned off the SDK, connected the NIC card and turned it back on, but the SDK still does not detect the card.

Am I missing something in the process? Don’t I need to compile the modified dts file back to dtb file and then maybe update the OS to take the modified file instead of the original one (between steps 3 and 4)?

Don’t I need to compile the modified dts file back to dtb file

Of course you need to do that…

The original dtb file is located at the path given in /boot/extlinux/extlinux.conf. Read that file first.
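
One way to pull that path out of the file is sketched below. It assumes the configuration names the dtb on a line beginning with `FDT`, which is the usual extlinux keyword; if the file has no such line, the function prints nothing. The function name `fdt_paths` is made up for illustration.

```shell
#!/bin/sh
# Print the argument of every FDT line in an extlinux-style config file.
# Commented-out entries (e.g. "#FDT ...") are skipped because their first
# field is not exactly "FDT".
fdt_paths() {
    awk '$1 == "FDT" { print $2 }' "${1:-/boot/extlinux/extlinux.conf}"
}

# Example invocation on the Jetson:
#   fdt_paths
```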

This is what I have done so far:

  1. sudo -s
  2. cd /boot/dtb
  3. dtc -I dtb -O dts -o kernel_tegra234-p3701-0005-p3737-0000.dts ./kernel_tegra234-p3701-0005-p3737-0000.dtb
  4. cp kernel_tegra234-p3701-0005-p3737-0000.dtb kernel_tegra234-p3701-0005-p3737-0000_ORIGINAL.dtb
  5. Add the “nvidia,disable-power-down” flag at the end of the pcie@14100000 configuration list.
  6. dtc -I dts -O dtb -o kernel_tegra234-p3701-0005-p3737-0000.dtb kernel_tegra234-p3701-0005-p3737-0000.dts
  7. Reboot the machine
  8. cd /proc/device-tree/pcie@14100000
  9. ls -l nvidia,disable-power-down
    that’s the output: “-r--r--r-- 1 root root 0 Sep 14 18:25 nvidia,disable-power-down”
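
The check in steps 8 and 9 can also be scripted. The sketch below assumes the live device tree is exposed under /proc/device-tree, as in the steps above; the function name `dt_has_property` and the `DT_ROOT` override are hypothetical and exist only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Succeed if <node> in the live device tree carries <prop>.
# DT_ROOT defaults to /proc/device-tree and can be overridden for testing.
dt_has_property() {
    node=$1
    prop=$2
    [ -e "${DT_ROOT:-/proc/device-tree}/$node/$prop" ]
}

# Example invocation on the Jetson after the reboot:
#   dt_has_property pcie@14100000 nvidia,disable-power-down && echo "property present"
# To list only Mellanox devices (PCI vendor ID 15b3), if any were enumerated:
#   lspci -d 15b3:
```

A present property only shows that the modified dtb was loaded; whether the card then enumerates is a separate question that the lspci output answers.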

The NVIDIA SDK still doesn’t recognise the NIC, so I tried the unbind and bind commands, but I got a “-bash: unbind: Permission denied” error (I tried regular credentials, sudo credentials and root credentials).

Does everything else I have done look OK?
Doesn’t rebooting the machine itself reboot the PCIe too?
Any suggestions?

Please share the result of sudo dmesg as a text file here.