PCIe Hot-Plug not working

Q1. If you cannot get those kernel messages, it might mean the two P3710s are not connected correctly. Could you describe how you connected your P3710s? What cable are you using? Where did you get the miniSAS cable? Thanks.

Q2. If you are sure your connection is correct, please try updating the retimer FW to see if that works.

Q4. Please make sure the two P3710s are connected correctly first, then run those commands again.

Dear @BruceYangNV ,

About Q1
The following mini SAS cables are used to connect the P3710 to each other.

P/N 716195-B21

Can you give me information on the mini SAS cables that have been used at your company?

About Q2 and Q3
I updated the retimer FW on both P3710s and nothing changed.

thanks,

Hi Shibata-a,
Could you please confirm that your two P3710s are connected as below? Thanks.
(Orin1) miniSAS port-A ↔ (Orin2) miniSAS port-B

Dear @BruceYangNV

I tried the following configuration as instructed. Unfortunately, it didn’t improve.
I also tried other port combinations, but unfortunately they didn’t improve.

(Orin1) miniSAS port-A ↔ (Orin2) miniSAS port-B

If there is anything else to check, please instruct me.

thanks,

Hi Shibata-a,
We have not hit this issue before.
According to your reply, the kernel messages showed nothing when the EP was connected to the RP, which suggests the two P3710s are not connected correctly at the hardware level.
Please check the following items:

  1. Could you please check your bind command?
    (The board name should match your devices.)
    For Orin1:
    cd /drive/drive-foundation/make
    ./bind_partitions -b p3710-10-a01-f1 linux -s 1
    For Orin2:
    cd /drive/drive-foundation/make
    ./bind_partitions -b p3710-10-a01-f1 linux -s 2

  2. Please check your mini-SAS cable:
    If you are sure the two P3710s are connected correctly through the mini-SAS cable, you may need to check whether the cable is functional and in good condition. (We have no way to check your cable, so you will need to do this on your own.)

  3. One thing I can think of is to check your P3710s one at a time. If you can confirm your cable is good, then plug the cable from port-A to port-B on the same P3710.
    3-a, try to rebind (drop the option “-s 1” or “-s 2” when binding) and flash the image.
    3-b, insert the kernel modules on the same P3710:
    sudo modprobe nvscic2c-pcie-epc
    sudo modprobe nvscic2c-pcie-epf
    3-c, run the rest of the commands from your earlier test, as below, to check it again.
    cd /sys/kernel/config/pci_ep/
    mkdir functions/nvscic2c_epf_22CC/func
    echo 0x10DE > functions/nvscic2c_epf_22CC/func/vendorid
    echo 0x22CC > functions/nvscic2c_epf_22CC/func/deviceid
    ln -s functions/nvscic2c_epf_22CC/func controllers/141c0000.pcie_ep

    echo 0 > controllers/141c0000.pcie_ep/start
    echo 1 > controllers/141c0000.pcie_ep/start
    3-d, if this P3710 is not working, then check the other P3710. By doing this, we should be able to tell whether one of them is broken. Both P3710s being broken at the same time doesn’t make sense to me.
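
The configfs steps in 3-c can be wrapped in one function that stops at the first failing step, which makes it easier to see where the sequence breaks. This is only a sketch: the function name `setup_epf` and the `PCI_EP_ROOT` override are hypothetical (they are not part of DRIVE OS), and the driver and controller names are taken from the commands above. It assumes that a registered EPF driver and EPC device show up under `functions/` and `controllers/` in configfs.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not an NVIDIA tool. PCI_EP_ROOT is an assumed
# override so the steps can be dry-run against a scratch directory.
setup_epf() (
    # Subshell body, so the cd does not leak into the calling shell.
    root="${PCI_EP_ROOT:-/sys/kernel/config/pci_ep}"
    func="functions/nvscic2c_epf_22CC/func"
    ctrl="controllers/141c0000.pcie_ep"

    cd "$root" 2>/dev/null || { echo "cannot enter $root" >&2; exit 1; }

    # If either directory is missing, the later steps cannot succeed.
    [ -d "functions/nvscic2c_epf_22CC" ] || {
        echo "EPF driver not listed; is nvscic2c-pcie-epf loaded?" >&2; exit 1; }
    [ -d "$ctrl" ] || { echo "controller $ctrl not listed" >&2; exit 1; }

    # Same sequence as in the thread, aborting on the first error.
    mkdir "$func"                  || exit 1
    echo 0x10DE > "$func/vendorid" || exit 1
    echo 0x22CC > "$func/deviceid" || exit 1
    ln -s "$func" "$ctrl"          || exit 1
    echo 0 > "$ctrl/start"         || exit 1
    echo 1 > "$ctrl/start"         || exit 1
)
```

On the target this would be run as root (for example after `sudo -s`) by calling `setup_epf`. A non-zero exit status together with the message on stderr shows which precondition or step failed.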

Thanks.

Dear @BruceYangNV ,

I’d like to confirm something.

About Q1, Q3-a
Regarding the command you gave
(cd /drive/drive-foundation/make), should it be executed on the P3710 (Orin1)?

I tried executing it on the P3710 (Orin1) as you instructed, and unfortunately the result was as follows.

bash: cd: /drive/drive-foundation/make: No such file or directory

What should I do?
By any chance, do I need to run the bind command on the host PC?

Also, why do I need to re-flash?
Can I use SDKManager for re-flashing?

thanks,

Hi Shibata-a,
Do you have PDK 6.0.4.0?
If yes, then try going to the following folder (the previous command needs to be corrected as below):
cd drive-foundation/make
(If you use SDK manager to download the image, the path should be DRIVE_OS_6.0.4_SDK_Linux_DRIVE_AGX_ORIN_DEVKITS/DRIVEOS/drive-foundation/make)

By any chance, do I need to run the bind command on the host PC??

I think the answer is YES.

Also, why do I need to re-flash?
Can I use SDKManager for re-flashing?

I think the default image on the P3710 doesn’t set the correct soc_id.
Furthermore, if I remember correctly, the SDK manager doesn’t support the “-s” option.
Thanks.
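
To keep the two bind variants apart (with “-s” for two connected Orins, without “-s” for the single-board loopback check), the command line can be generated and reviewed before it is run on the host PC. The helper name `bind_cmd` is hypothetical, and the board name default is simply the one quoted in this thread and must match your own devices. Only `-s 1` and `-s 2` appear in this thread, so the sketch accepts just those values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the bind command instead of running it.
# Usage: bind_cmd <soc-id or empty> [board-name]
bind_cmd() {
    local soc="${1:-}"
    local board="${2:-p3710-10-a01-f1}"   # board name taken from this thread
    case "$soc" in
        1|2) printf './bind_partitions -b %s linux -s %s\n' "$board" "$soc" ;;
        "")  printf './bind_partitions -b %s linux\n' "$board" ;;
        *)   echo "soc id must be 1, 2, or empty" >&2; return 1 ;;
    esac
}
```

For example, `bind_cmd 1` prints the Orin1 command from item 1, and `bind_cmd ""` prints the variant without “-s” from item 3-a. The printed line would then be run from the drive-foundation/make folder on the host.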

Dear @BruceYangNV ,

According to your instructions, item 1 has the “-s” option, but 3-a says to remove the “-s” option.
Which is right?
↓↓↓↓↓

Hi Shibata-a,
Item 1 is for the case where two Orins are connected to each other. If that still fails, then go to Item 3, which checks each P3710 on its own.

One thing I can think of is to check your P3710 each by each

Dear @BruceYangNV ,

As per your instructions, I connected the mini SAS cable to ports A and B of the same P3710 and checked, but unfortunately no kernel message appeared in dmesg.
I checked the other P3710 as well, and the result was the same on both.
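
When checking for these messages, it can help to filter the kernel log instead of reading all of it. The sketch below assumes that the relevant lines mention "pcie" or "nvscic2c", which is a guess based on the module and controller names in this thread, not a documented log format. The function name `pcie_msgs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keeps only log lines that mention pcie or nvscic2c.
# The patterns are assumptions derived from the names used in this thread.
pcie_msgs() {
    grep -iE 'pcie|nvscic2c'
}
```

On the board this would be used as `sudo dmesg | pcie_msgs`, once before and once after the `echo 1 > .../start` step, so the two outputs can be compared.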

I also prepared and checked another brand new mini SAS cable and unfortunately the results were unchanged.

Please provide information on the mini SAS cable you used.
If there is anything else I should check, please let me know.

Thanks,

@shibata-a

Please double-check by following the steps below:

  1. Connect miniSAS Port-A of NVIDIA DRIVE AGX Orin Devkit 1 (as RP) to miniSAS Port-B of NVIDIA DRIVE AGX Orin Devkit 2 (as EP) with a PCIe miniSAS cable.

  2. Boot DRIVE AGX Orin Devkit 2 (acting as EP) first, then:

  • Load EPF kernel module

sudo modprobe nvscic2c-pcie-epf

  • PCIe EPF hot plug

sudo -s
cd /sys/kernel/config/pci_ep/
mkdir functions/nvscic2c_epf_22CC/func
echo 0x10DE > functions/nvscic2c_epf_22CC/func/vendorid
echo 0x22CC > functions/nvscic2c_epf_22CC/func/deviceid
ln -s functions/nvscic2c_epf_22CC/func controllers/141c0000.pcie_ep
echo 0 > controllers/141c0000.pcie_ep/start
echo 1 > controllers/141c0000.pcie_ep/start

  3. Next, boot DRIVE AGX Orin Devkit 1 (acting as RP). After booting completes:
  • Verify that the PCIe link to Orin Devkit 2 (as EP) is detected:

sudo lspci | grep NVIDIA

c3:00.0 Serial controller: NVIDIA Corporation Device 22cc

  • If the PCIe link is detected (the lspci command reports “c3:00.0 Serial controller”), load the EPC kernel module:

sudo modprobe nvscic2c-pcie-epc

  • If the PCIe link is not detected, try reflashing the retimer firmware and check the miniSAS cable (this has happened to us: two miniSAS cables had broken in our setup).
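
The check on the RP side can be scripted so that the EPC module is only loaded when the endpoint is actually listed. This is a sketch under the assumption that the endpoint appears in lspci output as “NVIDIA Corporation Device 22cc”, as in the sample line above. The function name `has_c2c_endpoint` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lspci output on stdin and succeeds only if
# the endpoint with device ID 22cc (as shown in this thread) is listed.
has_c2c_endpoint() {
    grep -qi 'NVIDIA Corporation Device 22cc'
}
```

On the RP this could be used as `sudo lspci | has_c2c_endpoint && sudo modprobe nvscic2c-pcie-epc`. Filtering by numeric ID with `lspci -d 10de:22cc` is an alternative that does not depend on the device name text.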

Please also provide the output of the following command. Thanks.

$ cat /proc/device-tree/chosen/nvidia,sku_version
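
Device-tree string properties are NUL-terminated, so plain `cat` prints the value without a trailing newline and with a NUL byte at the end. A small wrapper makes the value easier to copy into a reply. The function name `read_dt_string` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a device-tree string property as one clean line.
read_dt_string() {
    # Strip the terminating NUL byte(s), then add a newline.
    tr -d '\0' < "$1" && echo
}
```

For example, `read_dt_string /proc/device-tree/chosen/nvidia,sku_version` prints the SKU version on a line of its own.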

Dear @VickNV ,

I tried with the instructions you gave, but the RP did not recognize the EP as a PCIe device.

I checked the “nvidia,sku_version” of both P3710s.
The confirmation results are as follows.

1st P3710:  D00
2nd P3710:  TS5

↑The values are different. Is this a problem?

Also, if there is anything else you need to check, please let me know.

thanks,

@shibata-a I would like to double-check: did you also run the commands specifying the SoC IDs? Thanks.

The cable used on the devkit follows the Open Compute Facebook version of the Mini-SAS spec. A standard Mini-SAS cable will not work because it does not carry some of the sideband signals. Here are the cables from Amphenol that are compatible with the Orin devkit.

Description | Mfg P/N
CABLE Assy miniSAS-HD x4 GEN4 500MM Black Male 0° PCI Express to Male 0° PCI Express wire 100ohm UL VW-1 | NEDDDF-N904
CABLE Assy miniSAS-HD x4 GEN4 1000MM Black Male 0° PCI Express to Male 0° PCI Express wire 100ohm UL VW-1 | NEDDDF-N901
CABLE Assy miniSAS-HD x4 GEN4 3000MM Black Male 0° PCI Express to Male 0° PCI Express wire 100ohm UL VW-1 | NEDDDF-N903

Dear @VickNV ,

Unfortunately, the mini-SAS cables with the model numbers you provided (NEDDDF-N904, NEDDDF-N901, NEDDDF-N903) could not be found on Amphenol’s official site.↓
https://www.amphenol-cs.com/

I’m sorry, but could you please check the model numbers again?

Also, I understand the cable needs to support sideband signals. Are the following products compatible with the Orin DevKit?

thanks,

@shibata-a
This cable was designed for the devkit, so it may not be advertised as a standard cable on their website. You should reach out to your local Amphenol contact directly to inquire about and purchase the cable. Thanks.

@shibata-a please share your experience once your setup is working. Thanks.

Dear @VickNV ,

With the support of the local team, we were able to make it work as expected.

Thank you very much.
