From what I could see, the PCIe switch already had the correct configuration (usecase4.0) and an appropriate firmware (IMG version 1.08 BD58) installed.
The connection works, but is very slow. I used the tool iperf to benchmark the connection and it reported a speed of about 380 Mbit/s. I would have expected a much higher speed.
iperf reports 940 Mbit/s for the 1GbE connection over the internal Ethernet switch and about 6000 Mbit/s for the external 10GbE connectors when sending data between XavierA and XavierB. These numbers seem legit to me, so it seems the benchmarking tool does its job.
Do others see such low speeds on the PCIe Interface Communication too, or are there settings that can increase the speed?
As stated in my previous post, it is running Drive9, i.e. Drive Software 9.0. I assume that means I am running Drive OS 5.1.0, or are there any other versions around for Drive Software 9.0? Is there actually a way to query the Drive OS version?
I was asking because I couldn’t find the firmware files (switchtec_pfx.pmc and usecase4.0.pmc) in DRIVE Software 10.0. Could you let me know where you got them in DRIVE Software 9.0? Thanks!
The PCIe switch firmware and various configurations do not come with the Drive Software installation but have to be downloaded separately by running the script restricted-pdk.run from https://developer.nvidia.com/drive/secure/restricted-pdk.run
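In case it helps, the procedure is the usual self-extracting .run package; a minimal sketch of what I did (exact prompts and the location of the extracted .pmc files may differ between releases):

# downloading requires an NVIDIA developer login; then make the package executable and run it
chmod +x restricted-pdk.run
./restricted-pdk.run
# the switch firmware and configuration (.pmc) files end up in the extracted output directory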
However, I did not use any of those files, since after evaluating the installation on the DriveAGX Pegasus using switchtec-user from https://github.com/Microsemi/switchtec-user.git as well as the output of lspci, everything did seem to be set up the way it should be, according to the description in the Drive OS documentation. Therefore I did not flash the PCIe switch firmware or configuration myself.
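For reference, this is roughly how I checked it (device path and exact subcommands depend on the switchtec-user version, so treat this as a sketch rather than the exact commands):

# list the Switchtec management devices exposed by the kernel driver
sudo switchtec list
# show the firmware images and versions currently on the switch
sudo switchtec fw-info /dev/switchtec0
# check negotiated link speed and width of the PCIe devices (compare LnkCap vs LnkSta)
sudo lspci -vv | grep -E "LnkCap:|LnkSta:"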
So in what mode should it run to get higher performance and how can that be changed?
And how come the 10GbE connection gives a much higher bandwidth even though, according to the schematics in the documentation, it runs over the same PCIe switch, just adding an additional detour over the two 10GbE controllers?
The connection works, but is very slow. I used the tool iperf to benchmark the connection and it reported a speed of about 380 Mbit/s. I would have expected a much higher speed.
Could you share the commands you used to get the 380 Mbit/s result? Thanks!
I ran iperf -s on one Xavier and iperf -c <ip-address> on the other Xavier, where the IP address depends on what is selected during the setup. For example, when following the NVIDIA guidelines, it would be 192.168.1.11 when using XavierA as the server (-s flag) and XavierB as the client (-c flag), and 192.168.1.12 when using them vice versa.
Which Xavier is used as the server and which as the client does not affect the transfer speed.
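Concretely, the two runs looked roughly like this (standard iperf server/client invocations; addresses as described above):

# XavierA as server, XavierB as client
# on XavierA:
iperf -s
# on XavierB:
iperf -c 192.168.1.11
# and the other way around
# on XavierB:
iperf -s
# on XavierA:
iperf -c 192.168.1.12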
A transfer rate of 940 Mbit/s is still much lower than what I would expect from the PCIe connection, which as far as I know has a theoretical bandwidth of over 31 Gbit/s.
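For reference, my back-of-the-envelope number assumes a PCIe Gen3 x4 link, which is where the ~31 Gbit/s comes from:

8 GT/s per lane × 4 lanes × 128/130 (line encoding) ≈ 31.5 Gbit/s raw link bandwidth,
before PCIe protocol, NTB and TCP/IP overhead are subtracted.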
Furthermore, the values you measured look remarkably similar to what I measured on the 1GbE connection. Also, the IP address 192.168.1.203 is one of the addresses used for a virtual network interface on eth0 (i.e. the 1GbE connection) in the default setup for the Hyperion Developer Kit. Can you verify that the connection you tested is actually using the PCIe Interface Communication and not the 1GbE connection? Can you maybe share the output of ifconfig?
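A quick way to check, without digging through the whole ifconfig output, is to ask the kernel which interface it would use to reach the address you pass to iperf, for example:

# on the client side, show the route (and therefore the egress interface) used for the test address
ip route get 192.168.1.203
# list all interfaces with their addresses
ifconfig -a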
As we can see here, none of the eth1 adapters has the IP address 192.168.1.203, whereas this IP address is used on the sending side in the iperf test. It is most likely associated with eth0:400. Can you either use a different IP address range for the eth1 adapters, or remove all other adapters that have an IP address in the 192.168.1.x range, and re-run the test to confirm it is actually using the PCIe Interface connection?
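For example, something along these lines (the 192.168.5.x addresses are just placeholders for an otherwise unused subnet):

# on XavierA: put the PCIe virtual Ethernet interface on its own subnet
sudo ifconfig eth1 192.168.5.11 netmask 255.255.255.0
# on XavierB:
sudo ifconfig eth1 192.168.5.12 netmask 255.255.255.0
# re-run the benchmark, now unambiguously over eth1
# on XavierA:
iperf -s
# on XavierB:
iperf -c 192.168.5.11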
I upgraded my DriveAGX to Drive Software 10 and now I also measure transfer rates of around 2.3 Gbit/s over the PCIe Interface connection. While this is a big improvement over the transfer rate using Drive Software 9, it is still slower than the 10GbE connection and slower than what I expected.
So is this the kind of speed we can expect from this connection, and what are the reasons why it is below 10% of the theoretical bandwidth? Is there anything one can change to increase the speed, or are there plans by NVIDIA to make changes in future updates? And does anybody know if the Crosslink NTB connection over PCIe between two DriveAGX Pegasus systems will give similar performance?
At the observed transfer rates, there is pretty much no reason to use this connection.
I understand that the current software stack is not designed for benchmarking. However, knowing which interconnections are available and having a rough overview of their bandwidth is often important for the application design. That’s why I am wondering if the observed performance is what one can expect from the PCIe Interface Communication, or if there is something wrong with either my setup or the current software version.
I did not investigate this issue any further as it seems that is the performance you can expect with the current software stack.
To use the 10GbE network interface for communication between Xavier A and B, I simply connected the two 10GbE RJ45 connectors with a cable. The system is already set up with fixed IP addresses on this connection (192.168.0.10 and 192.168.0.11 on Xavier A and B, respectively), so no configuration is needed on the software side.
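After that, benchmarking it is just a matter of pointing iperf at those addresses, e.g.:

# on Xavier A (192.168.0.10)
iperf -s
# on Xavier B
iperf -c 192.168.0.10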