How to transmit data from a computer to Jetson TX1 as fast as possible?

Dear developers,

Apart from using an Ethernet cable (via ssh), could you please suggest other fast ways of transmitting data from a computer to a Jetson TX1?

Thank you very much for your reply.

Sincerely

Luyuan

Is the issue that Ethernet speed is too slow, or something else?

Depends on the nature of the data, but you could use USB3. How easy or difficult that may be depends a lot on the type of data and what programs might use the data (meaning whether a driver has to be written, or if pre-existing drivers can be used).

@linuxdev

Is the issue that Ethernet speed is too slow, or something else?
I am trying to find a way of transmitting data faster than Gigabit Ethernet.

Depends on the nature of the data, but you could use USB3.
Could you please explain “depends on the nature of the data” in a bit more detail? Thank you very much.

meaning whether a driver has to be written, or if pre-existing drivers can be used.
It would be great if you could recommend a suitable USB 3.0 cable; then we can decide whether a driver is needed or not. Thanks a lot.

Consider ethernet as having a device driver that is symmetric at both ends, and that driver does nothing but implement network protocols. The driver does not need any special information on the specific devices, it just needs to pass that data along to user space programs which know what they want.

USB is a data pipe that needs a driver specific to the device…there is always a device end with a kernel driver implementing the device’s behavior, and a host end with a kernel driver that knows about that device (these kernel drivers have device-specific knowledge…the kernel must understand the device and not just pass the data along). There are generic classes of USB devices which act as templates, including mass storage, human interface devices, and streaming (isochronous) devices…should you use a predefined interface type, there is a framework in place to do all of the things that kind of device is defined to do (you still have to write the glue to bind the predefined template to your specific device).

If you can make your device behave exactly like a keyboard or mouse (human interface device class), then you can just identify your device to be such and the driver for that specific class will take over. If you can make your device behave like a hard drive (mass storage device), then you can start and stop large amounts of data at will…but you’d have available only the predefined things the hard drive class can do. Having a predefined class match what you want to do means you only have to write the glue and not the whole USB driver.

As soon as your device needs to do something that no predefined class exists for, you need to write the entire driver yourself. For example, if you have a video camera, and it does most of what you want with standard commands for the USB Video Class (UVC), but you want to customize it for a bit of extra hardware (perhaps this camera has legs and walks), then you have to write the entire class yourself. You can borrow from the UVC for parts that match, which is helpful, but you’re still writing it yourself. Typically such a device would instead choose to present multiple USB devices and write the extensions as a custom driver but keep the standard parts standard…it’d be two independent USB devices in one physical package.

What you need to do to work with USB will depend on what kind of interface the data and device need…so the question is whether the device and data behavior can be treated as something that already exists. You’ll have to describe the programs and data in order to compare them to what’s already out there.

FYI, you could write a driver which behaves like a network device at both ends, and implements network protocols…then USB could be used with standard network commands and any program equipped for networking would not know the pipe is USB instead of ethernet (and there is code out there for people wanting to do this). This would be faster than ethernet but remain generic for anything networked. I suspect that buying something 10Gbase-T would be far easier, though you might still end up with driver issues on ARMv8-a (hardly anyone has tested or debugged such drivers on this architecture).

@linuxdev

Thanks for your detailed answer.

I am going to use USB 3.0 to transmit data between a computer and Jetson TX1 (bidirectional transmission, half duplex).

The computer is running Ubuntu Linux; the Jetson TX1 is also running Ubuntu and has a PCIe x4 USB 3.0 card (PCI-Express USB 3.0 - Integrate Computer.de) with an additional power supply (as described in the question “Questions about transmiting data from computer to Jetson TX1 using USB 3.0”).

Is it possible to achieve this goal with a USB 3.0 crossover/bridge cable and “usbnet” (or other ways to use USB with standard network commands)? If yes, could you please provide more hints? Thanks.

Even now with USB3, one end of the connection is a host and the other is a device (which implies the USB connection counts one end as the host in control and the other as a device/slave to be manipulated by the host). Ethernet has no host/device distinction; the protocols pass such distinctions through to the programs in user space making use of the connection.

Up through USB2, the cable forced half duplex because there was only one D+/D- pair…USB3 adds separate SuperSpeed transmit and receive pairs, so both directions can be active at once when running in SuperSpeed mode. Regardless of wires, one end is still a host and one end is still a device, and devices need to be set up to tell a host what they are (you will probably end up writing a custom driver on the Jetson to turn it into a device of the type you want). Whether the host needs a custom driver depends on the type of device the Jetson replies with when queried. Do you want to pretend your device is a hard drive? If so, the gadget interface will make that easy. If it is not some preexisting generic interface type, then you have a learning curve ahead (you still have not mentioned the nature of the data transfers, e.g., real time, continuous, large batches which stop and start, and so on).

The case of software to make USB look like a network card still has a device response; it just claims to be a network interface instead of some other device type. The host side driver would have to accept network communications and, for example, create a network interface something like what “eth0” would be (if it is the second network interface it would perhaps show up as “eth1” when a host has a cable plugged in to it). The advantage here is that there may be existing software you can drop in on the Jetson and host to emulate a network card, but I have not used this myself so I don’t have any advice on that. The other advantage is that your programs which use networking now, but find networking too slow, would not need any modification (you’d just set up networking with route and netmask, and so on, as desired).
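
As a very rough sketch of that approach (and only an assumption on my part — I have not verified that the stock kernel ships the legacy Ethernet gadget module, and the interface names and 192.168.55.x addresses are placeholders), a network-over-USB link might be brought up something like this, using the TX1’s device-mode micro-USB port rather than the add-on host card:

# On the TX1 (device side, micro-USB/OTG port): load an Ethernet gadget if available.
sudo modprobe g_ether
sudo ip addr add 192.168.55.1/24 dev usb0
sudo ip link set usb0 up

# On the PC (host side): the standard cdc_ether/usbnet driver should create an interface
# (often usb0, but the name can differ with predictable interface naming).
sudo ip addr add 192.168.55.2/24 dev usb0
sudo ip link set usb0 up

# After that, anything networked (scp, nc, NFS, iperf3) can use the link, e.g.:
ping 192.168.55.1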

Putting in a real 10Gbit network card would be faster yet, and if the drivers work on the Jetson end, there wouldn’t be any real effort involved beyond adding a kernel module. That and money.

@linuxdev,

Thanks for your detailed answer.

I am going to use Ethernet, because it is symmetric at both ends. In this case, I can connect to the Jetson TX1 using ssh and transfer files from my computer to the Jetson TX1 with scp (using the libssh library), but scp is a bit slow. Could you please suggest some other protocols that can be implemented in C/C++? Thank you very much. (I have also checked sftp, which is of similar speed to scp.)

The first thing to note is that scp and ssh use encryption. This is probably the major contributor to CPU load on the network side of things. Do you need security? If your end format for data transfer is a file, and if you do not need security, then ordinary ftp would be faster than encrypted variants like scp or sftp. A real answer requires knowing more about the nature of the data…you’re using files now, but knowing whether the data must come from files, or whether the file stage could be skipped, is mandatory for deciding what is best.

Note that whenever ssh or a similar protocol using asymmetric encryption runs, the initial setup is where most of the computing power is required; once this is done, the actual data transfer is fairly low overhead. If you have one huge file, then you run that encryption setup only once, and the transfer is fast; as soon as you start opening and closing for lots of small files, the overhead goes up tremendously. Again, knowing whether files are really needed, or whether data could be transferred directly without using files, would help. Even if you must have files, if you have a lot of files, a single program could be used with “cat”, the data streamed, and then the files reconstructed, to reduce the start/stop overhead of setting up encryption.

Incidentally, opening and closing files is very high overhead even without encryption. Rearranging multiple files into a single file when many files are involved can be a big boost in performance. One example is to use tar on a group of files prior to transfer, as sketched below. This does not eliminate the time to open and close files, since creating a tar archive must do this…but it does move the open/close work so it is separate from the file transfer operation. Do you have a lot of small files, or a few big files? Are the files created and immediately sent, or is there a time delay where you could tar and compress prior to sending?
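
As a rough illustration of that idea (treating “ubuntu@tx1” and the directory paths as placeholders), a whole directory of small files can be pushed through a single ssh session, so the encryption setup is paid only once and the per-file open/close happens inside tar rather than once per transfer:

# On the PC: stream a directory as one tar archive through one ssh connection;
# the TX1 end unpacks it as it arrives.
tar -cf - ./many_small_files | ssh ubuntu@tx1 'tar -xf - -C /home/ubuntu/incoming'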

Network programming is itself fairly easy if you have some specific communications between two programs. The question is whether you really want to copy files, or if files are just a byproduct or intermediate way to package the data and transfer it? Eliminating files entirely and directly streaming across the network to eliminate hard drive limitations could be an enormous boost. Unless you have a very fast RAID array (and this is even if you use SSDs) gigabit will outperform what a hard drive can deliver. When transferring files you’re limited by the hard drives on both ends. More information on the nature of the data is important if you want to speed it up…perhaps you don’t need files at all. If files are arranged as one large transfer, then tuning the network to use jumbo frames at each end could also be a boost without ever doing any programming.

Many years ago I worked on back-end processing for very large amounts of financial data (individual files sometimes exceeding 2GB, total storage approaching TB levels). One program processed several extremely large files and opened and closed the file for each of several operations on the file. After profiling and finding the open/close time to be the problem I rearranged this to be a single open and it no longer closed after each operation. This program had been taking two to three days to run…avoiding the open/close reduced time to two minutes. The files were on an extraordinarily fast RAID array (even now you’d be in the ball park of twenty or thirty thousand dollars for an array like that)…hardware solutions are a waste if hardware is not the bottleneck.

If you need to transfer data over a network, and if that data does not need to be secure, then programming to use ordinary TCP sockets is trivial and could save overhead…this won’t help much though if the network is not the bottleneck.

Connecting the TX1 Ethernet port directly (or through a good GB switch) to a PC, and then setting each device to allow jumbo frames, results in iperf3 benchmarks approaching 1 Gbps. To set jumbo frames use “ifconfig eth0 mtu 8000” on the TX1 and the connected PC. This is about 15% better than using the standard MTU size. Minimally tested, use at your own risk. The network between the TX1 and PC must support jumbo frames. To prototype you can use netcat (nc) on both ends and just read/write stdin/stdout.
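
A minimal prototype along those lines (eth0, port 5000, the file names, and the TX1 address are placeholders; some netcat variants want “nc -l -p 5000” instead of “nc -l 5000”):

# On both the TX1 and the PC (reverts at reboot unless made persistent):
sudo ifconfig eth0 mtu 8000

# On the TX1: listen and write whatever arrives to a file.
nc -l 5000 > received.dat

# On the PC: read the file from stdin and send it to the TX1.
nc 192.168.2.172 5000 < big_file.dat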

@linuxdev,

Thanks for the detailed reply. I plan to try transmitting an array of floating-point numbers from the computer to the Jetson TX1 (connected through an Ethernet cable) using TCP sockets. I hope it will be faster than storing the numbers in a file and using scp.

Using TCP directly, and avoiding hard drive operations, speed should be several orders of magnitude faster (of course, if you are going over the internet for long distances, connection failures could become an issue, but on a local network you will probably be very happy with the increased speed).

@linuxdev,

Thanks. If I need to transfer large files (.txt or .csv), could you please recommend a fast way?

Is this over a local network with no security issues? If not, then you’re back to scp/sftp. If things are secure, and the data is already in the form of a file (and thus not possible to transmit directly from the producing application without an intermediate file anyway), then you could use an ordinary ftp server (remember, this isn’t something you’d want to expose to the wilds of the internet).

Netcat was mentioned, and if you are using two machines on the same local network without security problems, this might be a candidate. Normally netcat would be used for testing and benchmarking, but it can work fairly well from a script as well (it’s like “cat”, but over a network…see “man netcat” or “man ncat”, it’s a pretty basic tool, and thus very little overhead). You’d pick tcp and probably IPv4, and that’s about all there is to it.

File format itself is important. Text and csv files are very compressible, but it takes CPU to do compression and decompression. If the files are created and instantly sent you might use a faster compression method (e.g., “bzip2 -1”), but if you have time where files accumulate and then at some later point trigger a transfer, then you might spend time compressing more prior to sending (e.g., “bzip2 -9” or 7z). I believe the Jetson is quite able to decompress maximum compression without much issue; when creating a compressed file, CPU makes more of a difference, and your desktop computer might be faster at compressing. Taking the time ahead of time to create a tar archive and compress that would give you better compression, but might not be worthwhile if individual files are large…the best advantage from tar comes when you have lots of small files.
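
A sketch of the two cases (the address, port, directory names, and compression levels are placeholders/examples only):

# Files created and sent immediately: cheap/fast compression inline with the transfer.
# On the TX1:
nc -l 5000 | bunzip2 | tar -xf -
# On the PC:
tar -cf - ./fresh_data | bzip2 -1 | nc 192.168.2.172 5000

# Files that accumulate first: spend CPU ahead of time, then send the smaller archive later.
tar -cf - ./batch_data | bzip2 -9 > batch.tar.bz2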

There are your two basic criteria…use encryption if the network is insecure, else use something simple like ftp or netcat (ncat). If files have time to be prepared first, then compress them as much as possible; if not, still compress them, but use fast and cheap compression. Consolidating files via tar gains usefulness for lots of small files and becomes less important for large files.

Avoid using the hard drive at all for live data.

@sperok,

Thanks for your detailed answer. I am going to change the MTU size, repeat the measurement, and try netcat.

I have a computer and Jetson TX1 connected with a Gigabit Ethernet Cable.
With MTU 1500, I tried scp in a terminal to transmit a 2 GB file from the computer to the Jetson TX1; it takes a long time (sorry, I do not remember the exact figure). (2 GB / transmission_time) works out to between 50 MB/s and 60 MB/s. This value is far less than 125 MB/s.

Using MTU 8000 and netcat, could you please tell me how long it takes you to transfer a 1 GB file from the computer to the Jetson TX1? Thank you very much.

Do remember that speed will only be as fast as the slowest part in the chain…which would be the hard drive (50 to 60 MB/s probably was limited more by the hard drive than by the network). If you really want an interesting performance check for the network, mount a partition read-only, and then export it as NFS. If you have sufficient ram, then after the first read of data the data will be cached; the second run would be far faster than the first run because the data would mostly be from ram instead of disk. This is of course good only for read-only data which doesn’t change and which gets transferred more than once, but for benchmark purposes this gets rather interesting.
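
A minimal sketch of that kind of benchmark setup, assuming the data lives on the PC, that an NFS server (e.g., nfs-kernel-server) is installed there, and treating the paths, addresses, and subnet as placeholders:

# On the PC: export a directory read-only to the local subnet.
# Add a line like this to /etc/exports, then reload the export table:
#   /srv/benchdata 192.168.2.0/24(ro,no_subtree_check)
sudo exportfs -ra

# On the TX1: mount it and read a large file once to warm the PC's cache.
sudo mkdir -p /mnt/nfs
sudo mount -t nfs -o ro 192.168.2.91:/srv/benchdata /mnt/nfs
time cat /mnt/nfs/big_file > /dev/null

# Drop the TX1's own page cache so the next read goes over the wire again,
# this time served mostly from the PC's RAM instead of its disk.
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
time cat /mnt/nfs/big_file > /dev/null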

Long ago I wrote a management app for Beowulf cluster installation, specialized in diskless slave nodes. Similar disk-based nodes took 18 minutes to start; the diskless version, after BIOS, took 25 seconds. Other than serving entirely over the network instead of from disks at each node, the disk and diskless clusters used the same hardware and software.

Also, the faster your data can be transferred (meaning less restricted from disk throughput) the more the higher MTU and jumbo frames will help.

My results are consistent with yours, unfortunately. With a 3.4 GB file (system.img from the JetPack distribution) and an MTU of 8000, file transfers topped out between 60 and 70 MB/sec.

Using the iperf3 benchmark (sudo apt-get install iperf3), raw TCP performance is ~1 Gbps; see the output in my subsequent response.
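
For reference, both sides of that benchmark look roughly like this (the address and port match the server output in my next post, but are otherwise placeholders):

# On the TX1 (server side), as shown in the next post:
sudo iperf3 -s -p 3000
# On the PC (client side): push data to the TX1 for 20 seconds.
iperf3 -c 192.168.2.172 -p 3000 -t 20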

The suggestion from @linuxdev is a very good one - compressing/decompressing may make sense if your data is compressible and compression overhead << transmission time.

Finally - take a look at mbuffer. It can be used as a replacement for nc and provides a slight bump in speed, dropping the time required to transfer the 3.4 GB system.img file down to about 48 seconds at an average speed of 67 MB/sec.
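
A sketch of that kind of mbuffer transfer (the port, buffer size, and file names are placeholders; check “man mbuffer” for the exact options available in your build):

# On the TX1 (receiver): listen on a TCP port, buffer in RAM, write the stream to disk.
mbuffer -I 5001 -m 512M > system.img
# On the PC (sender): read the file and push it to the TX1.
mbuffer -m 512M -O 192.168.2.172:5001 < system.img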

Here is the output from iperf3 demonstrating receiving data at 1Gbps (112-120MB/s)

ubuntu@tegra-ubuntu:/mnt/sd$ sudo iperf3 -p 3000 -s
-----------------------------------------------------------
Server listening on 3000
-----------------------------------------------------------
Accepted connection from 192.168.2.91, port 48272
[  5] local 192.168.2.172 port 3000 connected to 192.168.2.91 port 48274
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  95.7 MBytes   801 Mbits/sec                  
[  5]   1.00-2.00   sec   114 MBytes   953 Mbits/sec                  
[  5]   2.00-3.00   sec   114 MBytes   953 Mbits/sec                  
[  5]   3.00-4.00   sec   116 MBytes   968 Mbits/sec                  
[  5]   4.00-5.00   sec   116 MBytes   976 Mbits/sec                  
[  5]   5.00-6.01   sec   120 MBytes   997 Mbits/sec                  
[  5]   6.01-7.00   sec   112 MBytes   947 Mbits/sec                  
[  5]   7.00-8.00   sec   114 MBytes   954 Mbits/sec                  
[  5]   8.00-9.00   sec   118 MBytes   991 Mbits/sec                  
[  5]   9.00-10.02  sec   115 MBytes   951 Mbits/sec                  
[  5]  10.02-11.00  sec   112 MBytes   954 Mbits/sec                  
[  5]  11.00-12.00  sec   114 MBytes   953 Mbits/sec                  
[  5]  12.00-13.00  sec   114 MBytes   954 Mbits/sec                  
[  5]  13.00-14.00  sec   114 MBytes   954 Mbits/sec                  
[  5]  14.00-15.00  sec   115 MBytes   964 Mbits/sec                  
[  5]  15.00-16.00  sec   114 MBytes   954 Mbits/sec                  
[  5]  16.00-17.00  sec   114 MBytes   954 Mbits/sec                  
[  5]  17.00-18.00  sec   114 MBytes   954 Mbits/sec                  
[  5]  18.00-19.00  sec   115 MBytes   962 Mbits/sec                  
[  5]  19.00-20.00  sec   114 MBytes   955 Mbits/sec                  
[  5]  20.00-21.00  sec   114 MBytes   954 Mbits/sec                  
^C[  5]  21.00-21.94  sec   107 MBytes   953 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-21.94  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-21.94  sec  2.43 GBytes   953 Mbits/sec                  receiver

@sperok,

Thanks for your detailed answer. I got 70.28 MB/s after changing the MTU to 8000 and using netcat.

@linuxdev,

Thanks for your detailed answer; you are right. The speed is limited more by the hard drive than by the network.

If the .txt and .csv files are sent over a USB 3.0 port on the PCIe x4 USB 3.0 card, or over a PCIe x4 10 Gigabit Ethernet card, will the speed also be limited by the hard drive?

Correct, the speed would not increase on average, because the file read/write on the disk is the limiting factor. A given packet of data would burst faster (the hard drive’s cache RAM helps there), but because you can’t continuously feed data to fill packets as fast as they are sent, averages will not even remotely approach the possible throughput of USB3 or 10GBase-T. Even ordinary gigabit would require something like a RAID-0 or RAID-10 array with many disks (RAID-1 would increase read speed, but a pair of mirrored disks won’t even come close to continuous gigabit) before fully utilizing the data pipe.

When using slower data devices the advantage of a much faster SATA or PCIe or USB3 type data pipe only shows up when there are many devices all wanting to compete for resources at the same time. When burst speed is very fast it is unlikely two devices will conflict with one waiting for the other to first complete. Each individual device will not have a practical speed advantage. This is why eliminating the hard drive and directly networking data transfer is such an important change.

Note that there is still use for faster data pipes even if averages do not improve. Drivers potentially spend less time servicing a request which leaves more time for other drivers…system latency can improve. This just won’t help your particular case.