Speeding up pointcloud delivery to a ROS subscriber (Kinect data) [SOLVED]

Hi,

I am using an NVIDIA Jetson + ROS + freenect_launch to access data from the Kinect. I am running into an issue (which I don't hit on my Intel i7 laptop) where my node, which subscribes to the /camera/depth/points topic, cannot receive the data fast enough for my purposes. I have played with different ways of configuring the callback (as well as using ros::TransportHints().tcpNoDelay()), and the best I can get is about 7 Hz.

I am not doing any processing of the pointcloud in the callback; the subscriber callback just publishes a basic sensor_msgs message so I can use rostopic hz /mynode/basicsensormsg to see how fast the callback is occurring (about 7 Hz).
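Roughly, that measurement node looks like the sketch below (a minimal sketch, not the exact code; std_msgs::Header stands in for whatever small message actually gets republished):

    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>
    #include <std_msgs/Header.h>

    ros::Publisher marker_pub;

    // No processing: just republish a tiny message per incoming cloud so that
    // rostopic hz /mynode/basicsensormsg reflects the callback rate.
    void cloud_cb(const sensor_msgs::PointCloud2::ConstPtr& point_cloud)
    {
      std_msgs::Header h;
      h.stamp = ros::Time::now();
      marker_pub.publish(h);
    }

    int main(int argc, char** argv)
    {
      ros::init(argc, argv, "cloud_rate_check");
      ros::NodeHandle nh;
      marker_pub = nh.advertise<std_msgs::Header>("/mynode/basicsensormsg", 1);
      ros::Subscriber sub = nh.subscribe("/camera/depth/points", 1, cloud_cb,
                                         ros::TransportHints().tcpNoDelay());
      ros::spin();
      return 0;
    }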

The exact same node running on my laptop gets the full 30 Hz. When I do a rostopic hz /camera/depth/points, that is also 30 Hz.

I believe the Jetson board is bottlenecking while transferring the pointcloud data from the launch node to my node. I'm wondering if there is a more efficient way of subscribing to such a large amount of data, or if anyone has compiled the freenect_camera driver into their ROS node and could share their experience. (I'm moving toward the idea that pointcloud delivery through ROS sensor_msgs is not the right approach, and that it would be better to have a node receive directly from the driver, eliminating needless memory-transfer steps.)

Any thoughts?

Description of some code I tried:

The callback void cloud_cb(const sensor_msgs::PointCloud2::ConstPtr& point_cloud) was tried; defined this way, the callback did not have any bottleneck. However, I could not figure out how to use the cloud in a pcl::PassThrough filter without calling pcl::fromROSMsg() first, and pcl::fromROSMsg() caused the 7 Hz bottleneck once used in the callback function.

The callback void cloud_cb(const PointCloud::ConstPtr& point_cloud) was also tried; defined this way, the callback bottlenecks even without any additional code. However, I can use 'point_cloud' directly in a pcl::PassThrough filter, avoiding the need for pcl::fromROSMsg().
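To make the two variants concrete, this is roughly what they look like side by side (a sketch, not the exact code from the post; the PassThrough field and limits are placeholders, and the include for pcl::fromROSMsg may differ between ROS/PCL versions):

    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>
    #include <pcl_conversions/pcl_conversions.h>  // pcl::fromROSMsg
    #include <pcl_ros/point_cloud.h>              // allows subscribing with pcl::PointCloud directly
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/filters/passthrough.h>

    typedef pcl::PointCloud<pcl::PointXYZ> PointCloud;

    // Variant 1: receive the raw ROS message. Receiving is cheap, but the
    // pcl::fromROSMsg() conversion is what introduced the 7 Hz bottleneck.
    void cloud_cb_ros(const sensor_msgs::PointCloud2::ConstPtr& msg)
    {
      PointCloud::Ptr cloud(new PointCloud);
      pcl::fromROSMsg(*msg, *cloud);              // expensive copy/conversion

      pcl::PassThrough<pcl::PointXYZ> pass;
      pass.setInputCloud(cloud);
      pass.setFilterFieldName("z");               // placeholder field/limits
      pass.setFilterLimits(0.5, 2.0);
      PointCloud filtered;
      pass.filter(filtered);
    }

    // Variant 2: let pcl_ros deserialize straight into a pcl::PointCloud,
    // which can be fed to PassThrough directly. Here the deserialization
    // itself showed the same bottleneck over the TCP transport.
    void cloud_cb_pcl(const PointCloud::ConstPtr& point_cloud)
    {
      pcl::PassThrough<pcl::PointXYZ> pass;
      pass.setInputCloud(point_cloud);
      pass.setFilterFieldName("z");
      pass.setFilterLimits(0.5, 2.0);
      PointCloud filtered;
      pass.filter(filtered);
    }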

I'm not familiar with the code, but if TCP is involved there is a relationship between MTU and data size that can change latency. How big is the data at the moment it is sent? What is the MTU setting on both the sending and receiving computers? Is it a lot of small sends, or a few large ones (relative to the MTU)?

Well, to clarify a little more: it is TCP, but it is local to the machine (loopback, I think). I did not write the driver that sends the data, but it should be transmitting at 30 Hz, and each message is somewhere between 1 and 2 MB (not exactly sure).
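As a rough sanity check on the size (an estimate only, not a measured number): a dense 640 x 480 PointCloud2 with a 16-byte point_step (x, y, z plus padding) works out to about 640 * 480 * 16 ≈ 4.9 MB per message, which at 30 Hz is on the order of 150 MB/s; with an RGB field (32-byte points) it doubles. The actual size depends on which fields freenect publishes.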

What is MTU?

Maximum transmission unit: the maximum chunk size the send side will use. The default tends to be 1500 bytes; one can configure jumbo frames, or even chop it down to something like 256 bytes plus header size (header overhead goes up, but latency goes down if the native data sizes are small). During network transmission a send is sometimes delayed because the content isn't yet considered large enough (the stack waits in the hope of more data before sending); in the reverse case, a chunk of data may need to be broken up and sent in smaller pieces. Sending less than the MTU can result in delays. MRU is the receive-side equivalent, but MTU is authoritative and MRU is only a hint. Thus MTU can have a big impact on latency depending on how the data is structured.
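If you want to see what MTU is actually in play (on Linux the loopback interface usually defaults to a much larger MTU than Ethernet's 1500, e.g. 16436 or 65536 depending on the kernel), a small Linux-only sketch like this reads it with the SIOCGIFMTU ioctl:

    #include <cstdio>
    #include <cstring>
    #include <net/if.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    // Print the MTU of a network interface (e.g. "lo" or "eth0").
    int main(int argc, char** argv)
    {
      const char* ifname = (argc > 1) ? argv[1] : "lo";

      int fd = socket(AF_INET, SOCK_DGRAM, 0);
      if (fd < 0) { perror("socket"); return 1; }

      struct ifreq ifr;
      std::memset(&ifr, 0, sizeof(ifr));
      std::strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

      if (ioctl(fd, SIOCGIFMTU, &ifr) < 0) {
        perror("ioctl(SIOCGIFMTU)");
        close(fd);
        return 1;
      }

      std::printf("%s MTU = %d\n", ifname, ifr.ifr_mtu);
      close(fd);
      return 0;
    }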

Running locally means it could probably use UDP instead; there wouldn't be any need to delay sends for efficiency, and it wouldn't have to worry about reordering or packet loss on a local loopback unless the data requires absolutely enormous throughput. At 30 Hz it is certainly possible that unneeded TCP efficiency delays could be avoided. Or perhaps even just use a pipe, or shared memory. But that in turn depends on which parts of the software you have control of.
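In roscpp terms, the subscriber can at least ask for UDPROS instead of TCPROS through transport hints; whether that helps here is another question, since a multi-megabyte message gets split into many datagrams, and keeping publisher and subscriber in one process avoids the serialization altogether. The change is a one-liner on the subscribe call from the earlier sketch:

    // Prefer UDPROS, fall back to TCPROS if the publisher does not offer it.
    ros::Subscriber sub = nh.subscribe("/camera/depth/points", 1, cloud_cb,
                                       ros::TransportHints().udp().tcp());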

There may be a number of drivers involved in the latency; TCP is often the biggest contributor.

It was a bottleneck in the node-subscribe (TCP) data transfer. I converted my node to a nodelet and loaded it into the nodelet manager that is launched by the freenect_launch launch file. Full 30 Hz in the callback (18 Hz with some non-optimized processing of the pointcloud).
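A minimal sketch of a nodelet along these lines (the package and class names here are made up, not the actual code referred to above; once loaded into the same manager as the freenect driver, the subscription becomes an in-process shared_ptr hand-off instead of a TCP transfer):

    #include <nodelet/nodelet.h>
    #include <pluginlib/class_list_macros.h>
    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>

    namespace cloud_speedup  // hypothetical package name
    {

    class CloudNodelet : public nodelet::Nodelet
    {
    public:
      virtual void onInit()
      {
        ros::NodeHandle& nh = getNodeHandle();
        // Inside the freenect nodelet manager this is a zero-copy
        // pointer hand-off rather than a serialized TCP transfer.
        sub_ = nh.subscribe("/camera/depth/points", 1,
                            &CloudNodelet::cloudCb, this);
      }

    private:
      void cloudCb(const sensor_msgs::PointCloud2::ConstPtr& cloud)
      {
        NODELET_INFO_THROTTLE(5, "received a %u x %u cloud",
                              cloud->width, cloud->height);
        // ... filter / process the cloud here ...
      }

      ros::Subscriber sub_;
    };

    }  // namespace cloud_speedup

    PLUGINLIB_EXPORT_CLASS(cloud_speedup::CloudNodelet, nodelet::Nodelet)

It also needs the usual nodelet plugin XML and export in the package manifest, and then gets loaded with something like rosrun nodelet nodelet load cloud_speedup/CloudNodelet <manager>, where <manager> is the nodelet manager started by the freenect launch file (often camera_nodelet_manager).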

A link to another forum post on my issue:

That’s very interesting. Could you share the ‘nodelet’ with us?

Sure. I’ll post it with instructions in the next few days.

Way excellent! Looking forward to it.

Follow the instructions, and let me know if you need help.