Speeding up pointcloud delivery to a ROS subscriber (Kinect data) [SOLVED]

Hi,

I am using an NVIDIA Jetson + ROS + freenect_launch to access data from the Kinect. I am running into an issue (which I don't hit on my Intel i7 laptop) where my node, which subscribes to the /camera/depth/points topic, cannot receive the data fast enough for my purposes. I have played with different ways of configuring the callback (as well as using ros::TransportHints().tcpNoDelay()), and the best I can get is about 7 Hz.

I am not doing any processing of the pointcloud in the callback; the subscriber callback just publishes a basic sensor_msgs message so I can use rostopic hz /mynode/basicsensormsg to see how fast the callback is occurring (about 7 Hz).
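Roughly, that measurement node looks like the sketch below (a minimal sketch, not the exact code; std_msgs::Header stands in for whatever small message actually gets republished):

    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>
    #include <std_msgs/Header.h>

    ros::Publisher marker_pub;

    // No processing: just republish a tiny message per incoming cloud so that
    // rostopic hz /mynode/basicsensormsg reflects the callback rate.
    void cloud_cb(const sensor_msgs::PointCloud2::ConstPtr& point_cloud)
    {
      std_msgs::Header h;
      h.stamp = ros::Time::now();
      marker_pub.publish(h);
    }

    int main(int argc, char** argv)
    {
      ros::init(argc, argv, "cloud_rate_check");
      ros::NodeHandle nh;
      marker_pub = nh.advertise<std_msgs::Header>("/mynode/basicsensormsg", 1);
      ros::Subscriber sub = nh.subscribe("/camera/depth/points", 1, cloud_cb,
                                         ros::TransportHints().tcpNoDelay());
      ros::spin();
      return 0;
    }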

The exact same node running on my laptop gets the full 30 Hz. When I do a rostopic hz /camera/depth/points, that is also 30 Hz.

I believe the Jetson board is bottlenecking while transferring the pointcloud data from the launch node to my node. I'm wondering if there is a more efficient way of subscribing to such a large amount of data, or if anyone has compiled the freenect_camera driver into their ROS node and could share their experience. (I'm moving toward the idea that pointcloud delivery through ROS sensor_msgs is not the right approach, and that it would be better to have a node receive directly from the driver, eliminating needless memory-transfer steps.)

Any thoughts?

Description of some code I tried:

The callback void cloud_cb(const sensor_msgs::PointCloud2::ConstPtr& point_cloud) was tried; defined this way, the callback did not have any bottleneck. However, I could not figure out how to use the cloud in a pcl::PassThrough filter without calling pcl::fromROSMsg() first, and pcl::fromROSMsg() caused the 7 Hz bottleneck once used in the callback function.

The callback void cloud_cb(const PointCloud::ConstPtr& point_cloud) was also tried; defined this way, the callback bottlenecks even without any additional code. However, I can use 'point_cloud' directly in a pcl::PassThrough filter, avoiding the need for pcl::fromROSMsg().
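To make the two variants concrete, this is roughly what they look like side by side (a sketch, not the exact code from the post; the PassThrough field and limits are placeholders, and the include for pcl::fromROSMsg may differ between ROS/PCL versions):

    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>
    #include <pcl_conversions/pcl_conversions.h>  // pcl::fromROSMsg
    #include <pcl_ros/point_cloud.h>              // allows subscribing with pcl::PointCloud directly
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/filters/passthrough.h>

    typedef pcl::PointCloud<pcl::PointXYZ> PointCloud;

    // Variant 1: receive the raw ROS message. Receiving is cheap, but the
    // pcl::fromROSMsg() conversion is what introduced the 7 Hz bottleneck.
    void cloud_cb_ros(const sensor_msgs::PointCloud2::ConstPtr& msg)
    {
      PointCloud::Ptr cloud(new PointCloud);
      pcl::fromROSMsg(*msg, *cloud);              // expensive copy/conversion

      pcl::PassThrough<pcl::PointXYZ> pass;
      pass.setInputCloud(cloud);
      pass.setFilterFieldName("z");               // placeholder field/limits
      pass.setFilterLimits(0.5, 2.0);
      PointCloud filtered;
      pass.filter(filtered);
    }

    // Variant 2: let pcl_ros deserialize straight into a pcl::PointCloud,
    // which can be fed to PassThrough directly. Here the deserialization
    // itself showed the same bottleneck over the TCP transport.
    void cloud_cb_pcl(const PointCloud::ConstPtr& point_cloud)
    {
      pcl::PassThrough<pcl::PointXYZ> pass;
      pass.setInputCloud(point_cloud);
      pass.setFilterFieldName("z");
      pass.setFilterLimits(0.5, 2.0);
      PointCloud filtered;
      pass.filter(filtered);
    }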

I'm not familiar with the code, but if TCP is involved there is a relationship between MTU and data size that can change latency. How big is the data at the moment it is sent? What is the MTU setting on both the sending and receiving computers? Is it a lot of small sends, or a few large ones (relative to the MTU)?

Well, to clarify a little more: it is TCP, but it is local to the machine (loopback, I think). I did not write the driver that sends the data, but it should be transmitting at 30 Hz, and each message is somewhere between 1 and 2 MB (not exactly sure).
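As a rough sanity check on the size (an estimate only, not a measured number): a dense 640 x 480 PointCloud2 with a 16-byte point_step (x, y, z plus padding) works out to about 640 * 480 * 16 ≈ 4.9 MB per message, which at 30 Hz is on the order of 150 MB/s; with an RGB field (32-byte points) it doubles. The actual size depends on which fields freenect publishes.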

What is MTU?

Maximum transmission unit: the maximum chunk size the send side will use. The default tends to be 1500 bytes; one can configure jumbo frames, or even chop it down to something like 256 bytes plus header size (header overhead goes up, but latency goes down if the native data sizes are small). During network transmission a send is sometimes delayed because the content isn't yet considered large enough (the stack waits in the hope of more data before sending); in the reverse case, a chunk of data may need to be broken up and sent in smaller pieces. Sending less than the MTU can result in delays. MRU is the receive-side equivalent, but MTU is authoritative and MRU is only a hint. Thus MTU can have a big impact on latency depending on how the data is structured.
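If you want to see what MTU is actually in play (on Linux the loopback interface usually defaults to a much larger MTU than Ethernet's 1500, e.g. 16436 or 65536 depending on the kernel), a small Linux-only sketch like this reads it with the SIOCGIFMTU ioctl:

    #include <cstdio>
    #include <cstring>
    #include <net/if.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    // Print the MTU of a network interface (e.g. "lo" or "eth0").
    int main(int argc, char** argv)
    {
      const char* ifname = (argc > 1) ? argv[1] : "lo";

      int fd = socket(AF_INET, SOCK_DGRAM, 0);
      if (fd < 0) { perror("socket"); return 1; }

      struct ifreq ifr;
      std::memset(&ifr, 0, sizeof(ifr));
      std::strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

      if (ioctl(fd, SIOCGIFMTU, &ifr) < 0) {
        perror("ioctl(SIOCGIFMTU)");
        close(fd);
        return 1;
      }

      std::printf("%s MTU = %d\n", ifname, ifr.ifr_mtu);
      close(fd);
      return 0;
    }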

Running locally means it could probably use UDP instead; there wouldn't be any need to delay sends for efficiency, and it wouldn't have to worry about reordering or packet loss on a local loopback unless the data requires absolutely enormous throughput. At 30 Hz it is certainly possible that unneeded TCP efficiency delays could be avoided. Or perhaps even just use a pipe, or shared memory. But that in turn depends on which parts of the software you have control of.
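In roscpp terms, the subscriber can at least ask for UDPROS instead of TCPROS through transport hints; whether that helps here is another question, since a multi-megabyte message gets split into many datagrams, and keeping publisher and subscriber in one process avoids the serialization altogether. The change is a one-liner on the subscribe call from the earlier sketch:

    // Prefer UDPROS, fall back to TCPROS if the publisher does not offer it.
    ros::Subscriber sub = nh.subscribe("/camera/depth/points", 1, cloud_cb,
                                       ros::TransportHints().udp().tcp());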

There may be a number of drivers involved in the latency; TCP is often the biggest contributor.

It was a bottleneck in the node-subscribe (TCP) data transfer. I converted my node to a nodelet and loaded it into the nodelet manager that is launched by the freenect_launch launch file. Full 30 Hz in the callback (18 Hz with some non-optimized processing of the pointcloud).
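A minimal sketch of a nodelet along these lines (the package and class names here are made up, not the actual code referred to above; once loaded into the same manager as the freenect driver, the subscription becomes an in-process shared_ptr hand-off instead of a TCP transfer):

    #include <nodelet/nodelet.h>
    #include <pluginlib/class_list_macros.h>
    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>

    namespace cloud_speedup  // hypothetical package name
    {

    class CloudNodelet : public nodelet::Nodelet
    {
    public:
      virtual void onInit()
      {
        ros::NodeHandle& nh = getNodeHandle();
        // Inside the freenect nodelet manager this is a zero-copy
        // pointer hand-off rather than a serialized TCP transfer.
        sub_ = nh.subscribe("/camera/depth/points", 1,
                            &CloudNodelet::cloudCb, this);
      }

    private:
      void cloudCb(const sensor_msgs::PointCloud2::ConstPtr& cloud)
      {
        NODELET_INFO_THROTTLE(5, "received a %u x %u cloud",
                              cloud->width, cloud->height);
        // ... filter / process the cloud here ...
      }

      ros::Subscriber sub_;
    };

    }  // namespace cloud_speedup

    PLUGINLIB_EXPORT_CLASS(cloud_speedup::CloudNodelet, nodelet::Nodelet)

It also needs the usual nodelet plugin XML and export in the package manifest, and then gets loaded with something like rosrun nodelet nodelet load cloud_speedup/CloudNodelet <manager>, where <manager> is the nodelet manager started by the freenect launch file (often camera_nodelet_manager).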

A link to another forum post on my issue:

That’s very interesting. Could you share the ‘nodelet’ with us?

Sure. I’ll post it with instructions in the next few days.

Way excellent! Looking forward to it.

Follow the instructions, and let me know if you need help.