Measuring lag between Isaac Sim and ROS 2

I’m writing a controller that lives in a node on the ROS 2 side, which subscribes to joint states and the tf tree, then publishes a joint effort command back to Isaac Sim to control the robot. The basic pipeline works as expected, but it seems like lag in the system may be stopping the controller from actually working.
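Roughly, the node is shaped like this (just a sketch; the topic names and the zero-effort “control law” are placeholders, not my actual controller, and the tf listening is omitted for brevity):

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import JointState

class EffortController(Node):

    def __init__(self):
        super().__init__('effort_controller')
        # Joint states in from Isaac Sim, effort commands back out
        self.create_subscription(JointState, 'joint_states', self.on_state, 10)
        self.cmd_pub = self.create_publisher(JointState, 'joint_command', 10)

    def on_state(self, msg):
        cmd = JointState()
        cmd.name = list(msg.name)
        cmd.effort = [0.0] * len(msg.name)  # placeholder control law
        self.cmd_pub.publish(cmd)

def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(EffortController())
    rclpy.shutdown()

if __name__ == '__main__':
    main()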

I’m therefore attempting to measure the lag for round-trip messages between Isaac Sim and ROS 2, so I can understand what sort of performance hit this controller is experiencing, or whether there’s something else going on. I have an action graph that publishes a timestamp to ROS on a keyboard input, then a node in ROS that subscribes to this message and immediately publishes that timestamp on another topic. Back in the action graph, I subscribe to that topic and subtract the received timestamp from the current time. I’m using the following action graph to do this:

However, something strange happens. If I insert a time.sleep(x) into the ROS time-relay node, the lag seems to change proportionally with x. I would expect the lag to be a constant offset. E.g., if x=0 I’m getting ~0.01 seconds, if x=1.0 I get 0.83 s, and if x=5.0 I get 4.73 seconds… confusing!

Am I thinking about this in the right way?


Very interesting.
I would like to reproduce it on my end, but what is the content of each Script Node? Or could you upload a USD file?

Simply speaking,

  • the pub-sub latency in Isaac Sim is often around one frame, i.e. 1/FPS
  • your CPU & GPU will influence the FPS
  • if your FPS varies a lot, the pub-sub latency will also be impacted

As far as I know, Isaac Sim is a sim_step()-based app.
In each sim step, it might (sketched in pseudocode after this list):

  • check the ROS pub/sub state
  • … do whatever you want
  • simulate the physics
  • perform the rendering (can be skipped)
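
A rough pseudocode sketch of that loop; this is my guess at the ordering, and all the names here are made up, not Isaac Sim’s actual source:

# Rough pseudocode of one sim step (hypothetical helper names):
def sim_step(world):
    process_ros_pub_sub(world)   # deliver subscribed messages, flush publishers
    run_user_logic(world)        # OmniGraph / user callbacks for this tick
    world.step_physics()         # advance the physics state by one dt
    world.render()               # optional; can be skipped (e.g. headless)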


Hopefully it helps you :)


Hi @andrew.gillies

Could you try using the latest version of Isaac Sim (Isaac Sim 4.0.0)? You can also set the publisher and subscriber QoS settings (in Isaac Sim) to a depth of 1, to ensure no additional messages get queued up in the process, which may cause additional delays.
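
On the rclpy side, the equivalent would look something like this (just a sketch; in Isaac Sim you would set the same depth on the ROS 2 publisher/subscriber OmniGraph nodes):

# Depth-1 QoS profile so only the newest message is kept:
from rclpy.qos import QoSProfile, HistoryPolicy

shallow_qos = QoSProfile(history=HistoryPolicy.KEEP_LAST, depth=1)

# e.g. node.create_subscription(Clock, 'isaac_clock', callback, shallow_qos)
# e.g. node.create_publisher(Clock, 'ros_clock', shallow_qos)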

Ok, very interesting update:

  1. I upgraded to Isaac Sim 4.0. It didn’t seem to impact anything.

  2. I failed to mention that I had other action graphs in my stage that also had ROS activities. Even though they were disabled in the stage tree, it seems like they were somehow still active. When I deleted those other action graphs to isolate just my ping test, the strange lag artifacts disappeared! Now when I sleep in my ROS node, I get the expected constant lag offset.

  3. The lag appears to be pretty consistently 0.01666 seconds, for anyone interested.

  4. I attached the USD file and ROS node here for anyone interested in replicating.

import rclpy
from rclpy.node import Node
from rosgraph_msgs.msg import Clock
import time

class ClockRelayNode(Node):

    def __init__(self):
        super().__init__('clock_relay_node')

        # Subscriber: receives the timestamp published by Isaac Sim
        self.subscription = self.create_subscription(
            Clock,
            'isaac_clock',
            self.clock_callback,
            10)

        # Publisher: echoes the timestamp straight back to Isaac Sim
        self.publisher = self.create_publisher(Clock, 'ros_clock', 10)

    def clock_callback(self, msg):
        relay_msg = Clock()
        relay_msg.clock.sec = msg.clock.sec
        relay_msg.clock.nanosec = msg.clock.nanosec
        time.sleep(1.0)  # artificial delay (the x in the experiment above)
        self.publisher.publish(relay_msg)
        self.get_logger().info(f'Relayed timestamp: {relay_msg.clock}')


def main(args=None):
    rclpy.init(args=args)
    node = ClockRelayNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()


Zipped USD file for the action graph, since I apparently can’t directly attach USD files here.
ros2_ping_test.zip (6.2 KB)

@andrew.gillies,

I tried running your example, however it seems I would also need your other script files, time_diff.py and clock_relay.py. If you could send those over too, that would be great!

Yes of course:

time_diff.py:

def setup(db: og.Database):
    pass


def cleanup(db: og.Database):
    pass


def compute(db: og.Database):

    # Lag = current sim time minus the timestamp echoed back from ROS
    db.outputs.time_difference = db.inputs.current_timestamp - db.inputs.ros_timestamp

    return True

clock_relay.py:

def setup(db: og.Database):
    pass


def cleanup(db: og.Database):
    pass


def compute(db: og.Database):

    # Pass the incoming timestamp straight through to the publisher node
    db.outputs.timestamp_out = db.inputs.timestamp

    return True

Your result is very exciting & interesting.

In the default settings of the USD file, the physics rate = render rate = 60 FPS, so one frame = 1/60 s ≈ 0.0167 seconds.
So the measured pub-sub latency is reasonable.

If the physics rate is changed, I think we may get a different result.
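
As a quick arithmetic check of the one-frame latency at a few physics rates:

# One-frame latency at a few physics rates:
for fps in (60, 120, 240):
    print(f'{fps} FPS -> one frame = {1.0 / fps:.5f} s')

# 60 FPS -> one frame = 0.01667 s   (matches the measured ~0.01666 s)
# 120 FPS -> one frame = 0.00833 s
# 240 FPS -> one frame = 0.00417 s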


reference:
https://docs.omniverse.nvidia.com/isaacsim/latest/simulation_fundamentals.html#simulation-timeline

Thanks @jaohsu for sending the reference above!

@andrew.gillies I would ask you to take a look at that reference for a better understanding of how we step through each frame. The delay of 0.0167 s for a physics rate of 60 FPS makes sense, since you might be receiving the data in the next frame that is ticked. What is expected is that with an x second delay you should get around (x + 0.0167) seconds of total delay. As you increase the physics rate, the constant delay of 0.0167 should shrink, since you are ticking the subscriber at higher rates.
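
In other words, the expected round trip is roughly the artificial sleep plus one frame (a sketch of that model, assuming a steady physics rate):

# Expected round-trip latency: artificial sleep plus one simulation frame
def expected_latency(sleep_s: float, physics_fps: float = 60.0) -> float:
    return sleep_s + 1.0 / physics_fps

print(expected_latency(0.0))  # ~0.0167 s
print(expected_latency(1.0))  # ~1.0167 s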

The inconsistencies that you are noticing are likely due to the performance of the sim dropping slightly, and hence the simulation time running slower than wall-clock time. Since your delay is based on the wall clock of an external node, discrepancies are expected.

To play around with changing the physics rates, you can take a look at this tutorial.
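
One way to set the rates from Python is through the omni.isaac.core World API (a sketch; physics_dt and rendering_dt are the relevant parameters, and the tutorial also covers the UI route):

# Sketch: stepping physics at 120 Hz while rendering at 60 Hz
from omni.isaac.core import World

world = World(physics_dt=1.0 / 120.0,   # physics steps at 120 Hz
              rendering_dt=1.0 / 60.0)  # render every other physics step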