Recommended long-horizon execution pattern for GR00T N1.6 with ZMQ

Hello @chenjason7026,

Thanks for posting in the Isaac ROS forum!

I’ll address your questions below.

  1. Should we use one high-level prompt for the whole long-horizon task?
    This may work if the checkpoint was trained or fine-tuned with similar whole-task language annotations and demonstrations. GR00T is conditioned on the language prompt provided in the observation, so the prompt wording should match the style of instructions used during training or fine-tuning.

  2. Should we use skill-level re-prompting with an external Task Manager / Behavior Tree?
    For explicit multi-stage tasks, this is usually the more controllable deployment pattern. Since the N1.6 policy/ZMQ server is stateless at the task level, an external task manager can track subgoal completion and send the next subgoal prompt through ZMQ.

  3. Does the ZMQ inference server keep task-level memory across requests?
    No. Treat the ZMQ server as a policy inference service, not a task-level planner.
    Treat each get_action() request as conditioned only on the observation and the language prompt included in that request. The official Policy API states that the policy is currently stateless, and the ZMQ server forwards get_action() requests to the policy.

  4. If a skill succeeds before all actions in the returned action chunk are executed, can we discard the remaining chunk and send a new request?
    Yes. This is acceptable and is a standard receding-horizon deployment pattern.
    If the subgoal is complete, you can discard the unused part of the action chunk and request a new chunk with the next subgoal prompt. Make sure your downstream controller does not keep executing stale queued commands and that the next request uses the latest observation.
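A minimal sketch of discarding the unused tail of a chunk, assuming a client-side command queue (`command_queue`) that feeds your low-level controller and a success check you supply (`subgoal_done`). `execute_prefix`, `command_queue`, `send_command`, and `subgoal_done` are hypothetical names for illustration, not part of the GR00T API.

```python
from collections import deque
from typing import Callable, Deque, Sequence, TypeVar

Action = TypeVar("Action")


def execute_prefix(
    chunk: Sequence[Action],
    command_queue: Deque[Action],          # hypothetical buffer feeding the controller
    send_command: Callable[[Action], None],
    subgoal_done: Callable[[], bool],      # hypothetical check on the latest sensor state
) -> int:
    """Run actions from `chunk` until the active subgoal is complete.

    Returns how many actions were actually sent. Once the subgoal succeeds, the
    unused tail of the chunk is dropped and the queue is emptied, so no stale
    commands from this chunk remain for the controller.
    """
    command_queue.clear()                  # never mix commands from an older chunk
    command_queue.extend(chunk)
    executed = 0
    while command_queue:
        send_command(command_queue.popleft())
        executed += 1
        if subgoal_done():
            command_queue.clear()          # discard the remainder of the chunk
            break
    return executed
```

After `execute_prefix` returns, take a fresh observation and request the next chunk with the next subgoal prompt; the queue is already empty, so nothing from the old chunk can run.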

Recommended pattern:

  • Use external task logic for long-horizon sequencing.
  • Send one subgoal prompt at a time.
  • Execute only the needed prefix of the returned action chunk.
  • Re-query at skill boundaries or when the scene state changes.
  • Do not rely on the ZMQ server to remember previous subgoals.
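Putting the recommended pattern together, a minimal client-side loop could look like the sketch below. It assumes you wrap your ZMQ client in a callable `query_policy(observation, prompt)` that places the prompt into the observation in whatever form your checkpoint expects and returns an action chunk. `run_task`, `query_policy`, `observe`, `apply_action`, and `subgoal_done` are all hypothetical names, not GR00T APIs.

```python
from typing import Any, Callable, Dict, List, Sequence

Obs = Dict[str, Any]


def run_task(
    subgoals: List[str],
    query_policy: Callable[[Obs, str], Sequence[Any]],  # hypothetical wrapper around the ZMQ client
    observe: Callable[[], Obs],                          # returns the latest robot observation
    apply_action: Callable[[Any], None],                 # sends one action to the controller
    subgoal_done: Callable[[str, Obs], bool],            # hypothetical per-subgoal success check
    max_queries: int = 100,
) -> bool:
    """Sequence subgoals on the client; the server only ever sees one prompt per request."""
    queries = 0
    for prompt in subgoals:                      # external task logic owns the sequencing
        while not subgoal_done(prompt, observe()):
            if queries >= max_queries:
                return False                     # give up instead of re-querying forever
            # Each request carries the latest observation and the current subgoal prompt.
            chunk = query_policy(observe(), prompt)
            queries += 1
            for action in chunk:
                apply_action(action)
                if subgoal_done(prompt, observe()):
                    break                        # skill boundary: drop the rest of this chunk
    return True
```

All task-level state (which subgoal is active, how many queries were spent) lives in the client, so the server can stay stateless. A trigger for scene changes could be added as a second `break` condition in the inner loop.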