What is HydraEngine? How does it relate to Replicator? Why would it fail to end compute graph?

I have seen these Hydra / Replicator errors immediately after starting a simulation. In some cases, the errors repeat in an infinite loop and the only way to recover is to restart Isaac Sim, which is very severe.

image

It seems like something is traversing the stage graph and, because of strange prim paths (maybe due to programmer error), it encounters a cycle and the computation fails. For example, if we load a scene twice without first saving a file to reload the extension, we can reproduce this error reliably in Isaac Sim 4.0.0.

Can you explain what HydraEngine is and how it relates to Replicator?

It would help if the computation failure error message gave more information about what caused the computation to fail which would give us clues to prevent it.

What are the likely causes of these types of fatal errors?

Hydra is a runtime computation engine for scene and renderer abstraction. If you are using Replicator, you are likely using Hydra as well. The errors you post are likely caused by previous CUDA errors. If you include the full log file we may be able to provide further help.

The errors you post are likely caused by previous CUDA errors

By “cause” I meant root cause. In other words, looking for a cause that would help us diagnose and prevent the issue.

As mentioned above, this error occurs when loading the scene. I am asking what types of operations during scene loading, such as adding items to the stage, world initialization, or action graph setup, can cause this Hydra error to occur.

I expect that loading the scene should not cause CUDA errors or a Hydra Engine failure. Even if it did, Isaac Sim should protect against a catastrophic failure that requires a restart. I think there is some special case, based on a sequence of events or a race condition that our code triggers, which Isaac Sim does not protect against, and this is why it does not occur in other extensions.

If you include the full log file we may be able to provide further help

I have attached a log where you can see the error repeat until I close the application:

HydraEngine::render failed to end the compute graph: error code 6

kit_20241025_075728.log (3.1 MB)
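Since the suggestion above is that the repeating Hydra error follows an earlier error, one way to narrow down a 3 MB log is to list the distinct errors and warnings that appear before the first `HydraEngine::render failed` line. The following is a minimal sketch, not an official tool. It assumes each Kit log line carries a level tag such as `[Error]` or `[Warning]`; the function name `summarize_log` is hypothetical.

```python
import re

# Assumption: Kit log lines contain a level tag such as "[Error]" or "[Warning]".
LEVEL_TAG = re.compile(r"\[(Error|Warning)\]")


def summarize_log(lines, symptom="HydraEngine::render failed"):
    """Return (earlier, repeats).

    earlier: distinct Error/Warning messages seen before the first symptom line,
             in order of first appearance.
    repeats: how many lines contain the symptom text.
    """
    earlier = []
    seen = set()
    repeats = 0
    for line in lines:
        if symptom in line:
            repeats += 1
            continue
        match = LEVEL_TAG.search(line)
        if repeats == 0 and match:
            # Keep only the text after the level tag so that identical
            # messages with different timestamps collapse into one entry.
            message = line[match.end():].strip()
            if message not in seen:
                seen.add(message)
                earlier.append(message)
    return earlier, repeats


# Usage (uncomment and point at a real log file):
# with open("kit_20241025_075728.log", errors="replace") as f:
#     earlier, repeats = summarize_log(f)
# print(repeats, "symptom lines;", len(earlier), "distinct earlier messages")
```

The last few entries in `earlier` are usually the most interesting candidates, because they were logged closest to the first failure.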

Hydra is a runtime computation engine for scene and renderer abstraction.

Can you provide a link to documentation about this so I may read more?
I would like to know what it is attempting to compute and abstract.

If you are using Replicator, you are likely using Hydra as well.

We are not directly using Replicator (although I believe the extension is loaded by default). I can even explicitly disable “Synthetic Data Recorder” and “Capture on Play” and the Hydra error will still occur. To me, this implies that this particular error is not related to, or dependent on, Replicator.

Because of the message

Invalid USD RenderProduct Prim: /Render/OmniverseKit/HydraTextures/Replicator

I suspect it is related to ActionGraph and cameras, which use a RenderProduct node to generate frames from the camera. The strange thing is that the explicit nodes created by our code do not use this prim path. Maybe it is part of the abstraction you mention that is done automatically.

I am hoping it can be disabled through configuration.

The logs were provided.

Can you confirm that you are investigating the logs?
Is there any status update on the cause of error produced by scene loading?

Hey there @mattmazola,

I am also facing the exact same issue: if I load a scene from a custom extension more than once (using the UI’s Load button), I get “Invalid USD RenderProduct Prim: /Render/OmniverseKit/HydraTextures/Replicator” and “HydraEngine::render failed to end the compute graph: error code 6” errors.

Just like you, if I reload my custom extension before loading a scene again (e.g. by saving the extension source code and hot reloading it), this issue does not occur.

I am still in the middle of investigating what part of my own code could be causing this, so I don’t have much information to share right now, but I just wanted you to know that someone else is facing the exact same problem. If I figure anything out I’ll make sure to share it here with you too.

I just wanted you to know that someone else is facing the exact same problem

Thank you!

Hopefully the NVIDIA moderator can respond with something useful to prevent it, or at least explain why it occurs.

Given the severity of the effects of the error, I expected a bit more attention. Perhaps it is a much trickier / more complex issue to diagnose than it seems.

I may investigate the LoadButton source and see if there is anything that looks related to Hydra or RenderProduct.

Hmm, I don’t see anything too suspicious in this LoadButton code.

# Remove any previous World instance
prev_world = World.instance()
if prev_world is not None:
    prev_world.clear_all_callbacks()
    prev_world.clear_instance()
    prev_world = None
    # prev_world.clear()
await update_stage_async()

# Create a new World instance with user-defined settings.  See self.set_world_settings()
world = World(**self._world_settings)

# Call user function to put assets on the stage and add them to the World
if self.setup_scene_fn is not None:
    self.setup_scene_fn()

await world.initialize_simulation_context_async()

await world.reset_async()
await update_stage_async()
await world.pause_async()

# User assets are now initialized, and the timeline is playing at timestep 0
if self.setup_post_load_fn is not None:
    self.setup_post_load_fn()

I suspect that there could be code executed that isn’t visible through the omni source. For example, the Isaac Sim application may be reacting to certain events such as world initialization, or timeline events, that are based on the set of extensions loaded and configured.

await world.initialize_simulation_context_async()
await world.reset_async()
   ...
   omni.physics.tensors.create_simulation_view(self.backend)

These functions would be the most suspicious, although I think it’s better if we get assistance from those who have more understanding rather than speculating with limited knowledge and access.

Hi, thanks for your patience. We got your log file and we are reviewing it with the engineering team. Will get back to you shortly. Thanks for posting this.

Hi there,

would it be possible to get a code snippet with the issue for testing?

From the logs it seems the simulation is using the camera sensor which is built using replicator created render products (hydra textures). It might be that the camera is reset / not reinitialized, or something similar, where the render product gets removed from stage and the camera still tries to use it.

Cheers,
Andrei

Will get back to you shortly

Given it’s been a few months, can you provide a status update?
I’m hoping there is new information learned by the engineering team’s investigation.

would it be possible to get a code snippet with the issue for testing?

No. I was also hoping to get this information.
If we knew the snippet that isolated the cause of the issue, then we likely could have fixed or prevented it ourselves and wouldn’t have needed to request help diagnosing it.

camera sensor which is built using replicator created render products (hydra textures) … gets removed from stage and the camera still tries to use it

Hmm. I don’t know enough to determine why it would be removed from the stage. It seems like some internal Isaac Sim code.

We are currently adding code to prevent loading the scene twice.
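A guard of that kind could look roughly like the sketch below. It is a workaround, not a fix for the underlying Hydra error, and it is not taken from our extension: `LoadGuard`, `run_once`, and `reset` are hypothetical names, and the load function passed in stands for whatever coroutine performs the scene load.

```python
import asyncio


class LoadGuard:
    """Runs a load coroutine at most once until reset() is called."""

    def __init__(self):
        self._state = "idle"  # one of: "idle", "loading", "loaded"

    async def run_once(self, load_coro_fn):
        """Await load_coro_fn() if no load has happened yet.

        Returns True if the load ran, False if it was skipped because a load
        is already in progress or has already completed.
        """
        # asyncio is single-threaded, so this check-and-set cannot be
        # interleaved with another task (there is no await in between).
        if self._state != "idle":
            return False
        self._state = "loading"
        try:
            await load_coro_fn()
        except Exception:
            # Allow a retry if the load itself failed.
            self._state = "idle"
            raise
        self._state = "loaded"
        return True

    def reset(self):
        """Allow loading again, e.g. after the extension is reloaded."""
        self._state = "idle"
```

The Load button handler would then call `await guard.run_once(load_fn)` and show a message when it returns `False`, instead of starting a second load.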

Thank you for following up on Andrei @ahaidu’s post. It is very likely your errors are related to an issue with Isaac Sim’s setup, or to a stage that contains elements of previously deleted items. If you could try this with a clean installation of Isaac Sim 4.2 and let us know whether you still see these errors, that would be great. Also, please submit the latest kit log file you get and the steps we can follow to repro. Thanks.

Well, I was hoping for more information. I am guessing the investigation by the engineering team either didn’t happen or they did not find anything conclusive.

This is a large and disruptive request.

The Hydra error has been reproduced on 4 other co-workers’ machines.
I have uninstalled and reinstalled Isaac Sim previously, and the error has the same reproduction process.
Because of this, I think uninstalling and reinstalling one more time is going to cost me many hours of lost productivity for very little benefit.

I do not plan to uninstall and install again, although if I run into another blocking issue that requires it, I will do it and report back.

latest kit log file you get, and steps we can follow to repro

The repro steps have not changed and there is a log file above.