Trying to port Jetbot RL RuntimeError: normal expects all elements of std >= 0.0

Hi, I’m trying to modify the OmniIsaacGymEnvs Cartpole task from section 9.2 to solve the Jetbot task of moving toward a goal object from tutorial 9.9.

I’m able to import Jetbot and start the simulation, but it seems after the first step the simulation crashes and I get the error RuntimeError: normal expects all elements of std >= 0.0.

I found this other post with suggestions about debugging this error, but the observations seem fine

E.g. here are the contents of self.obs_buf before the crash

OBSERVATIONS

tensor([[ 2.5362,  0.5362,  0.6240,  1.5362,  0.5362,  0.5362,  0.5362,  0.5362,
          0.5362,  0.4545,  0.5362,  0.5362,  0.5362,  3.1362,  0.8362,  0.5862],
        [-1.4638,  0.5362,  0.6240,  1.5362,  0.5362,  0.5362,  0.5362,  0.5362,
          0.5362,  0.4545,  0.5362,  0.5362,  0.5362, -0.8638,  0.8362,  0.5862]],
       device='cuda:0')

And I’ve set the velocities of the ArticulationView containing the Jetbots to 0:

velocities = torch.zeros((self._num_envs, 6))

self._jetbots.set_velocities(velocities)
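(One thing I also double-checked, though I’m not sure it’s related: creating that tensor on the same device as the simulation, since torch.zeros() defaults to CPU. Something like this, assuming the task exposes self._device the way the Cartpole example does:)

velocities = torch.zeros((self._num_envs, 6), device=self._device)  # 6 = linear + angular velocity components
self._jetbots.set_velocities(velocities)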

I also tried printing out arguments in various parts of the stack trace e.g.

File "C:\Users\irvin\AppData\Local\ov\pkg\isaac_sim-2023.1.1/extscache/omni.pip.torch-2_0_1-2.0.2+105.1.wx64/torch-2-0-1\torch\nn\modules\module.py", line 1504, in _call_impl
    return forward_call(*args, **kwargs)

And I am seeing NaNs, but I’m not really sure where they’re coming from or how to figure that out. Any suggestions on where to look or what to try next to get things working would be super helpful, thanks!
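For what it’s worth, here is roughly what I’ve been trying in order to find where the NaNs first show up (just a sketch; "model" stands in for the rl_games network object, which I haven’t figured out how to reach cleanly):

import torch

torch.autograd.set_detect_anomaly(True)  # makes autograd point at the op that produced a NaN

def nan_hook(module, inputs, output):
    # only handles plain tensor outputs; tuple/dict outputs would need unpacking
    if isinstance(output, torch.Tensor) and torch.isnan(output).any():
        raise RuntimeError(f"NaN in output of {module.__class__.__name__}")

for m in model.modules():  # model is a placeholder for the policy network
    m.register_forward_hook(nan_hook)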

I’m having the same exact problem in a totally different simulation. My simulation is based on the Franka Deformable one. Could not find a solution yet :(

I have opened an issue on GitHub:


Thanks for the link to the GitHub issue! I’ve had some luck by adjusting the number of environments and the minibatch size. I was trying to test with really small values, so that may have been the problem, though I’m not sure why, since I’m still new to all this and learning about neural nets, RL, etc. I’ll post an update if I learn anything new or have more consistent success.

I’m also not sure why I’m having this problem, but sometimes it does not happen. My guess is that with certain parameters, the simulation never ends up in scenarios that cause the numerical issues.

Hello all! Hopefully I can help :)

The error sounds like one I have seen from PyTorch in general:

In [1]: import torch

In [2]: torch.normal(1,-1,(5,))

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 1
----> 1 torch.normal(1,-1,(5,))

RuntimeError: normal expects std >= 0.0, but found std -1

but if you pass a tensor of elements, reporting a single offending value would be ambiguous, so the message changes:

In [3]: mu = torch.Tensor([0,0,0])

In [4]: std = torch.Tensor([-1,0,-3])

In [5]: torch.normal(mu, std)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 torch.normal(mu, std)

RuntimeError: normal expects all elements of std >= 0.0
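And note (if I’m remembering the check correctly) that a NaN standard deviation trips the same message, because NaN fails the >= 0 test. That is why NaNs coming out of the network tend to surface as exactly this error:

In [6]: torch.normal(mu, torch.Tensor([1.0, float('nan'), 1.0]))

RuntimeError: normal expects all elements of std >= 0.0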

I’m guessing that somewhere in OIGE there is a very confused normal distribution wondering what the heck a negative standard deviation means, and this could be occurring for any number of reasons. I will point the team to this post :)

I did, however, find this other post with the same error

and that seems to come from a configuration failure within rl_games, which is a third party library we do not support…

There are many, many ways to train a model through reinforcement learning. Even for a single algorithm like PPO there are multiple implementations, and even between those implementations, performance is circumstantial. For these and many other reasons we are working to integrate the RL features of OIGE and other projects into Isaac-Sim! Our goal is to make it as easy as possible for a user to go from a rigged articulation to a trained policy, and this is an incredibly complex and multi-headed problem. We are working diligently to release these features and get them into your hands as quickly as possible. However, I can’t give you a timeline other than “Soon™”. Sorry!

Please look forward to it :D

It sounds like this error occurs most commonly when NaNs are passed into the policy. I would inspect the values going into the policy first.

Thanks so much for answering and looking into this @mgussert. Keep up the great work!

Can you clarify what you mean by “I would inspect the values going into the policy first”? Is this somewhere inside rl_games? I have a simulation based on the Franka Deformable and Franka Cabinet examples, and I have implemented safeguards in get_observations(), pre_physics_step() and is_done() to guarantee that there aren’t any NaNs in the tensors, but I still often get “RuntimeError: normal expects all elements of std >= 0.0”.
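For reference, the kind of safeguard I mean looks roughly like this (simplified; the helper name and where exactly I call it are specific to my task):

import torch

def sanitize(name: str, t: torch.Tensor) -> torch.Tensor:
    # called at the end of get_observations(), pre_physics_step() and is_done()
    if torch.isnan(t).any() or torch.isinf(t).any():
        print(f"non-finite values in {name}, replacing with zeros")
        t = torch.nan_to_num(t, nan=0.0, posinf=0.0, neginf=0.0)
    return t

# e.g. in get_observations():
#     self.obs_buf = sanitize("obs_buf", self.obs_buf)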

We are abandoning rl_games for these kinds of issues, among other reasons.

It could be a naming mismatch between elements of the USD / URDF scene, where the default consequence of that failure also results in this error. It could be the result of how the various distributions are managed and used. It could be the result of a simple missing exception catch within the codebase, etc…

If you are interested in doing RL with isaac-sim I would strongly recommend you check out our standalone examples that use SB3. You can find them in your installation under source/standalone_examples/api/omni.isaac.gym/. Our future iterations on RL in isaac-sim will probably use these as a springboard.

If you need any help with that, please post here!

Thanks again @mgussert for taking the time to explain.

I found only the cartpole example under source/standalone_examples/api/omni.isaac.gym/. Are there more SB3 examples/resources? Are there plans to port OIGE’s rl_games examples to SB3?

Thanks!

The plan is to expand the RL capabilities of Isaac-sim in general. I expect much of the work in OIGE will be ported over and / or integrated into isaac-sim :) I can’t say anything definitively though, if only because I don’t want to step on toes XD

Thanks for your answers @mgussert! Interesting to hear rl_games is being left behind; my understanding was that it is what allowed Isaac Sim/Gym to run the physics simulation on the GPU, which sped up training a lot. I did see Stable Baselines has vectorized environments, but it wasn’t clear whether that means you get the GPU speedup like with rl_games (the existence of this project, and some questions on Stack Overflow/GitHub issues, make me think no: GitHub - MetcalfeTom/stable-baselines3-GPU: A GPU-accelerated fork of stable-baselines. Delivering reliable implementations of reinforcement learning algorithms. Or at least it’s not standard). Do you know if the expansion of RL capabilities in Isaac Sim includes the advantage Isaac Gym has of simulating many environments in parallel?

@edsonbffilho for your question about other SB3 examples, not sure if you saw, but the Jetbot task I was trying to convert uses SB3: 9.9. Reinforcement Learning using Stable Baselines — Omniverse IsaacSim latest documentation
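From memory, the rough shape of that tutorial is below (the env class name, module path, policy type and hyperparameters here are placeholders, and details may differ between Isaac Sim versions):

from stable_baselines3 import PPO
from jetbot_env import JetBotEnv  # the gym-style env the tutorial defines around the Jetbot scene

env = JetBotEnv(headless=True)
model = PPO("MlpPolicy", env, verbose=1, device="cuda")
model.learn(total_timesteps=100000)
model.save("jetbot_policy")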

Also, in my case (trying to use rl_games), when I was looking at the stack trace it seemed to happen when running the model to get the actions (rl_games/rl_games/common/a2c_common.py at master · Denys88/rl_games · GitHub), even though the observations passed in didn’t have NaNs.

I started seeing NaNs in what I’d guess is the forward pass of the model (pytorch/torch/nn/modules/module.py at main · pytorch/pytorch · GitHub). But I didn’t manage to find or understand the code where the mean/standard deviation is generated by the neural network for A2C and then used to sample the actions, which is where I’m assuming the error is happening (i.e. the forward pass is generating a negative standard deviation, and when that is used to sample an action we see the error above… although I may be completely off on all of this…)
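In case it helps anyone else searching, here is my rough understanding of how these continuous-action policies usually produce actions (generic PPO/A2C style, not necessarily rl_games’ exact code):

import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    # generic Gaussian policy head: the network outputs the action mean, and a learned
    # log-std parameter is exponentiated to get the (normally positive) std
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        mu = self.body(obs)                     # NaN observations or NaN weights make mu NaN
        std = self.log_std.exp().expand_as(mu)  # exp() can't go negative, but exp(NaN) is NaN
        # torch.normal raises "normal expects all elements of std >= 0.0" when std
        # contains NaN (NaN fails the >= 0 check), e.g. after log_std received NaN gradients
        return torch.normal(mu, std)

actor = GaussianActor(obs_dim=16, act_dim=2)
print(actor(torch.randn(2, 16)).shape)  # torch.Size([2, 2])

So possibly the std isn’t going negative so much as going NaN somewhere upstream, though again I may be off.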


Hello irvinh!

The goal is to save all of the good and none of the bad in this integration. The creation of performant GPU-based vectorized training environments is a core requirement of RL that we will absolutely support. I am confident in this. While I don’t have the clout or the authority to make any sort of official guarantee like “We will release this feature with these specs at this time”, I don’t see a path forward without GPU-accelerated vectorized training environments. I also feel confident in communicating that this perspective is shared by the powers that be (the people in control of actually designing this integration).

There are many reasons why we are moving away from rl_games, but the biggest one is that we don’t want to produce RL tools that are tied to specific third-party libraries. Rather, we seek to create software that motivates the creation of new RL libraries and furthers GPU integration in those that already exist. The major hurdle along this path is managing and exposing the appropriate data on the GPU, which is complicated enough as it is. It gets even trickier when the notion of what a “GPU” is expands to mean “a DGX-based data center”.

USD provides us with a universal data interchange format for representing arbitrary 3D scenes, but this generality comes at a cost to performance. We get around this through Fabric and USDRT, which “mirror” the data on the stage; after all, any data on the stage is coming from the GPU anyway, so this problem reduces to one of associating the appropriate addresses on the device. Much of this code is very “low level”, meaning that it involves handling the actual gearing that lets the whole thing run in the first place. Ideally, users wanting to use the simulation for reinforcement learning shouldn’t need deep, intimate knowledge of how the data is managed on the GPU.

Creating a tool set for defining these GPU-based vectorized environments therefore means not just exposing Fabric and USDRT functionality to the user, but also answering a whole slew of questions through design. How do you manage an arbitrary subset of your environments needing to be reset when they are distributed across multiple DGX machines? What are all the different use cases / RL algorithms available, and do we satisfy those use cases at a minimum? How do we handle synchronicity and asynchronicity? How deeply should a user need to be concerned with the “vectorized” nature of the environment? Etc.

It’s a huge and complicated problem, but it’s also exciting :D


Very cool to hear and looking forward to it! Thanks again for taking the time to answer these questions, sounds very exciting indeed.
