Preview 4 Docker/Singularity image: torch_deterministic=True causes "RuntimeError: number of dims don't match in permute"

It seems that the default PyTorch version installed by the pre-packaged Dockerfile fails on indexed tensor assignment when deterministic mode is enabled.

I used the vanilla Dockerfile from Preview 4 and converted the resulting image to a Singularity SIF file on a cluster. I then used this Singularity image to run HumanoidAMP training with torch_deterministic=True.

I got this error message:

File ".../IsaacGymEnvs/isaacgymenvs/tasks/humanoid_amp.py", line 290, in _set_env_state
    self._root_states[env_ids, 0:3] = root_pos
RuntimeError: number of dims don't match in permute

With torch_deterministic=False, I don’t get this error at all and everything runs as expected.

For reference, the node I was allocated has a Tesla P100 GPU.

The lazy workaround I’m using is to run pip install --upgrade torch inside the container, which gets rid of this error. I suspect this is a PyTorch bug that has since been fixed, so we simply need a newer version.
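To check whether a given container is affected, before or after upgrading, something like the sketch below might help. It is not taken from IsaacGymEnvs: it assumes that torch_deterministic=True ends up calling torch.use_deterministic_algorithms(True), and it mimics the failing line with made-up shapes (8 environments, 13 values per root state). The function name check_indexed_assignment is hypothetical, and I have not confirmed that this small case triggers the same error on the bundled PyTorch version.

```python
"""Sketch: does indexed tensor assignment work in deterministic mode?"""
try:
    import torch
except ImportError:  # lets the script report cleanly if torch is missing
    torch = None


def check_indexed_assignment(device=None):
    """Return "ok", or a short message describing why the check failed."""
    if torch is None:
        return "torch not installed"
    if not hasattr(torch, "use_deterministic_algorithms"):
        return "torch too old for use_deterministic_algorithms"
    if device is None:
        # The original failure was seen on a GPU, so prefer CUDA if present.
        device = "cuda" if torch.cuda.is_available() else "cpu"

    was_enabled = torch.are_deterministic_algorithms_enabled()
    torch.use_deterministic_algorithms(True)
    try:
        # Illustrative stand-ins for self._root_states, env_ids and root_pos.
        root_states = torch.zeros(8, 13, device=device)
        env_ids = torch.tensor([0, 3, 5], dtype=torch.long, device=device)
        root_pos = torch.ones(3, 3, device=device)
        # Same pattern as the failing line in humanoid_amp.py.
        root_states[env_ids, 0:3] = root_pos
    except RuntimeError as err:
        return f"failed: {err}"
    finally:
        # Restore the previous global setting.
        torch.use_deterministic_algorithms(was_enabled)
    return "ok"


if __name__ == "__main__":
    version = torch.__version__ if torch is not None else "n/a"
    print(f"torch {version}: {check_indexed_assignment()}")
```

Running this inside the container before and after the upgrade should show whether the installed version handles the assignment in deterministic mode.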

The real fix is to update the default PyTorch version in the Dockerfile. Until then, you can add a line like

RUN pip install --upgrade torch

at the end of the Dockerfile, so the upgrade is baked into the image and you don’t have to wait for a new install every time you launch a containerized job.

Would be great to hear from anyone who has run into the same issue. I hope this is helpful!