It seems that the PyTorch version installed by the pre-packaged Dockerfile doesn't handle indexed tensor assignments correctly when torch_deterministic=True.
I used the vanilla Dockerfile from Preview 4 and converted its image to a Singularity SIF file on our cluster. I then used this Singularity image to run HumanoidAMP training with torch_deterministic=True.
I got this error message:
File ".../IsaacGymEnvs/isaacgymenvs/tasks/humanoid_amp.py", line 290, in _set_env_state
self._root_states[env_ids, 0:3] = root_pos
RuntimeError: number of dims don't match in permute
With torch_deterministic=False, I don't get this error at all and everything runs as expected.
For reference, the node I was allocated has a Tesla P100 GPU.
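In case it helps anyone reproduce this outside a full training run, here is a minimal sketch of the same assignment pattern. I'm assuming that torch_deterministic=True ultimately calls torch.use_deterministic_algorithms(True); the buffer shapes and names below are made up for illustration, not taken from humanoid_amp.py.

import torch

# Assumption: torch_deterministic=True ends up enabling PyTorch's deterministic mode.
torch.use_deterministic_algorithms(True)

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Illustrative stand-ins for the buffers in _set_env_state (shapes are made up)
num_envs = 8
root_states = torch.zeros(num_envs, 13, device=device)              # per-env root state buffer
env_ids = torch.tensor([0, 2, 5], dtype=torch.long, device=device)  # envs being reset
root_pos = torch.rand(env_ids.shape[0], 3, device=device)           # new root positions

# The same advanced-indexing assignment that fails for me inside the container:
root_states[env_ids, 0:3] = root_pos
print(root_states[env_ids, 0:3])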
The lazy workaround I'm using is to run pip install --upgrade torch inside the container, which gets rid of the error. I suspect this was a bug in PyTorch that has since been fixed, and we simply need a newer version of it.
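If anyone wants to compare notes on versions, this quick check (run inside the container, before and after the upgrade) prints what you are actually getting; nothing here is specific to Isaac Gym:

import torch

print("torch version :", torch.__version__)
print("CUDA build    :", torch.version.cuda)
print("deterministic :", torch.are_deterministic_algorithms_enabled())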
The real fix is to update the default PyTorch version that comes with the Dockerfile. For now, you can simply add a line like
RUN pip install --upgrade torch
at the end of the Dockerfile, so the upgrade is baked into the image and you don't have to wait for a fresh install every time you launch a containerized job.
It would be great to hear from anyone else who has run into the same issue. I hope this is helpful!