Cannot install deepspeed on AGX Xavier

For some time now I have been trying to get deepspeed running on the AGX Xavier, but I keep running into the same error, regardless of what I try.

I use Torch 1.9.0, set up using the compilation resources found on the forums, and I can load torch from the Python console, but when I try to compile and load deepspeed (note: not deepspeech) I get the following error:

(venv) user@javier:~/Documents/projects/DeepSpeed$ deepspeed
Traceback (most recent call last):
  File "/home/user/Documents/projects/DeepSpeed/venv/bin/deepspeed", line 3, in <module>
    from deepspeed.launcher.runner import main
  File "/home/user/Documents/projects/DeepSpeed/venv/lib/python3.8/site-packages/deepspeed/__init__.py", line 12, in <module>
    from .runtime.engine import DeepSpeedEngine
  File "/home/user/Documents/projects/DeepSpeed/venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 16, in <module>
    from torch.distributed.distributed_c10d import _get_global_rank
  File "/home/user/Documents/projects/DeepSpeed/venv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 15, in <module>
    from .constants import default_pg_timeout
  File "/home/user/Documents/projects/DeepSpeed/venv/lib/python3.8/site-packages/torch/distributed/constants.py", line 1, in <module>
    from torch._C._distributed_c10d import _DEFAULT_PG_TIMEOUT
ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package

And no matter how hard I try, I can’t seem to get rid of that error. Has anyone encountered a similar error, or does anyone know how to get rid of it?

Hi,

We just tested the v1.9.0 PyTorch package with JetPack 4.6.
It works correctly, as shown below:

$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch._C._distributed_c10d import _DEFAULT_PG_TIMEOUT
>>>

It seems that you are using Python 3.8 for deepspeed.
Please note that our prebuilt package is for Python 3.6 rather than 3.8.
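
To see which interpreter and which PyTorch build a virtual environment actually picks up, a small check like the one below can be run inside that environment. It is only a sketch: `describe_torch_environment` is a hypothetical helper name, not part of PyTorch or DeepSpeed. It relies on `torch.distributed.is_available()`, which reports whether the installed PyTorch build includes distributed support. Whether missing distributed support plays a role in the error above is an assumption and has not been confirmed in this thread.

```python
import sys


def describe_torch_environment():
    """Collect facts relevant to the 'torch._C._distributed_c10d' import error.

    Hypothetical helper for illustration; it only reads information and
    changes nothing in the environment.
    """
    info = {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "executable": sys.executable,
        "torch_version": None,           # stays None if torch cannot be imported
        "distributed_available": None,   # stays None if torch cannot be imported
    }
    try:
        import torch
    except ImportError:
        # torch is not installed for this interpreter
        return info

    info["torch_version"] = torch.__version__
    try:
        import torch.distributed as dist
        # False means this PyTorch build has no distributed support
        info["distributed_available"] = bool(dist.is_available())
    except ImportError:
        info["distributed_available"] = False
    return info


if __name__ == "__main__":
    for key, value in describe_torch_environment().items():
        print(f"{key}: {value}")
```

If `python_version` does not match the Python version the PyTorch package was built for, or `distributed_available` is `False`, that narrows down where to look next.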

Thanks.

This is why I hate Python development hell: the package I need (deepspeed) has to be built for Python 3.8, because it is a dependency of another package that will introduce its own set of bugs if I try to run it on Python 3.6. If you are saying that PyTorch breaks on Python 3.8 but not on Python 3.6, then either the current PyTorch code does not comply with the Python standards, or the Python standards changed between 3.6 and 3.8 and broke the code; otherwise I would not see these errors when moving to a higher version. I am using exactly the method described here: PyTorch for Jetson

It’s not a simple case of “just run Python 3.6 and be done with it”, because that would mean rewriting a lot of code and its dependencies just to make it run on an older version of Python, with 3.6 reaching EOL at the end of this year.