For some time now I have been trying to get DeepSpeed running on the AGX Xavier, but I keep running into the same error regardless of what I try.
I am using Torch 1.9.0, built with the compilation resources found on the forums, and I can import torch in the Python console. But when I try to compile and load DeepSpeed (note: not DeepSpeech), I get the following error:
(venv) user@javier:~/Documents/projects/DeepSpeed$ deepspeed
Traceback (most recent call last):
  File "/home/user/Documents/projects/DeepSpeed/venv/bin/deepspeed", line 3, in <module>
    from deepspeed.launcher.runner import main
  File "/home/user/Documents/projects/DeepSpeed/venv/lib/python3.8/site-packages/deepspeed/__init__.py", line 12, in <module>
    from .runtime.engine import DeepSpeedEngine
  File "/home/user/Documents/projects/DeepSpeed/venv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 16, in <module>
    from torch.distributed.distributed_c10d import _get_global_rank
  File "/home/user/Documents/projects/DeepSpeed/venv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 15, in <module>
    from .constants import default_pg_timeout
  File "/home/user/Documents/projects/DeepSpeed/venv/lib/python3.8/site-packages/torch/distributed/constants.py", line 1, in <module>
    from torch._C._distributed_c10d import _DEFAULT_PG_TIMEOUT
ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package
No matter what I try, I can’t seem to get rid of that error. Has anyone encountered a similar error, or does anyone know how to get rid of it?
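
In case it helps with diagnosis: the import that fails is inside torch.distributed, so one thing that can be checked is whether this torch build reports distributed support at all. The sketch below is only a diagnostic under that assumption, not a fix. `torch.distributed.is_available()` is part of PyTorch; the helper name `distributed_status` is made up for illustration.

```python
import importlib


def distributed_status(torch_module):
    """Summarize whether a torch-like module reports distributed support.

    `distributed_status` is a hypothetical helper written for this post;
    it only reads attributes and never touches torch._C directly.
    """
    dist = getattr(torch_module, "distributed", None)
    is_available = getattr(dist, "is_available", None)
    return {
        "version": getattr(torch_module, "__version__", "unknown"),
        # is_available() returns False when the build lacks distributed support
        "distributed_available": bool(is_available()) if callable(is_available) else False,
    }


def main():
    try:
        torch = importlib.import_module("torch")
        # Importing the submodule makes sure torch.distributed is set as an attribute.
        importlib.import_module("torch.distributed")
    except ImportError as exc:
        print(f"could not import torch or torch.distributed: {exc}")
        return
    print(distributed_status(torch))


if __name__ == "__main__":
    main()
```

If `distributed_available` comes back as False, that would suggest (but not prove) that the torch build itself was compiled without distributed support, which would be consistent with the missing `torch._C._distributed_c10d` module above.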