ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package

Hi. I’ve checked the other answers to this question but haven’t found any that worked. Interestingly, when running this code, everything works just fine:

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
#pipe.enable_model_cpu_offload() #save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power

prompt = "A cat holding a sign that says hello world."

pipe = pipe.to("cuda")

image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator(device="cuda").manual_seed(0)
).images[0]
image.save("cat.png")

but when attempting to run an LLM:

import torch
from transformers import pipeline

def main():
    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    chatbot = pipeline("text-generation", model="Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2", model_kwargs={"torch_dtype": torch.bfloat16}, device="cuda")
    result = chatbot(messages)

main()

I run into this error:

Traceback (most recent call last):
  File "/home/ferros/repos/test_ai/mistralai-7b-instruct.py", line 16, in <module>
    main()
  File "/home/ferros/repos/test_ai/mistralai-7b-instruct.py", line 11, in main
    result = chatbot(messages)
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 267, in __call__
    return super().__call__(Chat(text_inputs), **kwargs)
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1302, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1309, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 370, in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1977, in generate
    synced_gpus = (is_deepspeed_zero3_enabled() or is_fsdp_managed_module(self)) and dist.get_world_size() > 1
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/transformers/integrations/fsdp.py", line 29, in is_fsdp_managed_module
    import torch.distributed.fsdp
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/torch/distributed/fsdp/__init__.py", line 1, in <module>
    from ._flat_param import FlatParameter as FlatParameter
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/torch/distributed/fsdp/_flat_param.py", line 45, in <module>
    from torch.testing._internal.distributed.fake_pg import FakeProcessGroup
  File "/home/ferros/repos/test_ai/venv/lib/python3.10/site-packages/torch/testing/_internal/distributed/fake_pg.py", line 5, in <module>
    from torch._C._distributed_c10d import (
ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package

I have verified that it does exist in my site-packages:

(venv) ferros@ubuntu:~/repos/test_ai$ ls venv/lib/python3.10/site-packages/torch/_C/
_aoti.pyi      _cpu.pyi    _distributed_autograd.pyi  _distributed_rpc.pyi          _functions.pyi  __init__.pyi  _lazy.pyi             _monitor.pyi  _nvtx.pyi  _profiler.pyi           _verbose.pyi
_autograd.pyi  _cudnn.pyi  _distributed_c10d.pyi      _distributed_rpc_testing.pyi  _functorch.pyi  _itt.pyi      _lazy_ts_backend.pyi  _nn.pyi       _onnx.pyi  _VariableFunctions.pyi

and that it's using my virtual environment:

(venv) ferros@ubuntu:~/repos/test_ai$ which python
/home/ferros/repos/test_ai/venv/bin/python
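
For reference, that torch/_C/ directory only holds .pyi type stubs; the actual _distributed_c10d module is baked into the compiled torch._C extension, so whether it exists depends on how the wheel was built. A quick runtime check from inside the venv (a minimal sketch):

import torch
import torch.distributed as dist

print(torch.__version__)
# If this prints False, the wheel was built without distributed support,
# and torch._C._distributed_c10d will not exist in this build.
print(dist.is_available())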

I installed torch per the NVIDIA instructions here.

My venv has:

(venv) ferros@ubuntu:~/repos/test_ai$ pip freeze
certifi==2024.8.30
charset-normalizer==3.4.0
filelock==3.16.1
fsspec==2024.10.0
huggingface-hub==0.26.2
idna==3.10
Jinja2==3.1.4
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==1.26.4
packaging==24.2
pillow==11.0.0
psutil==6.1.0
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.4.5
sympy==1.13.1
tokenizers==0.20.3
torch @ https://developer.download.nvidia.com/compute/redist/jp/v61/pytorch/torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl#sha256=6f75fd2d2ef840ede1a90dbcf40a5458214bee26cc803fa510cda2e8978d972a
tqdm==4.67.0
transformers==4.46.2
typing_extensions==4.12.2
urllib3==2.2.3
(venv) ferros@ubuntu:~/repos/test_ai$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:14:07_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

Any thoughts on a path forward? I've tried a variety of models (the one above, Mistral, etc.). I have blown away my environment and started over, but to no effect. Thanks ahead of time for any ideas!

Hi,

Could you try the package from the link below?

http://jetson.webredirect.org/jp6/cu126

Thanks.

Thanks so much, that fixed it! Saved that link as well :)

The solution was

pip install http://jetson.webredirect.org/jp6/cu126/+f/5cf/9ed17e35cb752/torch-2.5.0-cp310-cp310-linux_aarch64.whl#sha256=5cf9ed17e35cb7523812aeda9e7d6353c437048c5a6df1dc6617650333049092

And you should bookmark this site: http://jetson.webredirect.org/jp6/cu126
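
If you want to double-check that the replacement wheel was built with distributed support (the piece the original wheel appeared to be missing), a quick sanity check, assuming the install above succeeded:

import torch
import torch.distributed.fsdp  # this is the import that was failing before

print(torch.__version__)                 # should report 2.5.0
print(torch.distributed.is_available())  # should now be True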

Hey, I have a workaround for this as well. For example, in my environment I can locate the file
/home/jetson/miniconda3/envs/qwen2/lib/python3.8/site-packages/transformers/generation/utils.py
and comment out lines 1976 and 1977:

if synced_gpus is None: 
    synced_gpus = (is_deepspeed_zero3_enabled() or is_fsdp_managed_module(self)) and dist.get_world_size() > 1

Anyway, I believe distributed training isn't needed on a Jetson, so commenting out these two lines doesn't affect anything. Hope this helps with the issue.
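
If you'd rather not edit the installed transformers source, it should also be possible to sidestep that branch by passing synced_gpus=False yourself: since the guard above only runs when synced_gpus is None, generate() never reaches the torch.distributed.fsdp import. An untested sketch using the original poster's pipeline:

import torch
from transformers import pipeline

chatbot = pipeline(
    "text-generation",
    model="Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# synced_gpus=False is forwarded to model.generate(), so the
# `if synced_gpus is None:` fallback (and its torch.distributed.fsdp
# import) is skipped entirely.
result = chatbot(messages, synced_gpus=False)
print(result)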
