I'm encountering a runtime error while executing a Python script (sqlcoder.py) on a Jetson AGX Orin device (JetPack 5.1). The issue appears to be related to the current PyTorch installation.
When running the script, I receive the following traceback:
ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package
This suggests that the installed PyTorch package is broken or missing the components required for distributed computing support (such as torch.distributed). It is likely that the wheel used for installation is either incomplete or incompatible with the Jetson environment.
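A quick check that should confirm this (a minimal diagnostic sketch, not part of sqlcoder.py) is to ask PyTorch directly whether the wheel was built with distributed (c10d) support:

```python
# Minimal diagnostic sketch: on wheels built without distributed support,
# is_available() returns False, which would explain the missing
# torch._C._distributed_c10d module in the traceback below.
import torch
import torch.distributed as dist

print(torch.__version__)     # exact wheel version actually being imported
print(dist.is_available())   # False -> this build has no c10d / distributed backend
```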
Steps already taken:
- CUDA is installed and properly configured (nvcc works).
- Tried every compatible version of PyTorch available as a wheel (see the verification check below).
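For reference, this is how I am sanity-checking each installed wheel from Python (a simple verification sketch, not from sqlcoder.py). Since nvcc only shows the CUDA toolkit is present, this also confirms whether the wheel itself can see the GPU:

```python
# Sanity-check sketch for each candidate wheel: prints the version that is
# actually imported, the CUDA version it was built against, and GPU visibility.
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version the wheel was built with (None = CPU-only build)
print(torch.cuda.is_available())  # True if the wheel can use the Orin GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the Orin's integrated GPU
```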
System config:
- JetPack 5.1
- Python 3.8
Below is the error encountered:
digitalway@digitalway-desktop:~/Downloads$ python sqlcoder.py
65853087744
Loading checkpoint shards: 100%|██████████████████| 4/4 [00:09<00:00, 2.43s/it]
/home/digitalway/.local/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: do_sample is set to False. However, temperature is set to 0 – this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/home/digitalway/.local/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: do_sample is set to False. However, temperature is set to 0 – this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
warnings.warn(
/home/digitalway/.local/lib/python3.8/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: do_sample is set to False. However, temperature is set to 0.0 – this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
warnings.warn(
Traceback (most recent call last):
  File "sqlcoder.py", line 104, in <module>
    generated_sql = generate_query(question)
  File "sqlcoder.py", line 83, in generate_query
    generated_ids = model.generate(
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/digitalway/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 1977, in generate
    synced_gpus = (is_deepspeed_zero3_enabled() or is_fsdp_managed_module(self)) and dist.get_world_size() > 1
  File "/home/digitalway/.local/lib/python3.8/site-packages/transformers/integrations/fsdp.py", line 29, in is_fsdp_managed_module
    import torch.distributed.fsdp
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/fsdp/__init__.py", line 1, in <module>
    from .flat_param import FlatParameter
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 30, in <module>
    from torch.distributed._tensor import DTensor
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/_tensor/__init__.py", line 6, in <module>
    import torch.distributed._tensor.ops
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/_tensor/ops/__init__.py", line 2, in <module>
    from .embedding_ops import *  # noqa: F403
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/_tensor/ops/embedding_ops.py", line 6, in <module>
    from torch.distributed._tensor.api import _Partial, DTensorSpec, Replicate, Shard
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/_tensor/api.py", line 8, in <module>
    import torch.distributed._tensor.dispatch as op_dispatch
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/_tensor/dispatch.py", line 10, in <module>
    from torch.distributed._tensor.device_mesh import DeviceMesh
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/_tensor/device_mesh.py", line 6, in <module>
    import torch.distributed._functional_collectives as funcol
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/_functional_collectives.py", line 7, in <module>
    import torch.distributed.distributed_c10d as c10d
  File "/home/digitalway/.local/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 17, in <module>
    from torch._C._distributed_c10d import (
ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package
digitalway@digitalway-desktop:~/Downloads$
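One workaround I am considering but have not tried yet (a sketch only, assuming the wheel really was built without distributed support): short-circuit the FSDP check that triggers the failing import before calling model.generate(). The module and function names below are taken from the traceback above, not from anywhere else, so please treat this as unverified.

```python
# Untested workaround sketch: generate() only calls is_fsdp_managed_module()
# to decide whether GPUs need to stay in sync; forcing it to return False
# avoids the "import torch.distributed.fsdp" that crashes on this wheel.
import transformers.generation.utils as gen_utils

gen_utils.is_fsdp_managed_module = lambda module: False  # skip the torch.distributed.fsdp import
```

That said, the real question remains which PyTorch wheel for JetPack 5.1 / Python 3.8 ships with working torch.distributed support.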