Jetson AGX Xavier | l4t-ml:r36.2.0-py3 | PyTorch finds wrong CUDA version (7.2 instead of 12.2)

I’m on a Jetson AGX Xavier with JetPack 5.1.2 using the container l4t-ml:r36.2.0-py3 with this command:

docker run -it --rm --runtime nvidia --network host -v /home/jetson/Services/jupyter:/root/.jupyter -v /home/jetson//Apps/:/home/Apps/ -v /home/jetson/Datas/:/home/Datas/ nvcr.io/nvidia/l4t-ml:r36.2.0-py3

I tried to load and run the Mistral-7B model with PyTorch, using the "Mistral 7B Instruct" notebook:

!pip3 install transformers

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

torch.set_default_device('cuda')

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1",
                                             torch_dtype="auto")
and the model = … cell gave me this error:

/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:190: UserWarning: 
    Found GPU0 Xavier which is of cuda capability 7.2.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is 8.7.
    
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:215: UserWarning: 
Xavier with CUDA capability sm_72 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_87.
If you want to use the Xavier GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[4], line 1
----> 1 model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1",
      2                                              torch_dtype="auto")

File /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:566, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    564 elif type(config) in cls._model_mapping.keys():
    565     model_class = _get_model_class(config, cls._model_mapping)
--> 566     return model_class.from_pretrained(
    567         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    568     )
    569 raise ValueError(
    570     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    571     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    572 )

File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3594, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3588 config = cls._autoset_attn_implementation(
   3589     config, use_flash_attention_2=use_flash_attention_2, torch_dtype=torch_dtype, device_map=device_map
   3590 )
   3592 with ContextManagers(init_contexts):
   3593     # Let's make sure we don't run the init function of buffer modules
-> 3594     model = cls(config, *model_args, **model_kwargs)
   3596 # make sure we use the model's config since the __init__ call might have copied it
   3597 config = model.config

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:1081, in MistralForCausalLM.__init__(self, config)
   1079 def __init__(self, config):
   1080     super().__init__(config)
-> 1081     self.model = MistralModel(config)
   1082     self.vocab_size = config.vocab_size
   1083     self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:913, in MistralModel.__init__(self, config)
    909 self.vocab_size = config.vocab_size
    911 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
    912 self.layers = nn.ModuleList(
--> 913     [MistralDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
    914 )
    915 self._attn_implementation = config._attn_implementation
    916 self.norm = MistralRMSNorm(config.hidden_size, eps=config.rms_norm_eps)

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:913, in <listcomp>(.0)
    909 self.vocab_size = config.vocab_size
    911 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
    912 self.layers = nn.ModuleList(
--> 913     [MistralDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
    914 )
    915 self._attn_implementation = config._attn_implementation
    916 self.norm = MistralRMSNorm(config.hidden_size, eps=config.rms_norm_eps)

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:715, in MistralDecoderLayer.__init__(self, config, layer_idx)
    712 super().__init__()
    713 self.hidden_size = config.hidden_size
--> 715 self.self_attn = MISTRAL_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
    717 self.mlp = MistralMLP(config)
    718 self.input_layernorm = MistralRMSNorm(config.hidden_size, eps=config.rms_norm_eps)

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:230, in MistralAttention.__init__(self, config, layer_idx)
    227 self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
    228 self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--> 230 self.rotary_emb = MistralRotaryEmbedding(
    231     self.head_dim,
    232     max_position_embeddings=self.max_position_embeddings,
    233     base=self.rope_theta,
    234 )

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:99, in MistralRotaryEmbedding.__init__(self, dim, max_position_embeddings, base, device)
     97 self.max_position_embeddings = max_position_embeddings
     98 self.base = base
---> 99 inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
    100 self.register_buffer("inv_freq", inv_freq, persistent=False)
    102 # Build here to make `torch.jit.trace` work.

File /usr/local/lib/python3.10/dist-packages/torch/utils/_device.py:77, in DeviceContext.__torch_function__(self, func, types, args, kwargs)
     75 if func in _device_constructors() and kwargs.get('device') is None:
     76     kwargs['device'] = self.device
---> 77 return func(*args, **kwargs)

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Since the error mentions CUDA capability 7.2, I checked the CUDA version inside the container, and it looks fine:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:08:11_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

notebook + error.zip (263.7 KB)

Any idea how to solve this, please?

I zipped the notebook and added it as an attached file.

Hi @Kptn_Kinday, you are trying to run a container built for JetPack 6.0 (L4T R36.2.0) on JetPack 5.1.2 (L4T R35.4.1). Xavier isn’t supported on JetPack 6, so that PyTorch build wasn’t compiled with sm_72 support (hence the error you got). Instead, please use the dustynv/l4t-ml:r35.4.1 container, which was built for JetPack 5.1.2.
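A quick way to confirm the mismatch from inside the container is to compare the GPU's compute capability with the architectures PyTorch was compiled for (just a sanity check using standard torch.cuda calls; the values in the comments are what I'd expect on Xavier with that image, not captured output):

import torch

# Compute capability of the physical GPU -- Xavier reports (7, 2)
print(torch.cuda.get_device_capability(0))

# CUDA architectures this PyTorch build was compiled for --
# the r36.2.0 (JetPack 6) wheel only includes sm_87 (Orin)
print(torch.cuda.get_arch_list())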

Thanks, it worked! Is there a tag or anything else to check a container’s compatibility with JetPack versions?

I used dustynv/l4t-ml:r35.4.1, but now I can’t import sklearn:

import sklearn

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/sklearn/__check_build/__init__.py:45
     44 try:
---> 45     from ._check_build import check_build  # noqa
     46 except ImportError as e:

ImportError: /usr/local/lib/python3.8/dist-packages/sklearn/__check_build/../../scikit_learn.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Cell In[19], line 2
      1 import torch
----> 2 import sklearn 
      3 #from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline

File /usr/local/lib/python3.8/dist-packages/sklearn/__init__.py:79
     68     sys.stderr.write("Partial import of sklearn during the build process.\n")
     69     # We are not importing the rest of scikit-learn during the build
     70     # process, as it may not be compiled yet
     71 else:
   (...)
     77     # later is linked to the OpenMP runtime to make it possible to introspect
     78     # it and importing it first would fail if the OpenMP dll cannot be found.
---> 79     from . import (
     80         __check_build,  # noqa: F401
     81         _distributor_init,  # noqa: F401
     82     )
     83     from .base import clone
     84     from .utils._show_versions import show_versions

File /usr/local/lib/python3.8/dist-packages/sklearn/__check_build/__init__.py:47
     45     from ._check_build import check_build  # noqa
     46 except ImportError as e:
---> 47     raise_build_error(e)

File /usr/local/lib/python3.8/dist-packages/sklearn/__check_build/__init__.py:31, in raise_build_error(e)
     29         else:
     30             dir_content.append(filename + "\n")
---> 31     raise ImportError("""%s
     32 ___________________________________________________________________________
     33 Contents of %s:
     34 %s
     35 ___________________________________________________________________________
     36 It seems that scikit-learn has not been built correctly.
     37 
     38 If you have installed scikit-learn from source, please do not forget
     39 to build the package before using it: run `python setup.py install` or
     40 `make` in the source directory.
     41 %s""" % (e, local_dir, "".join(dir_content).strip(), msg))

ImportError: /usr/local/lib/python3.8/dist-packages/sklearn/__check_build/../../scikit_learn.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block
___________________________________________________________________________
Contents of /usr/local/lib/python3.8/dist-packages/sklearn/__check_build:
_check_build.cpython-38-aarch64-linux-gnu.so    __pycache__    __init__.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.

If you have installed scikit-learn from source, please do not forget
to build the package before using it: run `python setup.py install` or
`make` in the source directory.

If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.

Following "It seems that scikit-learn has not been built correctly." Nvidia Jetson · Issue #28362 · scikit-learn/scikit-learn · GitHub, I get no error when I import sklearn from a plain Python shell. I can’t figure out what is causing this error.
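(One workaround mentioned for this kind of static-TLS error is to change the import order so that libgomp gets loaded before torch; just a sketch of that idea, I haven’t verified it in this container:)

# Import-order workaround for "cannot allocate memory in static TLS block":
# import sklearn (which pulls in libgomp) before torch, so the OpenMP runtime
# can still claim a static TLS slot.
import sklearn  # noqa: F401
import torch    # noqa: F401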

In general, you should try to stick to container tags with the same L4T version as your own, or at least the same major version (some of the JetPack 5 tags with different minor versions are inter-compatible). However, containers built for JetPack 4 aren’t compatible with JetPack 5, JetPack 5 containers aren’t compatible with JetPack 6, etc. Over time we have migrated more of the GPU components like CUDA/cuDNN/etc. to being installed inside the container (as opposed to being mounted from the device), which increases the size of the container images but also makes them more portable across minor versions of JetPack.
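If in doubt, you can read the L4T release string on your device and match it against the container tag; a minimal check (the /etc/nv_tegra_release file is present on JetPack systems, and the sample output in the comment is only illustrative):

# Print the device's L4T release so it can be matched against the container tag
# (e.g. "R35 ... REVISION: 4.1" -> use an r35.4.1 image such as dustynv/l4t-ml:r35.4.1)
with open("/etc/nv_tegra_release") as f:
    print(f.readline().strip())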

Thanks for reporting this @Kptn_Kinday - can you try starting the container like this for now:

docker run -it --rm --runtime nvidia --network host \
  --env LD_PRELOAD=/usr/local/lib/python3.8/dist-packages/sklearn/__check_build/../../scikit_learn.libs/libgomp-d22c30c5.so.1.0.0 \
  dustynv/l4t-ml:r35.4.1 

Starting the container with

--env LD_PRELOAD=/usr/local/lib/python3.8/dist-packages/sklearn/__check_build/../../scikit_learn.libs/libgomp-d22c30c5.so.1.0.0
dustynv/l4t-ml:r35.4.1

solved the import issue, but now I have trouble with LD_LIBRARY_PATH: Torch crashes when I use it because "CUDA Setup failed despite GPU being available". Here are the code and the error, with the zipped notebook (testing DeciLM-7B Instruct):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
model_id = 'Deci/DeciLM-7B-instruct'

using_colab_T4_GPU = True
if using_colab_T4_GPU:
  bnb_config = BitsAndBytesConfig(
      load_in_4bit = True,
      bnb_4bit_compute_dtype=torch.bfloat16
  )
  dtype_kwargs = {"quantization_config": bnb_config}
else:
  dtype_kwargs = {"torch_dtype": torch.bfloat16}


model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    **dtype_kwargs
)

The error:

config.json: 100%
894/894 [00:00<00:00, 18.8kB/s]
configuration_decilm.py: 100%
576/576 [00:00<00:00, 14.0kB/s]
version_check.py: 100%
383/383 [00:00<00:00, 13.6kB/s]
A new version of the following files was downloaded from https://huggingface.co/Deci/DeciLM-7B-instruct:
- version_check.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
(…)sformers_v4_35_2__configuration_llama.py: 100%
9.20k/9.20k [00:00<00:00, 296kB/s]
A new version of the following files was downloaded from https://huggingface.co/Deci/DeciLM-7B-instruct:
- transformers_v4_35_2__configuration_llama.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/Deci/DeciLM-7B-instruct:
- configuration_decilm.py
- version_check.py
- transformers_v4_35_2__configuration_llama.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_decilm.py: 100%
14.5k/14.5k [00:00<00:00, 470kB/s]
transformers_v4_35_2__modeling_llama.py: 100%
56.4k/56.4k [00:00<00:00, 707kB/s]
(…)ers_v4_35_2__modeling_attn_mask_utils.py: 100%
10.1k/10.1k [00:00<00:00, 345kB/s]
A new version of the following files was downloaded from https://huggingface.co/Deci/DeciLM-7B-instruct:
- transformers_v4_35_2__modeling_attn_mask_utils.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/Deci/DeciLM-7B-instruct:
- transformers_v4_35_2__modeling_llama.py
- transformers_v4_35_2__modeling_attn_mask_utils.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/Deci/DeciLM-7B-instruct:
- modeling_decilm.py
- transformers_v4_35_2__modeling_llama.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
model.safetensors.index.json: 100%
23.9k/23.9k [00:00<00:00, 708kB/s]
Downloading shards: 100%
3/3 [08:19<00:00, 157.06s/it]
model-00001-of-00003.safetensors: 100%
4.99G/4.99G [04:06<00:00, 21.3MB/s]
model-00002-of-00003.safetensors: 100%
4.92G/4.92G [01:38<00:00, 47.1MB/s]
model-00003-of-00003.safetensors: 100%
4.18G/4.18G [02:33<00:00, 14.7MB/s]
False

===================================BUG REPORT===================================
================================================================================
The following directories listed in your path were found to be non-existent: {PosixPath('/data/models/torch')}
The following directories listed in your path were found to be non-existent: {PosixPath('//matplotlib_inline.backend_inline'), PosixPath('module')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=114, Highest Compute Capability: 7.2.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
/usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so: cannot open shared object file: No such file or directory
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=114 make cuda11x_nomatmul
python setup.py install
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes


  warn(msg)
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: /usr/local/cuda/lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!                     If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
  warn(msg)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:1364, in _LazyModule._get_module(self, module_name)
   1363 try:
-> 1364     return importlib.import_module("." + module_name, self.__name__)
   1365 except Exception as e:

File /usr/lib/python3.8/importlib/__init__.py:127, in import_module(name, package)
    126         level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1014, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:991, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:975, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:671, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:848, in exec_module(self, module)

File <frozen importlib._bootstrap>:219, in _call_with_frames_removed(f, *args, **kwds)

File /usr/local/lib/python3.8/dist-packages/transformers/integrations/bitsandbytes.py:11
     10 if is_bitsandbytes_available():
---> 11     import bitsandbytes as bnb
     12     import torch

File /usr/local/lib/python3.8/dist-packages/bitsandbytes/__init__.py:6
      1 # Copyright (c) Facebook, Inc. and its affiliates.
      2 #
      3 # This source code is licensed under the MIT license found in the
      4 # LICENSE file in the root directory of this source tree.
----> 6 from . import cuda_setup, utils, research
      7 from .autograd._functions import (
      8     MatmulLtState,
      9     bmm_cublas,
   (...)
     13     matmul_4bit
     14 )

File /usr/local/lib/python3.8/dist-packages/bitsandbytes/research/__init__.py:1
----> 1 from . import nn
      2 from .autograd._functions import (
      3     switchback_bnb,
      4     matmul_fp8_global,
      5     matmul_fp8_mixed,
      6 )

File /usr/local/lib/python3.8/dist-packages/bitsandbytes/research/nn/__init__.py:1
----> 1 from .modules import LinearFP8Mixed, LinearFP8Global

File /usr/local/lib/python3.8/dist-packages/bitsandbytes/research/nn/modules.py:8
      7 import bitsandbytes as bnb
----> 8 from bitsandbytes.optim import GlobalOptimManager
      9 from bitsandbytes.utils import OutlierTracer, find_outlier_dims

File /usr/local/lib/python3.8/dist-packages/bitsandbytes/optim/__init__.py:6
      1 # Copyright (c) Facebook, Inc. and its affiliates.
      2 #
      3 # This source code is licensed under the MIT license found in the
      4 # LICENSE file in the root directory of this source tree.
----> 6 from bitsandbytes.cextension import COMPILED_WITH_CUDA
      8 from .adagrad import Adagrad, Adagrad8bit, Adagrad32bit

File /usr/local/lib/python3.8/dist-packages/bitsandbytes/cextension.py:20
     19     CUDASetup.get_instance().print_log_stack()
---> 20     raise RuntimeError('''
     21     CUDA Setup failed despite GPU being available. Please run the following command to get more information:
     22 
     23     python -m bitsandbytes
     24 
     25     Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
     26     to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
     27     and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues''')
     28 lib.cadam32bit_grad_fp32 # runs on an error if the library could not be found -> COMPILED_WITH_CUDA=False

RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[3], line 14
     10 else:
     11   dtype_kwargs = {"torch_dtype": torch.bfloat16}
---> 14 model = AutoModelForCausalLM.from_pretrained(
     15     model_id,
     16     device_map="auto",
     17     trust_remote_code=True,
     18     **dtype_kwargs
     19 )

File /usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py:561, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    559     else:
    560         cls.register(config.__class__, model_class, exist_ok=True)
--> 561     return model_class.from_pretrained(
    562         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    563     )
    564 elif type(config) in cls._model_mapping.keys():
    565     model_class = _get_model_class(config, cls._model_mapping)

File /usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py:3608, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3605     keep_in_fp32_modules = []
   3607 if load_in_8bit or load_in_4bit:
-> 3608     from .integrations import get_keys_to_not_convert, replace_with_bnb_linear
   3610     llm_int8_skip_modules = quantization_config.llm_int8_skip_modules
   3611     load_in_8bit_fp32_cpu_offload = quantization_config.llm_int8_enable_fp32_cpu_offload

File <frozen importlib._bootstrap>:1039, in _handle_fromlist(module, fromlist, import_, recursive)

File /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:1354, in _LazyModule.__getattr__(self, name)
   1352     value = self._get_module(name)
   1353 elif name in self._class_to_module.keys():
-> 1354     module = self._get_module(self._class_to_module[name])
   1355     value = getattr(module, name)
   1356 else:

File /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:1366, in _LazyModule._get_module(self, module_name)
   1364     return importlib.import_module("." + module_name, self.__name__)
   1365 except Exception as e:
-> 1366     raise RuntimeError(
   1367         f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1368         f" traceback):\n{e}"
   1369     ) from e

RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):

        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The notebook (with the error stored):
DeciLM_7B_Instruct.ipynb.zip (7.5 KB)

Thank you for your help, it’s very appreciated.

@Kptn_Kinday bitsandbytes requires special patches to work on Jetson - instead please try using my bitsandbytes container for JetPack 5 that has these fixes applied:

Note that I am not maintaining bitsandbytes past JetPack 5 (so there is no version for JetPack 6), because bitsandbytes is slower at inferencing than the unquantized models and has been surpassed by the quantization methods in the likes of AutoGPTQ, AWQ, exllama, llama.cpp, etc.

Or you could just disable bitsandbytes entirely in this notebook and run the model as FP16, which I believe you should have enough memory for on Jetson AGX Xavier (32GB), since you are loading a 7B-parameter model.
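Concretely, that means dropping the BitsAndBytesConfig branch and loading the weights in half precision, roughly like this (a sketch of the FP16 path only, using the standard transformers arguments already in your notebook; I haven’t run it on Xavier):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Deci/DeciLM-7B-instruct"

# Load in FP16 (~14 GB of weights for a 7B model) without bitsandbytes
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)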

Could you please reply to this topic, where I am getting errors when I run the "import torch" command?

I am also running JetPack 5.1.2 on an AGX Xavier Industrial.

Hi @nagesh_accord, sure thing - I moved your topic to this forum and replied to it.

Thanks a lot.

I will ask any further queries in that thread; kindly do reply and help me install PyTorch, CUDA, OpenCV, etc. successfully on my Jetson AGX Xavier with JetPack 5.1.2.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.