I have a RAG application that works fine, and because of the limited memory on the Jetson Orin Nano Developer Kit, I am using a quantized Mistral 7B Instruct model.
I would greatly appreciate step-by-step guidance on deploying the RAG application on the Jetson Orin Nano Developer Kit.
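For context, the model is loaded through transformers with bitsandbytes 4-bit quantization, roughly like the sketch below (the model id and quantization settings are illustrative assumptions, not my exact configuration):

```python
# Rough sketch of the 4-bit load that triggers the bitsandbytes import
# (illustrative only; model id and settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint name

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights to fit in the Orin Nano's memory
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # let accelerate place layers on the GPU
)
```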
I have deployed the model on the edge device. However, I get the error below each time I try to run the RAG application, and I would be glad if someone could help me resolve it.
RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes and open an issue on GitHub.
However, each time I cd into the bitsandbytes directory and run python -m bitsandbytes, I get the following output:
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['8.7']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Running a quick check that:
+ library is importable
+ CUDA function is callable
WARNING: Please be sure to sanitize sensible info from any such env vars!
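Since the error points at LD_LIBRARY_PATH and missing CUDA libraries, this is the kind of quick check I am planning to run from the same Python environment (a sketch only; the exact library name and the claim about how bitsandbytes loads it are my assumptions):

```python
# Sanity check: can the dynamic loader find the CUDA runtime from this environment?
import ctypes
import os

print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<not set>"))

# Try to load libcudart via the dynamic loader (which honors LD_LIBRARY_PATH),
# roughly the way bitsandbytes would. A failure here suggests the CUDA library
# directory still needs to be added to LD_LIBRARY_PATH.
try:
    ctypes.CDLL("libcudart.so")
    print("libcudart.so loaded successfully")
except OSError as exc:
    print("could not load libcudart.so:", exc)
```

If this fails, I assume I need to add the JetPack CUDA library directory (normally under /usr/local/cuda) to LD_LIBRARY_PATH, but I am not sure whether that alone explains the error above.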