I am using the NVIDIA Jetson AGX Orin Developer Kit (64 GB) to run LLM inference. I am using Ollama to run llama3.2:3b, but inference is very slow. When I try to check the process with nvidia-smi, it reports that no running processes were found:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.4.0                Driver Version: 540.4.0      CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  |   N/A            N/A |                  N/A |
| N/A   N/A  N/A               N/A / N/A  |     Not Supported    |     N/A          N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
I am using this Python code to check whether CUDA is working:
import torch
# Allocate tensors directly on the GPU to confirm CUDA works end to end
print(torch.randn(1).cuda())
print(torch.rand(5, 3, device=torch.device("cuda")))
and it returns this output:
/mnt/storage/miniconda3/envs/usr/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /opt/pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
tensor([-0.2943], device='cuda:0')
tensor([[0.8285, 0.6868, 0.8449],
[0.6386, 0.1849, 0.4723],
[0.2465, 0.7683, 0.2403],
[0.5859, 0.7134, 0.0960],
[0.8186, 0.0581, 0.5210]], device='cuda:0')
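(The UserWarning above just means NumPy is not installed in the conda env; it is unrelated to CUDA.) For completeness, a slightly fuller sanity check I could run — a sketch using only the standard torch.cuda calls, nothing Jetson-specific:

```python
import torch

# Basic CUDA visibility check; safe on both GPU and CPU-only machines.
print("CUDA available:", torch.cuda.is_available())
print("Device count: ", torch.cuda.device_count())
if torch.cuda.is_available():
    # On the AGX Orin this should name the integrated Orin GPU.
    print("Device name:  ", torch.cuda.get_device_name(0))
```

Note that this only confirms my PyTorch wheel was built with CUDA support; Ollama uses its own llama.cpp-based backend, so PyTorch seeing the GPU does not by itself prove Ollama is using it.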
and this is the output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:14:07_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0
This machine runs Linux aarch64 with the Ubuntu 22.04 (Jammy Jellyfish) distribution.