ollama run gives: GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:60: !"CUDA error"

I am using an NVIDIA AGX Orin with JetPack 6.0 DP. The NVIDIA driver shows N/A when running nvidia-smi. However, after installing this PyTorch wheel: https://developer.download.nvidia.com/compute/redist/jp/v60dp/pytorch/torch-2.3.0a0+6ddf5cf85e.nv24.04.14026654-cp310-cp310-linux_aarch64.whl
torch.cuda.is_available() returned True, so I installed Ollama and pulled small models like llama3 and phi3, but I get: Error: timed out waiting for llama runner to start: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED
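For reference, this is a minimal sketch of the check I ran to confirm the wheel sees the GPU (assuming the torch install from the wheel above):

```python
# Sanity check that PyTorch detects the Orin's GPU after installing
# the JetPack wheel; device name printing is only attempted when CUDA is up.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```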
current device: 0, in function ggml_cuda_mul_mat_batched_cublas at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:1848
cublasGemmBatchedEx(ctx.cublas_handle(), CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), CUDA_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), CUDA_R_16F, nb11/nb10, beta, (void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:60: !"CUDA error"
My Ollama version is 0.1.33.
sudo journalctl -u ollama.service.txt (14.3 KB)


We have an Ollama container that is built on JetPack 6 DP.
Is this an option for you?
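If the container route works for you, a sketch using the jetson-containers tooling (the `jetson-containers` and `autotag` commands are from the dusty-nv/jetson-containers project and are assumed to be installed; the image tag is resolved automatically for your JetPack release):

```shell
# Sketch: run the prebuilt Ollama container via jetson-containers
# (assumes dusty-nv/jetson-containers is set up on the Orin)
jetson-containers run $(autotag ollama)

# inside the container, pull and run a small model to verify GPU inference
ollama run llama3
```

These commands are environment-specific and will only work on a Jetson with the jetson-containers tooling installed.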

If you prefer to install it locally, please check the below discussion for enabling Ollama on Jetson:


@AastaLLL Thanks, it's working now.