System 1:
- DGX Spark
- Connected through NVIDIA Sync
System 2:
- Windows 11
- NVIDIA RTX 3070
Issue:
On system 1, my code below crashes with a segmentation fault:
python numba-test.py
…/.venv/lib/python3.12/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 4 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
Segmentation fault (core dumped)
On system 2, the code runs fine:
[2. 2. 2. ... 2. 2. 2.]
…
Code:
# %%
from numba import cuda
import numpy as np

@cuda.jit
def vector_add(a, b, c):
    idx = cuda.grid(1)
    if idx < a.size:
        c[idx] = a[idx] + b[idx]

N = 1024
a = np.ones(N, dtype=np.float32)
b = np.ones(N, dtype=np.float32)
c = np.zeros_like(a)

threads_per_block = 256
blocks_per_grid = (N + threads_per_block - 1) // threads_per_block

vector_add[blocks_per_grid, threads_per_block](a, b, c)
cuda.synchronize()

print(c)
Question/Help With:
The only thing I can think of at this hour is that the DGX Spark has unified memory, but that really shouldn't cause this sort of problem. I've also never used Numba before, so it could be some strange user error on my end. Was hoping to see if anyone else has run into this, or if it's a known issue.
I was going to try CuPy next, but as a last resort I will fall back to the more traditional route and write a CUDA/C++ kernel bound with pybind11.