CuPy or PyCUDA on Jetson Xavier NX

I am unable to install CuPy or PyCUDA on a Jetson Xavier NX. I would like to do CUDA-based FFTs and NumPy-style convolution (numpy.convolve) in Python. Any suggestions would be much appreciated.
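For the FFT and convolution part of the question, here is a minimal sketch. It assumes CuPy works on the device and falls back to NumPy on the CPU otherwise. It computes the same full linear convolution as numpy.convolve: both inputs are zero-padded to len(a) + len(b) - 1 and their FFTs are multiplied. The helper names fft_convolve and to_host are hypothetical; they are not part of CuPy or NumPy.

```python
import numpy as np

# Use CuPy if it imports and sees a GPU; otherwise fall back to NumPy on the CPU.
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() < 1:  # may also raise if CUDA is unusable
        raise RuntimeError("no CUDA device")
except Exception:
    xp = np


def fft_convolve(a, b):
    """Full linear convolution of 1-D arrays a and b (same result as numpy.convolve)."""
    a = xp.asarray(a)  # copies to the GPU when xp is CuPy
    b = xp.asarray(b)
    n = a.size + b.size - 1  # length of the full linear convolution
    # Zero-padding both inputs to n makes the circular FFT product equal the linear one.
    fa = xp.fft.fft(a, n)
    fb = xp.fft.fft(b, n)
    return xp.fft.ifft(fa * fb, n)


def to_host(x):
    """Copy a CuPy array back to host memory; NumPy arrays pass through unchanged."""
    return x.get() if hasattr(x, "get") else x
```

The complex fft/ifft pair is used because the example in this thread works with complex64 data. For real-valued inputs, rfft/irfft would roughly halve the work, and the result could be returned as a real array.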

I got PyCUDA and CuPy to install with the following:

  1. pip3 install --global-option=build_ext --global-option="-I/usr/local/cuda/include" --global-option="-L/usr/local/cuda/lib64" pycuda
  2. pip3 install cupy

However, whenever I create a CuPy array in Python 3.6.9, it now hangs indefinitely. For example:
import cupy as cp
a = cp.random.random(100).astype(cp.complex64)

just hangs forever and never finishes executing.
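A true hang and a very slow first call can be told apart by timing the same operation several times. Below is a minimal sketch that uses only the standard library. time_repeated_calls is a hypothetical helper. CuPy launches GPU work asynchronously, so a synchronize callable can be passed in to make the timer wait for the GPU to finish. If the first call takes far longer than the following ones, that is consistent with one-time runtime compilation rather than a deadlock.

```python
import time


def time_repeated_calls(fn, repeats=3, sync=None):
    """Return the wall-clock seconds taken by each of `repeats` calls to fn().

    `sync` is an optional callable run after each call so that asynchronous
    GPU work has finished before the timer stops.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return timings


# Example on the Jetson (requires CuPy and a GPU):
#   import cupy as cp
#   sync = cp.cuda.Device(0).synchronize
#   print(time_repeated_calls(lambda: cp.random.random(100).astype(cp.complex64), sync=sync))
```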


Thanks for reporting this.

Confirmed that we can reproduce the same issue in our environment.
We will share more information with you after more investigation.


I think something in CuPy is not optimized for transferring data to the GPU on the Jetson NX. In a script I finally got it to run through, but it was painfully slow.


The root cause is that the CuPy source does not include the Xavier GPU architecture (sm_72).
As a result, the library has to regenerate the GPU code for the correct architecture at runtime.

We tried installing it from source with the sm_72 configuration and no longer see this issue.
Below are the steps for your reference:

$ git clone -b v9.6.0 --recursive
$ cd cupy/
# apply the following changes
$ pip3 install .

1. Under {cupy_root}/cupy/_core/include/cupy/cub/tune

diff --git a/ b/
index 82893ab9..e3bc8027 100644
--- a/
+++ b/
@@ -40,6 +40,11 @@ else
     SM_ARCH = 200
 endif
 
+ifeq (720, $(findstring 720, $(SM_ARCH)))
+    SM_TARGETS  += -gencode=arch=compute_72,code=\"sm_72,compute_72\"
+    SM_DEF              += -DSM720
+    TEST_ARCH   = 720
+endif
 ifeq (700, $(findstring 700, $(SM_ARCH)))
     SM_TARGETS         += -gencode=arch=compute_70,code=\"sm_70,compute_70\"
     SM_DEF             += -DSM700
diff --git a/tune/Makefile b/tune/Makefile
index 926b340f..524139cd 100644
--- a/tune/Makefile
+++ b/tune/Makefile
@@ -70,6 +70,10 @@ else
 endif
 
 # Only one arch per tuning binary
+ifeq (720, $(findstring 720, $(SM_ARCH)))
+    SM_TARGETS = -arch=sm_72
+    SM_ARCH = 720
+endif
 ifeq (350, $(findstring 350, $(SM_ARCH)))
     SM_TARGETS = -arch=sm_35
     SM_ARCH = 350

2. Under {cupy_root}/

diff --git a/ b/
index b363c1026..660a61df2 100644
--- a/
+++ b/
@@ -1038,8 +1038,9 @@ def _nvcc_gencode_options(cuda_version):
                          ('compute_60', 'sm_60'),
                          ('compute_61', 'sm_61'),
                          ('compute_70', 'sm_70'),
+                         ('compute_72', 'sm_72'),
                          ('compute_75', 'sm_75'),
-                         'compute_70']
+                         'compute_72']
         elif cuda_version >= 9020:
             arch_list = ['compute_30',

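The arch_list edited in the second patch mixes (virtual, real) tuples with bare 'compute_XX' strings. The sketch below shows how such a list can be expanded into nvcc --generate-code flags. It assumes the usual convention: a tuple embeds native machine code (SASS) for the real architecture, and a bare string embeds PTX that the driver can JIT-compile. gencode_options and PATCHED_ARCHS are hypothetical names. The sketch illustrates the idea and is not CuPy's exact implementation.

```python
def gencode_options(arch_list):
    """Expand an arch list into nvcc --generate-code flags.

    A (virtual, real) tuple requests native code for `real`, built from `virtual`;
    a bare 'compute_XX' string requests PTX for that virtual architecture.
    """
    options = []
    for arch in arch_list:
        if isinstance(arch, tuple):
            virtual_arch, real_arch = arch
        else:
            virtual_arch = real_arch = arch  # PTX only
        options.append(f"--generate-code=arch={virtual_arch},code={real_arch}")
    return options


# Excerpt of the patched list for CUDA >= 10 (hypothetical subset, for illustration).
PATCHED_ARCHS = [
    ("compute_70", "sm_70"),
    ("compute_72", "sm_72"),
    ("compute_75", "sm_75"),
    "compute_72",
]

for flag in gencode_options(PATCHED_ARCHS):
    print(flag)
```

With the patch applied, the list yields --generate-code=arch=compute_72,code=sm_72 (native code for Xavier) and --generate-code=arch=compute_72,code=compute_72 (PTX for compute_72).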

Awesome. Thank you. I will give this a try.

Do I need to uninstall the previous version? I installed with pip3 install cupy. Can I uninstall with pip3 uninstall cupy?


Yes, please uninstall it with pip3 first.
