Cupy or pycuda on Jetson Xavier NX

I am unable to install cupy or pycuda on Jetson Xavier NX. I would like to run CUDA-based FFTs and a GPU equivalent of numpy convolve from Python. Any suggestions would be much appreciated.

I got pycuda and cupy to install with the following:

  1. pip3 install --global-option=build_ext --global-option="-I/usr/local/cuda/include" --global-option="-L/usr/local/cuda/lib64" pycuda
  2. pip3 install cupy

However, now whenever I create a cupy array in Python 3.6.9, it hangs indefinitely. For example:
a = cp.random.random(100).astype(cp.complex64)

just hangs forever and never finishes executing.

Hi,

Thanks for reporting this.

Confirmed that we can reproduce the same issue in our environment.
We will share more information with you after more investigation.

Thanks.

I think there is something in cupy that is not optimized for transferring data to the GPU on the Jetson NX. In a script I finally got it to run through, but it was painfully slow.
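
To tell a one-time startup cost apart from slowness on every call, it may help to time the same operation several times in a row. The helper below is only a sketch: `time_repeated` is a hypothetical name, and the CuPy usage in the comments assumes a working install on the Jetson.

```python
import time


def time_repeated(fn, repeats=3):
    """Call fn() `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical usage on the Jetson (requires a working CuPy install):
#   import cupy as cp
#
#   def make_array():
#       a = cp.random.random(100).astype(cp.complex64)
#       cp.cuda.Stream.null.synchronize()  # wait for the GPU work to finish
#       return a
#
#   print(time_repeated(make_array))
```

If only the first timing is large, the delay is more likely a one-time setup or compilation cost than a data-transfer problem.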

Hi,

The root cause is that the cupy source doesn’t include the Xavier GPU architecture (sm_72).
So the library has to regenerate its GPU code for the correct architecture at runtime.
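
As a rough illustration of that check, the sketch below compares a device's compute capability against a list of prebuilt architectures. `needs_runtime_compile` and the `PREBUILT` list are assumptions made for this example, not values read from the cupy package.

```python
def needs_runtime_compile(device_cc, built_archs):
    """Return True if no prebuilt binary (sm_XX) matches the device.

    device_cc is a string such as "72" for compute capability 7.2.
    """
    return "sm_{}".format(device_cc) not in built_archs


# Assumed list of prebuilt architectures, for illustration only.
PREBUILT = ["sm_60", "sm_61", "sm_70", "sm_75"]

xavier_missing = needs_runtime_compile("72", PREBUILT)
volta_missing = needs_runtime_compile("70", PREBUILT)

# On a machine with CuPy installed, the device string can be read with:
#   import cupy as cp
#   device_cc = cp.cuda.Device(0).compute_capability  # e.g. "72" on Xavier NX
```

With the assumed list, `xavier_missing` is True and `volta_missing` is False, which matches the behaviour described above for sm_72.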

We installed it from source with an sm_72 configuration and no longer see this issue.
Below are the steps for your reference:

$ git clone -b v9.6.0 --recursive https://github.com/cupy/cupy.git
$ cd cupy/
# apply the following changes
$ pip3 install .
Changes:

1. Under {cupy_root}/cupy/_core/include/cupy/cub/tune

diff --git a/common.mk b/common.mk
index 82893ab9..e3bc8027 100644
--- a/common.mk
+++ b/common.mk
@@ -40,6 +40,11 @@ else
     SM_ARCH = 200
 endif

+ifeq (720, $(findstring 720, $(SM_ARCH)))
+    SM_TARGETS  += -gencode=arch=compute_72,code=\"sm_72,compute_72\"
+    SM_DEF              += -DSM720
+    TEST_ARCH   = 720
+endif
 ifeq (700, $(findstring 700, $(SM_ARCH)))
     SM_TARGETS         += -gencode=arch=compute_70,code=\"sm_70,compute_70\"
     SM_DEF             += -DSM700
diff --git a/tune/Makefile b/tune/Makefile
index 926b340f..524139cd 100644
--- a/tune/Makefile
+++ b/tune/Makefile
@@ -70,6 +70,10 @@ else
 endif

 # Only one arch per tuning binary
+ifeq (720, $(findstring 720, $(SM_ARCH)))
+    SM_TARGETS = -arch=sm_72
+    SM_ARCH = 720
+endif
 ifeq (350, $(findstring 350, $(SM_ARCH)))
     SM_TARGETS = -arch=sm_35
     SM_ARCH = 350

2. Under {cupy_root}/

diff --git a/cupy_setup_build.py b/cupy_setup_build.py
index b363c1026..660a61df2 100644
--- a/cupy_setup_build.py
+++ b/cupy_setup_build.py
@@ -1038,8 +1038,9 @@ def _nvcc_gencode_options(cuda_version):
                          ('compute_60', 'sm_60'),
                          ('compute_61', 'sm_61'),
                          ('compute_70', 'sm_70'),
+                         ('compute_72', 'sm_72'),
                          ('compute_75', 'sm_75'),
-                         'compute_70']
+                         'compute_72']
         elif cuda_version >= 9020:
             arch_list = ['compute_30',
                          'compute_50',
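
The patched list mixes (virtual, real) tuples with a bare virtual architecture name. The sketch below shows one way such a list could be turned into nvcc `--generate-code` flags. It is an illustration written for this thread, not a copy of the code in cupy_setup_build.py, and `gencode_options` is a hypothetical name.

```python
def gencode_options(arch_list):
    """Build nvcc --generate-code flags from a mixed architecture list.

    A tuple (virtual, real) requests machine code for one real architecture.
    A bare string requests PTX for that virtual architecture, which the
    driver can compile at runtime for newer GPUs.
    """
    options = []
    for arch in arch_list:
        if isinstance(arch, tuple):
            virtual_arch, real_arch = arch
        else:
            virtual_arch = real_arch = arch
        options.append(
            "--generate-code=arch={},code={}".format(virtual_arch, real_arch)
        )
    return options


# The two entries touched by the patch above.
flags = gencode_options([("compute_72", "sm_72"), "compute_72"])
```

CuPy's installation documentation also describes a build-time environment variable, CUPY_NVCC_GENERATE_CODE, that takes values such as arch=compute_72,code=sm_72. It may make the source patch unnecessary, but that was not tried in this thread.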

Thanks.

Awesome. Thank you. I will give this a try.

Do I need to uninstall the previous version? I installed with pip3 install cupy. Can I uninstall with pip3 uninstall cupy?

Hi,

Yes, please uninstall it with pip3 first.
Thanks.
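
To confirm that the old package is really gone before building from source, a quick check along these lines could be run after pip3 uninstall cupy. `is_installed` is a hypothetical helper that uses only the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Expected to be False once the old cupy package has been uninstalled.
cupy_present = is_installed("cupy")
```

Run it from outside the cloned cupy/ directory, otherwise Python may find the source tree in the current directory and report the module as present.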
