Problem trying to install MXNet and GluonCV on Jetson Nano

Hi there!
I just got a Jetson Nano, and flashed it by using the jetson-nano-jp451-sd-card-image file and Etcher.
I need to install MXNet and GluonCV, so I did the following:

I tried installing MXNet by using the compiled package, as described here

Installed dependencies

 sudo apt-get install -y git build-essential libatlas-base-dev libopencv-dev graphviz python3-pip

and added:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}

to my .bashrc file (before that, I couldn’t use nvcc --version)

I saw that most of the instructions suggest installing stuff by using ‘sudo pip install…’, but I prefer to keep stuff in virtual environments, so I adapted the instructions accordingly.
I created a venv with

python3 -m venv envs/gluon
source envs/gluon/bin/activate

I downloaded the wheel file: mxnet-1.6.0-py3-none-any.whl and ran

pip3 install cython
pip3 install mxnet-1.6.0-py3-none-any.whl

got the output:

Failed to build numpy
Installing collected packages: urllib3, chardet, certifi, idna, requests, numpy, mxnet
  Running install for numpy ... done

I ran pip list, and I got:

certifi (2021.5.30)
chardet (4.0.0)
Cython (0.29.23)
graphviz (0.8.4)
idna (2.10)
mxnet (1.6.0)
numpy (1.19.5)
pip (9.0.1)
pkg-resources (0.0.0)
requests (2.25.1)
setuptools (39.0.1)
urllib3 (1.26.5)

So, mxnet is installed, but it doesn’t seem to be a version that supports the GPU (it should be something like mxnet-cu102).
Moreover, when trying to import mxnet, I got the error:

	Traceback (most recent call last):
	  File "<stdin>", line 1, in <module>
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 24, in <module>
	    from .context import Context, current_context, cpu, gpu, cpu_pinned
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 24, in <module>
	    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 214, in <module>
	    _LIB = _load_lib()
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 204, in _load_lib
	    lib_path = libinfo.find_lib_path()
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 74, in find_lib_path
	    'List of candidates:\n' + str('\n'.join(dll_path)))
	RuntimeError: Cannot find the MXNet library.
	List of candidates:

It turned out that the file was created, but in this other folder:


I added

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/lews/envs/gluon/mxnet

to my .bashrc file, and now whenever I open a terminal session it shows the error:

bash: ::/home/lews/envs/gluon/mxnet: No such file or directory
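The leading `::` in that message suggests the value contained empty entries when the line was evaluated. As a rough sketch only, here is the same guard that the `${PATH:+:${PATH}}` idiom gives for PATH, written in Python: it appends a directory to a colon-separated search path without leaving empty entries. `append_search_dir` is a hypothetical helper name, and the directory is the one from this post.

```python
import os


def append_search_dir(current, new_dir):
    """Append new_dir to a colon-separated search path, dropping empty entries."""
    entries = [entry for entry in (current or "").split(":") if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return ":".join(entries)


if __name__ == "__main__":
    lib_dir = "/home/lews/envs/gluon/mxnet"  # directory taken from the post
    print(append_search_dir(os.environ.get("LD_LIBRARY_PATH"), lib_dir))
```

The printed value is what the variable would hold after a clean append, whether or not it was set before.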

However, in spite of the message it seems to be finding the file, because now I’m getting another error when importing mxnet:

	Traceback (most recent call last):
	  File "<stdin>", line 1, in <module>
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 24, in <module>
	    from .context import Context, current_context, cpu, gpu, cpu_pinned
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 24, in <module>
	    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 214, in <module>
	    _LIB = _load_lib()
	  File "/home/lews/envs/gluon/lib/python3.6/site-packages/mxnet/", line 205, in _load_lib
	    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
	  File "/usr/lib/python3.6/ctypes/", line 348, in __init__
	    self._handle = _dlopen(self._name, mode)
	OSError: cannot open shared object file: No such file or directory

It seems to be looking for CUDA 10.0, while I have 10.2:

	nvcc: NVIDIA (R) Cuda compiler driver
	Copyright (c) 2005-2019 NVIDIA Corporation
	Built on Wed_Oct_23_21:14:42_PDT_2019
	Cuda compilation tools, release 10.2, V10.2.89
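A mismatch like this can be checked without importing mxnet at all. The sketch below pulls the release number out of `nvcc --version` text and asks the dynamic loader whether a given runtime library can be opened. The `libcudart.so.<release>` naming is an assumption about how the CUDA runtime is usually installed, and both helper names are made up for illustration.

```python
import ctypes
import re


def parse_cuda_release(nvcc_output):
    """Return the release string (such as '10.2') from `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 10.2, V10.2.89"  # line from the post
    release = parse_cuda_release(sample)
    print("toolkit release:", release)
    # Assumed naming scheme for the CUDA runtime library.
    for candidate in ("libcudart.so.10.0", "libcudart.so." + release):
        print(candidate, "loadable:", can_load(candidate))
```

If the library a wheel was linked against is reported as not loadable while the installed release is, the wheel was most likely built for a different CUDA version.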

PS: I tried all the steps described in the post even WITHOUT using a virtual environment, and I’m having the same issues.


This time, I tried building from source, following this procedure.
I didn’t flash the SD card clean yet (I’ll try to do that and to start Attempt #2 from scratch), but I’m having two issues with this method as well:

  1. When building directly on the Nano, it’s painfully slow. I saw you can cross-compile it on a PC and…somehow transfer the compiled library to the Nano? But I didn’t find any tutorial for that.
  2. I’m getting the following error:
In file included from src/io/
src/io/./image_augmenter.h:31:10: fatal error: opencv2/opencv.hpp: No such file or directory
 #include <opencv2/opencv.hpp>
compilation terminated.
Makefile:461: recipe for target 'build/src/io/image_aug_default.o' failed
make: *** [build/src/io/image_aug_default.o] Error 1
make: *** Waiting for unfinished jobs....
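A missing `opencv2/opencv.hpp` usually means the compiler's include path does not cover the directory where the OpenCV headers were installed. A small sketch for locating the header follows. The candidate directories, `/usr/include/opencv4` in particular, are assumptions about where an OpenCV 4 package tends to put its headers and are not confirmed anywhere in this thread.

```python
import os


def find_header(relative_path, roots):
    """Return every root directory under which relative_path exists as a file."""
    return [root for root in roots
            if os.path.isfile(os.path.join(root, relative_path))]


if __name__ == "__main__":
    # Assumed candidate locations; adjust to the actual system.
    candidate_roots = ["/usr/include", "/usr/include/opencv4", "/usr/local/include"]
    hits = find_header("opencv2/opencv.hpp", candidate_roots)
    for root in hits:
        print("found under", root, "-> the build needs -I" + root)
    if not hits:
        print("header not found in", candidate_roots)
```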


The prebuilt package is for JetPack 4.3.
Since your environment is JetPack 4.5.1, please build it from source.

To build MXNet, please refer to the comment below:

Also, it’s recommended to do the job in a clean environment.


I tried following the procedure you linked, but I got the error

CMakeFiles/Makefile2:889: recipe for target 'CMakeFiles/mxnet_static.dir/all' failed
make[1]: *** [CMakeFiles/mxnet_static.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
+ pushd .
~ ~
+ PYTHON_DIR=/home/lews/mxnet/python
+ BUILD_DIR=/home/lews/mxnet/build
+ cd /home/lews/mxnet/python
+ python3 bdist_wheel
Traceback (most recent call last):
  File "", line 47, in <module>
    LIB_PATH = libinfo['find_lib_path']()
  File "mxnet/", line 73, in find_lib_path
    'List of candidates:\n' + str('\n'.join(dll_path)))
RuntimeError: Cannot find the MXNet library.
List of candidates:

I tried looking for the ‘’ file, but it seems it doesn’t exist

I noticed now there might be a typo at:

    -DCMAKE_CXX_FLAGS=-I/usr/local/cuda/targets/aarch64-linux/inlude  \

I assume it should be ‘include’?
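A typo like this is easy to catch before starting a build that takes hours. The following is only a sketch, with a hypothetical helper name: it lists the `-I` directories in a flag string that do not exist on disk.

```python
import os
import shlex


def missing_include_dirs(flags):
    """Return the -I directories in a compiler flag string that do not exist."""
    missing = []
    tokens = shlex.split(flags)
    for index, token in enumerate(tokens):
        if token == "-I" and index + 1 < len(tokens):
            path = tokens[index + 1]      # form: -I <dir>
        elif token.startswith("-I") and len(token) > 2:
            path = token[2:]              # form: -I<dir>
        else:
            continue
        if not os.path.isdir(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Flag value copied from the build script, typo included.
    flags = "-I/usr/local/cuda/targets/aarch64-linux/inlude"
    print(missing_include_dirs(flags))
```

An empty list means every include directory in the flags exists. Anything else is worth a second look before running cmake.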

It seems the mxnet installation was successful after fixing the typo I mentioned above.
However, I still have 2 issues:

  1. I tried importing mxnet and running a few simple tests, and the memory consumption when creating arrays using the gpu context seems too high. Specifically, I tried something very simple, like

    test_cpu = mx.nd.ones((5,5))
    test_gpu = mx.nd.ones((5,5), ctx=mx.gpu(0))

the creation of the first array is immediate, while the execution of the second instruction takes a few minutes, and basically saturates the device RAM.

  2. I wanted to install GluonCV, so I tried building it manually by:
    git clone https://github.com/dmlc/gluon-cv
    cd gluon-cv && python install --user

but I got the following error:
RuntimeError: Python version >= 3.7 required

I’ll try reinstalling a later Python version (I was assuming it should be taken care of by the autobuild script), but I don’t know how to fix the first issue
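The tracebacks earlier in the thread show `python3.6` in the site-packages path, which is consistent with this error. A tiny sketch of the kind of check that produces it is below. The minimum is copied from the error message, and the helper name is hypothetical.

```python
import sys

MINIMUM = (3, 7)  # taken from the error message above


def meets_minimum(version, minimum=MINIMUM):
    """Compare (major, minor) tuples; version may be sys.version_info."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("running Python", sys.version.split()[0])
    print("meets requirement:", meets_minimum(sys.version_info))
```

Running it with the interpreter inside the virtual environment shows which Python the install step will actually see.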


Since the Nano’s GPU is limited, the slowness may be caused by a shortage of memory.
Could you try it after a fresh reboot to see if there is any difference?
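On the Nano the CPU and GPU share the same physical RAM, so the free system memory is a reasonable first number to look at before creating a GPU context. Here is a sketch that reads `/proc/meminfo`; the helper name is hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
    except OSError:
        info = {}  # not a Linux system, nothing to report
    for key in ("MemTotal", "MemAvailable", "SwapFree"):
        if key in info:
            print(f"{key}: {info[key] / 1024:.0f} MiB")
```

Comparing `MemAvailable` right after a reboot with the value after the first `mx.gpu(0)` allocation gives a rough idea of how much the context itself takes.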


Are there any instructions on how to cross-compile it? I’d be happy to flash the card and try as many times as necessary, but on the Nano the compilation using the takes something between 6-8 hours :(


Even better, this is the script I’m using to run object detection with GluonCV:

import time
import numpy as np
import cv2
import gluoncv as gcv
import mxnet as mx

def main():

	ctx = mx.gpu(0)

	## load a pretrained model
	net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_coco', pretrained=True, ctx=ctx)

	## open video file
	cap = cv2.VideoCapture("test_video_files/vlc_test.avi")
	count_frame = 0

	while True:
		print(f"Frame: {count_frame}")
		total_t_frame = 0

		## load the next frame from the video file; stop at end of file or on 'q'
		ret, frame_np_orig = cap.read()
		if not ret:
			break
		frame_np_orig = cv2.resize(frame_np_orig, (683, 512))
		key = cv2.waitKey(1)
		if key == ord('q'):
			break

		## image pre-processing: BGR -> RGB, then the SSD test transform
		frame_nd_orig = mx.nd.array(cv2.cvtColor(frame_np_orig, cv2.COLOR_BGR2RGB)).astype('uint8')
		frame_nd_new, frame_np_new = gcv.data.transforms.presets.ssd.transform_test(frame_nd_orig, short=512, max_size=700)

		## measure inference time per frame
		start_t = time.time()
		frame_nd_new = frame_nd_new.as_in_context(ctx)
		class_IDs, scores, bboxes = net(frame_nd_new)
		## asnumpy() blocks until the GPU has finished, so the timing covers the whole inference
		if isinstance(class_IDs, mx.ndarray.ndarray.NDArray):
			class_IDs = class_IDs.asnumpy()
		if isinstance(scores, mx.ndarray.ndarray.NDArray):
			scores = scores.asnumpy()
		if isinstance(bboxes, mx.ndarray.ndarray.NDArray):
			bboxes = bboxes.asnumpy()

		stop_t = time.time()
		total_t_frame += (stop_t - start_t)
		FPS = 1 / (stop_t - start_t)
		print(f"\tinference time = {stop_t - start_t} -> FPS = {FPS}")

		## display the result with cv (the plotted frame is RGB, imshow expects BGR)
		frame_np_new = gcv.utils.viz.cv_plot_bbox(frame_np_new, bboxes[0], scores[0], class_IDs[0], thresh=0.5, class_names=net.classes)
		cv2.imshow("detections", cv2.cvtColor(frame_np_new, cv2.COLOR_RGB2BGR))
		count_frame += 1

	## release the video file and close the display window
	cap.release()
	cv2.destroyAllWindows()

if __name__ == "__main__":
	main()



I’m getting an average of about 2 FPS using SSD MobileNet V1 (512x512).
Could you (or anyone who has MXNet/GluonCV installed on the Nano) by any chance run this script and tell me what FPS is reasonable to expect?
I tried running the benchmarks as explained here, but I was getting an ‘Illegal instruction’ error (even after using export OPENBLAS_CORETYPE=ARMV8).
Thanks a lot!
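For comparing numbers between machines it helps to agree on how the average is computed. The sketch below is one possible convention, not the method used for the figure above: it skips the first frame as warm-up and divides the frame count by the total time, and both function names are hypothetical. Note that MXNet runs operations asynchronously, so the timed call should include something that blocks on the result, such as `asnumpy()`.

```python
import time


def average_fps(durations, warmup=1):
    """Mean frames per second over per-frame durations in seconds, skipping warm-up frames."""
    measured = [d for d in durations[warmup:] if d > 0]
    if not measured:
        return 0.0
    return len(measured) / sum(measured)


def time_calls(fn, frames):
    """Run fn once per frame and return the wall-clock duration of each call."""
    durations = []
    for frame in frames:
        start = time.perf_counter()
        fn(frame)
        durations.append(time.perf_counter() - start)
    return durations


def fake_inference(frame):
    """Stand-in for net(frame) followed by a blocking call."""
    time.sleep(0.01)


if __name__ == "__main__":
    fps = average_fps(time_calls(fake_inference, range(20)))
    print(f"average FPS: {fps:.1f}")
```

Dividing the frame count by the total time avoids the bias of averaging per-frame `1/t` values, which overweights the occasional fast frame.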


Have you tried to maximize the device performance first?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

We are going to test your script internally.
Will share the data in our environment with you later.

However, since OpenCV uses the CPU for image I/O,
a slower pipeline is expected.


I’m starting to realize that I’ve been approaching this whole thing the wrong way.
So far, I’ve simply been trying to install the deep learning libraries I generally use (MXNet, GluonCV, TensorFlow) on the Jetson Nano, and to run the code I have as it is (for example, as you pointed out, by using OpenCV for I/O).
However, I just tried running an optimized implementation of MobileNet V2 following the Hello AI World tutorial by Dustin Franklin, and I got around 22 FPS.

So, I’m wondering now…is there any way I can apply the same optimization to my code, or would it be better to re-implement everything with more device-friendly libraries?


jetson-inference uses TensorRT as the inference engine, which is optimized for the Jetson platform.
For Jetson, we always recommend converting the model to TensorRT to save resources and get better performance.

If you are using an MXNet model, you can first try to export the model into ONNX format.
Then run it with TensorRT using the following command:

$ /usr/src/tensorrt/bin/trtexec --onnx=[your/model]
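The two steps could be scripted roughly as follows. This is a sketch under several assumptions: the model has already been saved as `<prefix>-symbol.json` and `<prefix>-0000.params`, the MXNet build includes `mxnet.contrib.onnx`, and the `onnx` Python package is installed. Whether every operator in an SSD model converts cleanly is not something this thread confirms. Both function names are hypothetical.

```python
def export_to_onnx(prefix, input_shape, onnx_path):
    """Convert '<prefix>-symbol.json' and '<prefix>-0000.params' into an ONNX file."""
    # Imported lazily so the rest of the module loads without MXNet installed.
    import numpy as np
    from mxnet.contrib import onnx as onnx_mxnet
    return onnx_mxnet.export_model(
        prefix + "-symbol.json",   # network definition
        prefix + "-0000.params",   # trained weights
        [input_shape],             # one shape per network input
        np.float32,
        onnx_path,
    )


def trtexec_command(onnx_path, trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Build the trtexec invocation quoted above for a given ONNX file."""
    return [trtexec, "--onnx=" + onnx_path]
```

With a hypothetical prefix `ssd`, `export_to_onnx("ssd", (1, 3, 512, 512), "ssd.onnx")` would write the ONNX file, and the list returned by `trtexec_command("ssd.onnx")` can be passed to `subprocess.run(..., check=True)` on the Nano.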