Speed up inference on the Jetson Nano with MXNet

Hi all! I’m currently developing on the Jetson Nano, and I’m looking for advice on improving inference performance. I’m using SageMaker as my training environment for SSD object detection with ResNet-50 as the base network, which exports .params and .json files for MXNet. I’ve built MXNet on the Nano using the autoinstaller from this forum, and I’ve been able to run inference on a USB webcam by roughly following this guide:

That said, inference is very slow: it takes around 4-5 seconds per frame with a 512x512 input. I’ve already tried converting my model to a different framework via ONNX and MMdnn, but my custom model has operators that neither format supports, so it looks like I’m stuck with MXNet. The MXNet website says MXNet has TensorRT integration, but I can’t find any good examples of it online. The one on the MXNet website is confusing at best and doesn’t help with my particular use case.

One thing that seems to be holding me back is that I can only run inference on the CPU. According to the MXNet website, to use the GPU all you have to do is change ctx=cpu() to ctx=gpu() and make sure your data is converted to float32 before feeding it in (https://github.com/apache/incubator-mxnet/issues/13332). However, when I do that, my Jetson crashes, apparently because it runs out of memory. Does this have anything to do with the custom build of MXNet for the Nano? If not, why would it do that?
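Because the Nano's CPU and GPU share the same physical memory, a rough memory budget can help show whether the model itself is the problem. The sketch below assumes NumPy and uses hypothetical helper names (`param_bytes`, `input_bytes`); it adds up the bytes held by the loaded weights and by one input tensor. It does not estimate activations, cuDNN workspaces, or the CUDA context, which all need additional memory on top of this.

```python
import numpy as np


def param_bytes(params):
    """Sum the bytes held by a dict of arrays, e.g. arg_params or aux_params.

    Works for NumPy arrays and for any object exposing .size and .dtype,
    such as MXNet NDArrays returned by mx.model.load_checkpoint.
    """
    return sum(int(v.size) * np.dtype(v.dtype).itemsize for v in params.values())


def input_bytes(shape, dtype="float32"):
    """Bytes needed for one input tensor of the given shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize
```

For example, after `sym, arg_params, aux_params = mx.model.load_checkpoint('model_algo_1', 0)`, the value `(param_bytes(arg_params) + param_bytes(aux_params)) / 2**20` gives the weight size in MiB. A single 1x3x512x512 float32 input is 3,145,728 bytes (3 MiB).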

Any suggestions are welcome and appreciated!

Here’s my code:

#import
import argparse
import time
from collections import namedtuple

import cv2
import mxnet as mx
import numpy as np

# Minimal batch container accepted by Module.forward (only .data is needed)
Batch = namedtuple('Batch', ['data'])


#array of object labels for custom network
object_categories = ['object 1', 'object 2']

#load model

""" important: make sure that -symbol.json and -0000.params are in the format network-prefix-symbol.json and network-prefix-0000.params and are located in current directory """

class ImagenetModel(object):

	def __init__(self, synset_path, network_prefix, params_url=None, symbol_url=None, synset_url=None, context=mx.cpu(), label_names=['prob_label'], input_shapes=[('data', (1, 3, 512, 512))]):
		# Load the network parameters from default epoch 0
		sym, arg_params, aux_params = mx.model.load_checkpoint(network_prefix, 0)
		# Remember the device so input arrays can be created on it
		self.context = context
		# Load the network into an MXNet module and bind it with the real input
		# shape (1x3x512x512) instead of a 10x10 placeholder
		self.mod = mx.mod.Module(symbol=sym, label_names=label_names, context=context)
		self.mod.bind(for_training=False, data_shapes=input_shapes)
		self.mod.set_params(arg_params, aux_params)
		self.camera = None

	def predict_from_cam(self, frame, reshape=(512, 512)):
		# Nothing to do if the camera did not return a frame
		if frame is None:
			return []

		# OpenCV delivers frames in BGR order; convert to RGB for the network
		img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

		# Resize to the network input size, reorder HWC -> CHW and add a batch axis
		img = cv2.resize(img, reshape)
		img = img.transpose(2, 0, 1)[np.newaxis, :]

		# Create the input as float32 on the same device as the model, then run forward
		data = mx.nd.array(img, ctx=self.context, dtype='float32')
		self.mod.forward(Batch([data]), is_train=False)
		prob = self.mod.get_outputs()[0].asnumpy()
		prob = np.squeeze(prob)
		# Return the first 100 output rows as plain Python lists
		return prob[:100].tolist()

if __name__ == "__main__":
	parser = argparse.ArgumentParser(description="pull and load pre-trained resnet model to classify one image")
	parser.add_argument('--img', type=str, default='cam', help='input image for classification, if this is cam it captures from the webcam')
	parser.add_argument('--prefix', type=str, default='model_algo_1', help='the prefix of the pre-trained model')
	parser.add_argument('--label-name', type=str, default='softmax_label', help='the name of the last layer in the loaded network (usually softmax_label)')
	parser.add_argument('--synset', type=str, default='synset.txt', help='the path of the synset for the model')
	parser.add_argument('--gpu', action='store_true', help='run inference on the GPU instead of the CPU')
	args = parser.parse_args()
	ctx = mx.gpu() if args.gpu else mx.cpu()
	mod = ImagenetModel(args.synset, args.prefix, context=ctx, label_names=[args.label_name])
	print("predicting on " + args.img)
	if args.img == "cam":
		vid = cv2.VideoCapture(0)
		while True:
			ret, frame = vid.read()
			if not ret:
				break
			start = time.time()
			results = mod.predict_from_cam(frame)
			print(results)
			print("inference time: %.2f s" % (time.time() - start))
			cv2.imshow('frame', frame)
			# waitKey(1) just polls for 'q'; waitKey(1000) would add up to 1 s per frame
			if cv2.waitKey(1) & 0xFF == ord('q'):
				break
		vid.release()
		cv2.destroyAllWindows()

Hi,

Here is a sample for ResNet with MXNet-TRT on the Nano.
Would you mind giving it a try first?

Thanks.

Hi! Thanks for the response! That script also crashes my Jetson, and again I get the low-memory warning. It downloaded the .params and .json files, but it never printed ‘Warming up MXNet’.

BTW, I know it’s not just my MXNet install. I let the script run too long and had to unplug the power cord, which corrupted Ubuntu; even after a fresh flash of JetPack and a fresh install of MXNet, it still crashes my Nano.

Hi,

May I first ask how you installed the MXNet package?

Please note that there is a hard-coded TensorRT workspace size, which is too large for the Nano.

If you build from source, we recommend lowering that value to 32 MiB.
Alternatively, you can install our prebuilt package directly:
https://drive.google.com/drive/u/1/folders/1dzAFVipH3qQWoyNGtlm_P4jKYN0bHSD3
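TensorRT's builder takes its maximum workspace size in bytes, so lowering the hard-coded value to 32 MiB means using 32 × 2^20 = 33,554,432 bytes. Where that constant lives depends on the MXNet version, so locate it in your own source tree; the helper below (a hypothetical name) only illustrates the unit conversion.

```python
MIB = 1 << 20  # bytes in one mebibyte


def mib_to_bytes(mib):
    """Convert a size in MiB to bytes, the unit TensorRT's workspace limit uses."""
    return int(mib) * MIB
```

Writing the value as `32 << 20` in source gives the same number and makes the intended unit easier to see.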

We have tested the prebuilt package and the MXNet-TRT script with JetPack 4.4 GA.
The ResNet-18 model runs inference without issue on the Jetson Nano.

$ python3 resnet18-mxnet-trt.py
Warming up MXNet
Starting MXNet timed run
8.189167659000002
Building TensorRT engine
Warming up TensorRT
Starting TensorRT timed run
3.3740783479999976
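The sample follows a warm-up-then-time pattern. A minimal sketch of that pattern is below, using hypothetical helper names; because MXNet schedules work asynchronously, the timed function has to block on its result (for example with `.asnumpy()` or `mx.nd.waitall()`), or the measurement will come out too short. Assuming both timed runs in the output above covered the same number of iterations, the TensorRT path is roughly 2.4 times faster.

```python
import time


def timed_run(fn, warmup=10, runs=100):
    """Call fn() `warmup` times untimed, then `runs` times; return seconds spent in the timed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return time.perf_counter() - start


def speedup(baseline_s, optimized_s):
    """Ratio of two total run times measured over the same number of iterations."""
    return baseline_s / optimized_s
```

With the numbers printed above, `speedup(8.189167659000002, 3.3740783479999976)` is about 2.43.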

Thanks.

Hi,

Thanks for the reply. I built it with the Nano autoinstaller from this forum, but I’ll try installing that .whl file. What’s the difference between the three files in the folder you sent?

Aasta,

I tried installing each of the files you attached, but every one failed with the error that it is not a supported wheel on this platform.
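pip reports "not a supported wheel on this platform" when the tags in a wheel's file name (Python version, ABI, platform) do not match the running interpreter. Possible causes include installing a Python 3 wheel with a Python 2 `pip`, or a wheel built for a different Python minor version. The sketch below is a simplified check, not pip's real matching logic, and the file names in the test are made up.

```python
import platform
import sys


def wheel_tags(filename):
    """Split a wheel file name ({name}-{ver}[-{build}]-{py}-{abi}-{plat}.whl) into tag lists."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-4].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel file name: " + filename)
    py, abi, plat = parts[-3:]
    # Compressed tag sets such as "py2.py3" are separated by dots
    return py.split("."), abi.split("."), plat.split(".")


def roughly_compatible(filename, py_version=None, machine=None):
    """Rough check: Python tag fits this interpreter and platform tag fits this CPU arch."""
    if py_version is None:
        py_version = sys.version_info[:2]
    if machine is None:
        machine = platform.machine()  # "aarch64" on the Jetson Nano
    py_tags, _, plat_tags = wheel_tags(filename)
    py_ok = ("cp%d%d" % py_version) in py_tags or "py3" in py_tags
    plat_ok = "any" in plat_tags or any(p.endswith(machine) for p in plat_tags)
    return py_ok and plat_ok
```

Running `roughly_compatible("<downloaded file>.whl")` with `python3` compares the file name against that interpreter and CPU architecture.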

Hi,

I went back to the forums and rebuilt MXNet using the instructions in this thread: “I was unable to compile and install Mxnet1.5 with tensorrt on the jetson nano,Is there someone have compile it, please help me. Thank you.”

Near the end of the install process, while it tests TensorRT with MXNet, my Nano freezes, much like before; the install window reported that the process was killed. The Python test script still does not work either. Is this an issue with my Nano only? Do I have a defective unit? If so, how do I get it replaced?
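A "Killed" message during a long build often means the kernel's out-of-memory killer stopped the process, which `dmesg` should confirm. To see how close the Nano gets to that point, you can watch `MemAvailable` in `/proc/meminfo` while the build or the inference script runs. The parser below is a small sketch with hypothetical helper names; the kernel reports these values in kB (KiB), and the sample text in the test is invented.

```python
def mem_available_mib(meminfo_text):
    """Return MemAvailable from /proc/meminfo text, converted from kB to MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / 1024
    raise ValueError("MemAvailable not found")


def read_mem_available_mib(path="/proc/meminfo"):
    """Read the live value on Linux, e.g. in a loop while MXNet runs in another terminal."""
    with open(path) as f:
        return mem_available_mib(f.read())
```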

Hi,

Is reflashing an option for you?
If so, we recommend reflashing the Nano for a clean environment and running the auto installer again:
https://github.com/AastaNV/JEP/blob/master/MXNET/autoinstall_mxnet.sh

Sometimes the installer does not work the first time because of an apt-get update issue.
You may need to run it twice.

Thanks.

Yes, I already tried that, and it had no effect. MXNet installs properly and I can run inference on the CPU; it just crashes once I try to run inference on the GPU.
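Since CPU inference works, it may be worth confirming what the installed MXNet was built with. MXNet 1.5 and later expose build flags through `mxnet.runtime.Features`; the sketch below, with a hypothetical `build_report` helper, returns an empty dict if MXNet cannot be imported or predates that API, and treats feature names the build does not recognize as disabled.

```python
def build_report(names=("CUDA", "CUDNN", "TENSORRT")):
    """Return {feature: enabled} for the installed MXNet, or {} if it cannot be queried."""
    try:
        from mxnet.runtime import Features  # available in MXNet 1.5+
    except (ImportError, OSError):
        # MXNet missing, too old, or its shared libraries failed to load
        return {}
    features = Features()
    report = {}
    for name in names:
        try:
            report[name] = bool(features.is_enabled(name))
        except (RuntimeError, KeyError):
            # This build does not know the feature name at all
            report[name] = False
    return report
```

If `CUDA` or `TENSORRT` comes back `False`, GPU or MXNet-TRT inference cannot work with that build regardless of available memory.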