Speed up inference time on Nano with mxnet

benjamin72115 · August 10, 2020, 8:02pm

Hi all! Currently I’m developing with the Jetson Nano, and I’m looking for advice in regards to improving inference performance. I’m using Sagemaker as my training environment for SSD object detection with Resnet-50 as my base network, which exports .params and .json files for mxnet. I’ve built mxnet on the nano using the autoinstaller from this forum, and I’ve been able to infer via usb webcam by more or less following this guide:

That said, my inference speed is really slow, i.e. it takes around 4-5 seconds per frame with an input size of 512x512. I’ve already tried converting my weights to a different architecture via onnx and mmdnn, but my custom model had operators that were not supported by either format so it looks like I’m stuck with mxnet. The mxnet website says that it has tensorrt integration with mxnet but I can’t find any good examples of that anywhere online. The one on the mxnet website is at best confusing and doesn’t help me in my particular use case

One thing that seems to be holding me back is that I’m only able to infer using the cpu. According to the mxnet website, to use the gpu all you have to do is change ctx=cpu() to ctx=gpu(), and make sure that your data is converted to float32 before inputting it (MXNet Python inference crash when copy from CPU to GPU · Issue #13332 · apache/incubator-mxnet · GitHub). However, when I do that, it still crashes my Jetson because it seems to run out of memory. Does this have anything to do with the custom build of mxnet for the Nano? Otherwise why would it do that?

Any suggestions are welcome and appreciated!

Here’s my code:

#import
import mxnet as mx
import numpy as np
import cv2, os, urllib, argparse, time
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])


#array of object labels for custom network
object_categories = ['object 1','object 2']

#load model

""" important: make sure that -symbol.json and -0000.params are in the format network-prefix-symbol.json and network-prefix-0000.params and are located in current directory """

class ImagenetModel(object):
	
	def __init__(self, synset_path, network_prefix, params_url=None, symbol_url=None, synset_url=None, context=mx.cpu(), label_names=['prob_label'], input_shapes=[('data', (1,3,10,10))]):
		# Load the network parameters from default epoch 0
		sym, arg_params, aux_params = mx.model.load_checkpoint(network_prefix, 0)
		# Load the network into an MXNet module and bind the corresponding parameters
		self.mod = mx.mod.Module(symbol=sym, label_names=label_names, context=context)
		self.mod.bind(for_training=False, data_shapes= input_shapes)
		self.mod.set_params(arg_params, aux_params)
		self.camera = None

	def predict_from_cam(self, reshape=(512, 512), N=50):
		
		topN = []
		
		# Switch RGB to BGR format (which ImageNet networks take)
		img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
		if img is None:
			return topN

		# Resize image to fit network input
		img = cv2.resize(img, reshape)
		img = np.swapaxes(img, 0, 2)
		img = np.swapaxes(img, 1, 2)
		img = img[np.newaxis, :]

		# Run forward on the image
		self.mod.forward(Batch([mx.nd.array(img)]))
		prob = self.mod.get_outputs()[0].asnumpy()
		prob = np.squeeze(prob)
		global results
		results = [prob[i].tolist() for i in range(100)]

if __name__ == "__main__":
	parser = argparse.ArgumentParser(description="pull and load pre-trained resnet model to classify one image")
	parser.add_argument('--img', type=str, default='cam', help='input image for classification, if this is cam it captures from the webcam')
	parser.add_argument('--prefix', type=str, default='model_algo_1', help='the prefix of the pre-trained model')
	parser.add_argument('--label-name', type=str, default='softmax_label', help='the name of the last layer in the loaded network (usually softmax_label)')
	parser.add_argument('--synset', type=str, default='synset.txt', help='the path of the synset for the model')
	args = parser.parse_args()
	mod = ImagenetModel(args.synset, args.prefix, label_names=[args.label_name])
	print ("predicting on "+args.img)
	if args.img == "cam":
		vid = cv2.VideoCapture(0)
		while(True):
			ret, frame = vid.read()
			mod.predict_from_cam()			
			print(results)			
			cv2.imshow('frame', frame)
			#wait x ms, search for escape keypress on cv2 frame
			if cv2.waitKey (1000) & 0xFF == ord('q'):
				break
		vid.release()
		cv2.destroyAllWindows()

AastaLLL · August 11, 2020, 2:38am

Hi,

Here is sample for resnet with MXNet-TRT on Nano.
Would you mind to give it a try first?

github.com

AastaNV/JEP/blob/master/MXNET/resnet18-mxnet-trt_nano.py

import mxnet as mx
from mxnet.gluon.model_zoo import vision
import time
import os

os.environ['MXNET_USE_TENSORRT'] = '1'
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'

batch_shape = (1, 3, 224, 224)
resnet18 = vision.resnet18_v2(pretrained=True)
resnet18.hybridize()
resnet18.forward(mx.nd.zeros(batch_shape))
resnet18.export('resnet18_v2')
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet18_v2', 0)

# Create sample input
input = mx.nd.zeros(batch_shape)

# Execute with MXNet
executor = sym.simple_bind(ctx=mx.gpu(0), data=batch_shape, grad_req='null', force_rebind=True)

This file has been truncated. show original

Thanks.

benjamin72115 · August 11, 2020, 2:15pm

Hi! Thanks for the response! That script also crashes my Jetson, and once again I get the low memory warning. It downloaded the .params and ,json files, but it never printed ‘Warming up Mxnet’.

BTW, I know it’s not just my install of mxnet- I had to unplug the power cord because I let the script go too long, which corrupted ubuntu. Even with a fresh flash of jetpack and a fresh install of mxnet, it still crashes my nano.

AastaLLL · August 12, 2020, 2:33am

Hi,

May I know how do you install the MXNet package first?

Please noticed that there is a hard-coded TensorRT workspace value, which is too large for the Nano.

github.com

apache/incubator-mxnet/blob/master/src/operator/subgraph/tensorrt/tensorrt.cc#L314


      
              dtypes[eid] = out_type[i];
              shapes[eid] = out_shape[i];
            }
            graph.attrs["dtype_inputs"] = std::make_shared<nnvm::any>(std::move(dtype_inputs));
            graph.attrs["shape_inputs"] = std::make_shared<nnvm::any>(std::move(shape_inputs));
            graph.attrs["dtype"]        = std::make_shared<nnvm::any>(std::move(dtypes));
            graph.attrs["shape"]        = std::make_shared<nnvm::any>(std::move(shapes));
            auto onnx_graph             = op::nnvm_to_onnx::ConvertNnvmGraphToOnnx(graph, &params_map);
            auto trt_tuple = ::onnx_to_tensorrt::onnxToTrtCtx(onnx_graph, max_batch_size, 1 << 30);
            return OpStatePtr::Create<TRTEngineParam>(std::move(std::get<0>(trt_tuple)),
                                                      std::move(std::get<1>(trt_tuple)),
                                                      std::move(std::get<2>(trt_tuple)),
                                                      inputs_to_idx,
                                                      outputs_to_idx);
          }
          
          
NNVM_REGISTER_OP(_TensorRT)
              .describe(R"code(TRT operation (one engine)
          )code" ADD_FILELINE)
              .set_num_inputs(TRTNumInputs)
              .set_num_outputs(DefaultSubgraphOpNumOutputs)

We recommend to lower the value into 32Mib if you build it from the source.
Another alternative is to install the prebuilt from us directly:
https://drive.google.com/drive/u/1/folders/1dzAFVipH3qQWoyNGtlm_P4jKYN0bHSD3

We have tested the prebuilt and the MXNet-TRT script with JetPack4.4 GA.
The resnet-18 model can be inferenced without issue on the Jetson Nano.

$ python3 resnet18-mxnet-trt.py
Warming up MXNet
Starting MXNet timed run
8.189167659000002
Building TensorRT engine
Warming up TensorRT
Starting TensorRT timed run
3.3740783479999976

Thanks.

benjamin72115 · August 12, 2020, 4:26pm

Hi,

Thanks for the reply. I did build it directly from the autoinstaller for the nano on this forum, but I’ll try installing that whl file. What’s the difference between the three files in the folder you sent?

benjamin72115 · August 12, 2020, 5:43pm

Aasta,

I tried installing with all of the files you attached, but each one gave me the error that the wheel file was not a supported wheel on this platform.

benjamin72115 · August 12, 2020, 6:07pm

Hi,

I went back to the forums and rebuilt mxnet using these instructions: I was unable to compile and install Mxnet1.5 with tensorrt on the jetson nano，Is there someone have compile it, please help me. Thank you. - #27 by AastaLLL

The end of the install process freezes up my nano, when testing tensorrt with mxnet, much like before. It said the process was killed in the install window. And, the python test script still does not work. Is this an issue with my nano only? Do I have a defective item? If so how do I get that replaced?

AastaLLL · August 13, 2020, 4:28am

Hi,

Is reflash an option for you?
If yes, we recommend to reflash the Nano for a clean environment, and run the auto installer again:
https://github.com/AastaNV/JEP/blob/master/MXNET/autoinstall_mxnet.sh

Sometime the installer may not work at the first time due to apt-get update issue.
You may need to try it twice.

Thanks.

benjamin72115 · August 13, 2020, 12:26pm

Yes, I already tried that and it has no effect. Mxnet installs properly and I’m able to infer via cpu, it just crashes once I try to infer via gpu

Topic		Replies	Views
Mxnet installation for Jetson Nano Jetson Nano neural-network-framework	5	1586	October 18, 2021
I was unable to compile and install Mxnet1.5 with tensorrt on the jetson nano，Is there someone have compile it, please help me. Thank you. Jetson Nano	34	6355	October 14, 2021
I was unable to compile and install MXNET on the jetson nano，Is there an official installation tutorial？ Jetson Nano	45	13055	October 14, 2021
Keras MobileNets .h5 model inference on Jetson Nano: GPU is 10x slower than CPU Jetson Nano	3	1579	October 15, 2021
Mxnet Image Classiciation TensorRT Jetson Projects	0	517	July 25, 2021
Problem trying to install MXNet and GluonCV on Jetson Nano Jetson Nano neural-network-framework	10	2993	October 15, 2021
How to increase inference speed on JETSON NANO (4GB) Jetson Nano opencv , jetson-inference , deep-learning	5	2503	October 15, 2021
Deep Learning Inference Benchmarking Instructions Jetson Nano	1	832	March 16, 2020
Installation of MXNET on Jetson Nano Jetson Nano installation	3	2406	October 15, 2021
Low FPS on Jetson Nano using TensorRT Jetson Nano tensorrt , tensorflow	7	1254	August 27, 2020

Speed up inference time on Nano with mxnet

Related topics