Hello everyone, I made a small project that makes TensorRT much easier to use, and I hope it might be useful to you. I also need someone who can help me test it on an embedded device, e.g. Jetson Nano or JetPack, because I want to make it better. If you have any opinions or suggestions, feel free to open an issue :). Here is the link: https://github.com/zerollzeng/tiny-tensorrt
A simple, efficient, easy-to-use NVIDIA TensorRT wrapper for CNNs, with C++ and Python APIs, supporting Caffe and ONNX model formats.
TensorRT released its 6.x version and I upgraded tiny-tensorrt to it, so the old 5.x version is kept in its own branch.
- custom onnx model output :fire::fire::fire: -- 2019.10.18
- upgraded to TensorRT 6 -- 2019.9.29
- support for more models and layers -- work in progress
- caffe model support
- PRELU support
- upsample support
- engine serialization
- caffe model int8 support
- onnx support
- python api support
- maybe a handy calibrator data creation tool
- tested on NVIDIA P4
- set device
For the Python API, Python 2.x/3.x and numpy are needed.
This project is fully tested with TensorRT 6, CUDA 10.0, and Ubuntu 16.04. I have tested it on a 1060ti, 1050ti, 1080ti, 1660ti, 2080, 2080ti, and P4.
Prepare the environment with the official Docker image.
You need to install TensorRT first, see here.
```bash
# build in docker
mkdir build && cd build && cmake .. && make
```
Then you can integrate it into your own project with libtinytrt.so and Trt.h; for the Python module, you get pytrt.so.
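As a rough sketch of what that integration can look like (the paths, the `-std=c++11` flag, and the output name are placeholders that depend on your own layout, not an official build recipe):

```cpp
// main.cpp -- minimal consumer of libtinytrt.so (hypothetical project layout)
// Build with something like:
//   g++ main.cpp -std=c++11 -I/path/to/tiny-tensorrt -L/path/to/tiny-tensorrt/build -ltinytrt -o demo
// and make sure libtinytrt.so can be found (e.g. via LD_LIBRARY_PATH) at run time.
#include "Trt.h"

int main() {
    Trt trt;  // the wrapper object from Trt.h; engine creation is shown in the next section
    return 0;
}
```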
use tiny-tensorrt with c++
#include "Trt.h" Trt trt; // create engine and running context, note that engine file is device specific, so don't copy engine file to new device, it may cause crash trt.CreateEngine("pathto/sample.prototxt", "pathto/sample.caffemodel", "pathto/engineFile", // since build engine is time consuming,so save we can serialize engine to file, it's much more faster "outputblob", calibratorData, maxBatchSize runMode); // trt.CreateEngine(onnxModelPath,engineFile,maxBatchSize); // for onnx model // you might need to do some pre-processing in input such as normalization, it depends on your model. trt.DataTransfer(input,0,True); // 0 for input index, you can get it from CreateEngine phase log output, True for copy input date to gpu //run model, it will read your input and run inference. and generate output. trt.Forward(); // get output. trt.DataTransfer(output, outputIndex, False) // you can get outputIndex in CreateEngine phase // them you can do post processing in output
use tiny-tensorrt with python
```python
import sys
sys.path.append("path/to/pytrt.so")  # append the directory that contains pytrt.so
import pytrt

trt = pytrt.Trt()
trt.CreateEngine(prototxt, caffemodel, engineFile, outputBlobName,
                 calibratorData, maxBatchSize, mode)
# trt.CreateEngine(onnxModelPath, engineFile, maxBatchSize)  # see the c++ CreateEngine
trt.DoInference(input_numpy_array)  # slightly different from c++
output_numpy_array = trt.GetOutput(outputIndex)
# post processing
```
Also see tensorrt-zoo, which implements some common computer vision models with tiny-tensorrt; it has several good samples.
- upsample with custom scale, tested with yolov3.
- yolo-det, the last layer of yolov3, which combines the outputs of the three scales and generates the final result for NMS; tested with yolov3.
- PRELU, tested with openpose