Hello everyone, I made a small project that makes TensorRT much easier to use, and I hope it might be useful to you. I also need someone who can help me test it on an embedded device, e.g. Jetson Nano or JetPack, because I want to make it better. If you have any opinions or suggestions, feel free to open an issue :). Here is the link: https://github.com/zerollzeng/tiny-tensorrt
A simple, efficient, easy-to-use NVIDIA TensorRT wrapper for CNNs, with C++ and Python APIs, supporting Caffe and ONNX model formats.
TensorRT released its 6.x version and I upgraded tiny-tensorrt to it, so the old 5.x version is kept in its own branch.
- custom onnx model output :fire::fire::fire: -- 2019.10.18
- upgraded to TensorRT 6 -- 2019.9.29
- support for more models and layers -- work in progress
- caffe model support
- PRELU support
- upsample support
- engine serialization
- caffe model int8 support
- onnx support
- python api support
- maybe a handy calibrator data creation tool
- tested on NVIDIA P4
- set device
For the Python API, Python 2.x/3.x and numpy are needed.
This project is fully tested with TensorRT 6, CUDA 10.0, and Ubuntu 16.04. I have tested it on a 1060ti, 1050ti, 1080ti, 1660ti, 2080, 2080ti, and P4.
Prepare the environment with the official Docker image.
You need to install TensorRT first, see here.
```bash
# build in docker
mkdir build && cd build && cmake .. && make
```
Then you can integrate it into your own project with libtinytrt.so and Trt.h; for the Python module, you get pytrt.so.
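As a rough sketch of what that integration can look like (the paths, the `-std=c++11` flag, and the output name are placeholders that depend on your own layout, not an official build recipe):

```cpp
// main.cpp -- minimal consumer of libtinytrt.so (hypothetical project layout)
// Build with something like:
//   g++ main.cpp -std=c++11 -I/path/to/tiny-tensorrt -L/path/to/tiny-tensorrt/build -ltinytrt -o demo
// and make sure libtinytrt.so can be found (e.g. via LD_LIBRARY_PATH) at run time.
#include "Trt.h"

int main() {
    Trt trt;  // the wrapper object from Trt.h; engine creation is shown in the next section
    return 0;
}
```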
use tiny-tensorrt with c++
#include "Trt.h" Trt trt; // create engine and running context, note that engine file is device specific, so don't copy engine file to new device, it may cause crash trt.CreateEngine("pathto/sample.prototxt", "pathto/sample.caffemodel", "pathto/engineFile", // since build engine is time consuming,so save we can serialize engine to file, it's much more faster "outputblob", calibratorData, maxBatchSize runMode); // trt.CreateEngine(onnxModelPath,engineFile,maxBatchSize); // for onnx model // you might need to do some pre-processing in input such as normalization, it depends on your model. trt.DataTransfer(input,0,True); // 0 for input index, you can get it from CreateEngine phase log output, True for copy input date to gpu //run model, it will read your input and run inference. and generate output. trt.Forward(); // get output. trt.DataTransfer(output, outputIndex, False) // you can get outputIndex in CreateEngine phase // them you can do post processing in output
use tiny-tensorrt with python
```python
import sys
sys.path.append("path/to/pytrt.so")  # append the directory that contains pytrt.so
import pytrt

trt = pytrt.Trt()
trt.CreateEngine(prototxt, caffemodel, engineFile, outputBlobName,
                 calibratorData, maxBatchSize, mode)
# trt.CreateEngine(onnxModelPath, engineFile, maxBatchSize)  # see the c++ CreateEngine
trt.DoInference(input_numpy_array)  # slightly different from c++
output_numpy_array = trt.GetOutput(outputIndex)
# post processing
```
Also see tensorrt-zoo, which implements some common computer vision models with tiny-tensorrt; it has several good samples.
- upsample with custom scale, tested with yolov3.
- yolo-det, the last layer of yolov3, which combines the outputs of the three scales and generates the final result for NMS; tested with yolov3.
- PRELU, tested with openpose