tiny-tensorrt: a simple, efficient, easy-to-use TensorRT wrapper for CNNs, supporting C++ and Python

Hello everyone, I made a small project that makes TensorRT much easier to use, and I hope it is useful to you. I also need someone to help me test it on embedded devices, e.g. a Jetson Nano or a JetPack setup. I want to make it better, so if you have any opinions or suggestions, feel free to open an issue :). Here is the link: https://github.com/zerollzeng/tiny-tensorrt


A simple, efficient, easy-to-use NVIDIA TensorRT wrapper for CNNs with C++ and Python APIs. It supports Caffe and ONNX format models.


TensorRT has released its 6.x version and I have upgraded tiny-tensorrt to it, so the old 5.x version now lives in the trt- branch.


  • custom onnx model output :fire::fire::fire: (2019.10.18)
  • upgrade with TensorRT (2019.9.29)
  • support more models and layers (work in progress)
  • caffe model support
  • PRELU support
  • upsample support
  • engine serialization
  • caffe model int8 support
  • onnx support
  • python api support
  • maybe a handy tool for creating calibrator data
  • test on NVIDIA P4
  • set device

System Requirements

cuda 10.0+


For the Python API, Python 2.x/3.x and numpy are needed.

This project is fully tested with TensorRT, CUDA 10.0 and Ubuntu 16.04. I tested it on the 1060ti, 1050ti, 1080ti, 1660ti, 2080, 2080ti and P4.

Quick start

prepare environment with official docker image

You need to install TensorRT first, see here

# build in docker
mkdir build && cd build && cmake .. && make

Then you can integrate it into your own project with libtinytrt.so and Trt.h. For the Python module, you get pytrt.so.

use tiny-tensorrt with c++

#include "Trt.h"

Trt trt;
// Create the engine and execution context. The engine file is device specific,
// so don't copy it to a different device; doing so may cause a crash.
trt.CreateEngine(prototxt, caffemodel,
                 "pathto/engineFile", // building an engine is slow, so it is serialized to this file; loading it later is much faster
                 outputBlobName, calibratorData, maxBatchSize, mode); // same arguments as the Python call
// trt.CreateEngine(onnxModelPath, engineFile, maxBatchSize); // for an ONNX model

// You might need to pre-process the input (e.g. normalization); it depends on your model.
trt.DataTransfer(input, 0, true); // 0 is the input index (printed in the CreateEngine log); true copies the input data to the GPU

// Run the model: it reads your input, runs inference and generates the output.
trt.Forward();

// Get the output.
trt.DataTransfer(output, outputIndex, false); // outputIndex is also printed during CreateEngine; false copies from the GPU back to the host
// Then you can post-process the output.
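
Since the engine file is device specific, one way to avoid loading a stale engine by accident is to put everything the engine depends on into its file name. Below is a minimal sketch of that idea, written in Python for brevity. `engine_path_for`, the `engines` directory and the naming scheme are hypothetical and not part of tiny-tensorrt; you would pass the resulting path as the engineFile argument.

```python
import hashlib
from pathlib import Path


def engine_path_for(model_path, device_name, trt_version, mode, cache_dir="engines"):
    """Return an engine file path that changes with the model, GPU, TensorRT version or mode.

    Hypothetical helper: tiny-tensorrt itself just takes whatever path you give it.
    """
    # Everything the serialized engine depends on goes into the key.
    key = "|".join([Path(model_path).name, device_name, trt_version, str(mode)])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    # Keep the model name readable and append the short digest.
    return Path(cache_dir) / f"{Path(model_path).stem}-{digest}.engine"


if __name__ == "__main__":
    print(engine_path_for("models/net.onnx", "Tesla P4", "6.0.1", 0))
```

Note that this only hashes the model file name, not its contents, so if you retrain a model under the same name you still need to delete the old engine file yourself.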

use tiny-tensorrt with python

import sys
import pytrt

trt = pytrt.Trt()
trt.CreateEngine(prototxt, caffemodel, engineFile, outputBlobName, calibratorData, maxBatchSize, mode)
# trt.CreateEngine(onnxModelPath, engineFile, maxBatchSize)
# see c++ CreateEngine

trt.DoInference(input_numpy_array) # slightly different from c++
output_numpy_array = trt.GetOutput(outputIndex)
# post processing
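
The snippets above mention pre-processing the input but don't show it, so here is a rough numpy sketch of what it often looks like for an image model. It assumes the model wants a float32 NCHW array scaled to [0, 1]; the `preprocess` function, its `mean` and `scale` defaults and the layout are my assumptions, so check what your own model expects.

```python
import numpy as np


def preprocess(image_hwc, mean=(0.0, 0.0, 0.0), scale=1.0 / 255.0):
    """Convert an HxWxC uint8 image into a 1xCxHxW float32 array (assumed layout)."""
    x = image_hwc.astype(np.float32)
    # Subtract a per-channel mean, then apply one global scale factor.
    x = (x - np.asarray(mean, dtype=np.float32)) * scale
    # HWC -> CHW, which is the usual layout for Caffe and ONNX CNNs.
    x = np.transpose(x, (2, 0, 1))
    # Add the batch dimension and make the memory contiguous before handing it over.
    return np.ascontiguousarray(x[np.newaxis, ...])


if __name__ == "__main__":
    dummy = np.zeros((224, 224, 3), dtype=np.uint8)
    print(preprocess(dummy).shape)
```

The result is what you would pass as input_numpy_array to trt.DoInference.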

Also see tensorrt-zoo: it implements some common computer vision models with tiny-tensorrt and has several good samples.

Supported layers

  • upsample with custom scale, under test with yolov3.
  • yolo-det, the last layer of yolov3, which combines the outputs of the three scales and generates the final result for NMS. Under test with yolov3.
  • PRELU, under test with openpose.
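
If you want to sanity-check the output of these layers, small numpy reference versions are handy to compare against. The sketch below is not the plugin code. It assumes NCHW arrays, one PReLU slope per channel, and nearest-neighbour upsampling with an integer scale (which is what yolov3 uses), while the actual plugins may behave differently.

```python
import numpy as np


def prelu(x, slope):
    """PReLU on an NCHW array: x where x > 0, slope * x elsewhere (one slope per channel)."""
    slope = np.asarray(slope, dtype=x.dtype).reshape(1, -1, 1, 1)
    return np.where(x > 0, x, slope * x)


def upsample_nearest(x, scale):
    """Nearest-neighbour upsampling of an NCHW array by an integer factor (assumed mode)."""
    # Repeat every row, then every column, `scale` times.
    return x.repeat(scale, axis=2).repeat(scale, axis=3)


if __name__ == "__main__":
    x = np.array([[[[-2.0, 4.0]]]], dtype=np.float32)
    print(prelu(x, [0.5]))
    print(upsample_nearest(x, 2).shape)
```

Comparing these against trt.GetOutput(outputIndex) with np.allclose and a small tolerance should be enough for a quick check.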