How Can I Create a Custom tf.split TensorRT Layer to run YOLO v3 Tiny on PX2 with DRIVEWORKS?


I am attempting to implement YOLOv3 Tiny on the PX2, but have been running into a lot of issues. Originally, I was trying to get Darknet and OpenCV working with the GMSL cameras, but abandoned that route to try to work with the NVMEDIA and DRIVEWORKS APIs instead. From my understanding, this can be accomplished by implementing the network in a supported framework, in this case TensorFlow, and then generating a TensorRT runtime engine for use with DRIVEWORKS. I came across an implementation of YOLOv3 Tiny on GitHub, successfully trained the network on COCO, and froze the graph. I’m currently stuck at the stage of trying to generate a UFF model from this frozen graph, as I am getting the following error:

Traceback (most recent call last):
  File "./", line 3, in <module>
    uff.from_tensorflow_frozen_model("./output_graph.pb", output_nodes=["TRAINER/h"], preprocessor=None, output_filename="yolov3-tiny.uff")
  File "/home/reach/.local/lib/python3.6/site-packages/uff/converters/tensorflow/", line 149, in from_tensorflow_frozen_model
    return from_tensorflow(graphdef, output_nodes, preprocessor, **kwargs)
  File "/home/reach/.local/lib/python3.6/site-packages/uff/converters/tensorflow/", line 62, in from_tensorflow
  File "/home/reach/.local/lib/python3.6/site-packages/graphsurgeon/", line 112, in process_softmax
    node_chains = dynamic_graph.find_node_chains_by_op(op_chain)
  File "/home/reach/.local/lib/python3.6/site-packages/graphsurgeon/", line 239, in find_node_chains_by_op
    matching_chain = find_matching_chain(node, chain)
  File "/home/reach/.local/lib/python3.6/site-packages/graphsurgeon/", line 224, in find_matching_chain
    input_node = self.node_map[input_name]
KeyError: 'TRAINER/split:1'
Makefile:10: recipe for target 'uff' failed
make: *** [uff] Error 1
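A note on reading the KeyError: in a TensorFlow GraphDef, an input reference such as `TRAINER/split:1` means "output 1 of node `TRAINER/split`", and the traceback suggests the lookup is done in a map keyed by plain node names. The sketch below only illustrates that naming convention. `split_tensor_name` and the toy `node_map` are hypothetical and are not part of uff or graphsurgeon.

```python
def split_tensor_name(tensor_name):
    """Split a TensorFlow input reference into (node_name, output_index, is_control).

    "node:1" refers to output 1 of "node"; a leading "^" marks a control
    dependency; a bare "node" refers to output 0.
    """
    is_control = tensor_name.startswith("^")
    name = tensor_name[1:] if is_control else tensor_name
    node_name, sep, index = name.rpartition(":")
    if sep and index.isdigit():
        return node_name, int(index), is_control
    return name, 0, is_control


# Toy stand-in for a map keyed by node name (assumption based on the traceback).
node_map = {"TRAINER/split": "<split node>", "TRAINER/h": "<output node>"}

raw_name = "TRAINER/split:1"
found_raw = raw_name in node_map                  # the raw reference is not a key
node_name, output_index, _ = split_tensor_name(raw_name)
found_parsed = node_name in node_map              # the bare node name is a key
```

This does not make the split op convertible. It only shows why a multi-output node like tf.split produces a name that a node-name lookup cannot find.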

After scouring the forums, it would appear that this is an issue with the tf.split layers in the graph, as these are not currently supported (per another forum post). The moderator did give me one glimmer of hope, as they said that unsupported layers could be implemented in TensorRT with NVIDIA’s Plugin API. I spent a bit of time looking at the documentation and examples, but I still don’t really understand how I would go about implementing this through the Plugin API. It seems like it should be relatively simple, as all this layer would do is split the input tensor into multiple output tensors; however, I’m not really sure where I should begin. I guess what I am looking for is confirmation as to whether I am even on the right track, as well as a nudge in the right direction for figuring out how to implement this layer myself. Does NVIDIA (or anyone on the forum) have tutorials that could walk me through implementing a custom TensorRT layer for TensorFlow? Thanks everyone for your time and assistance.
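To make "splitting the input tensor into multiple output tensors" concrete, here is a minimal pure-Python sketch of the two things such a layer has to work out: the shape of each output, and which ranges of the flat, row-major input buffer go to which output. The split axis and sizes used in the YOLOv3 Tiny graph are not shown in this thread, so the values below are assumptions, and the function names are hypothetical (this is not the TensorRT Plugin API).

```python
def split_output_shapes(input_shape, sizes, axis):
    """Shape of each output when input_shape is split into `sizes` along `axis`."""
    if sum(sizes) != input_shape[axis]:
        raise ValueError("split sizes must add up to the size of the split axis")
    shapes = []
    for size in sizes:
        shape = list(input_shape)
        shape[axis] = size
        shapes.append(tuple(shape))
    return shapes


def split_flat_buffer(data, input_shape, sizes, axis):
    """Split a row-major flat buffer into one flat buffer per output."""
    outer = 1                      # product of dimensions before the split axis
    for dim in input_shape[:axis]:
        outer *= dim
    inner = 1                      # product of dimensions after the split axis
    for dim in input_shape[axis + 1:]:
        inner *= dim
    axis_len = input_shape[axis]

    outputs = []
    start = 0                      # offset of the current piece along the axis
    for size in sizes:
        out = []
        for o in range(outer):
            begin = (o * axis_len + start) * inner
            out.extend(data[begin:begin + size * inner])
        outputs.append(out)
        start += size
    return outputs


# Assumed example: a 4-channel 2x2 feature map (C, H, W) split into two halves.
shape = (4, 2, 2)
data = list(range(16))
shapes = split_output_shapes(shape, [2, 2], axis=0)
first, second = split_flat_buffer(data, shape, [2, 2], axis=0)
```

When the split is along the outermost axis, as in this example, each output is a single contiguous chunk of the input, so a plugin would only need one device-to-device copy per output. Other axes need one copy per outer index.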

Hi maxe2470,

Please take a look at [url][/url] to see if it helps.

Dear maxe2470,
If you are using DRIVE PX2, the latest PDK has DriveWorks 1.2, which does not support TensorRT custom layer plugins. Note that we have no more releases targeted at DRIVE PX2.
But on DRIVE AGX platforms, the latest PDK has DriveWorks 1.5, which supports TensorRT custom layer plugins. We have a sample (sample_dnn_plugin) to demonstrate this. Also, we have included a YOLOv3 ONNX-based sample as part of TensorRT.
For documentation on implementing custom layers for TensorRT, please check the TensorRT documentation and look at our TensorRT samples.


Thank you for your reply to my question and the link to the example.
I work with a Drive PX2 and want to run a TensorRT engine with custom layers on it, according to the information you provided in Figure 2 of the following link:


Please tell me explicitly whether I can use Drive PX2 for running TensorRT engines with custom layers.


Dear maxe2470,
Yes. You can write custom plugin layers and create a TensorRT engine. Please check the TensorRT samples on this. But note that this model cannot be integrated into the DriveWorks object detector sample, as custom layers are not supported in DW 1.2.

Hi again,

Thank you for the quick response. That is good information to know.

To make sure I understand, then, there is no way to utilize a TensorRT engine with custom layers in DW 1.2 even if I didn’t use the object detector sample?

If this is the case, how am I able to work with GMSL cameras and YOLOv3 on Drive PX2?


Dear maxe2470,
In this case, you can get the dwImageCUDA object from the camera and pass its CUDA buffer as input to TensorRT to perform inference and find the output bounding boxes. So you need to make the TensorRT API calls to load your model, build the network, and run inference inside the DW sample.


I’m doing the same thing. I use this project.

It implements YOLO detection using the TensorRT Caffe parser and loads the yolo-to-caffe model.

When I load the yolo-to-caffe tiny model, it works well.
But when I load the yolo-to-caffe 608 model (yolov3_608_trt.prototxt), it fails:

./install/runYolov3 --caffemodel=./yolov3_608.caffemodel --prototxt=./yolov3.prototxt --input=./test.jpg --W=608 --H=608 --class=80
####### input args#######
####### end args#######
init plugin proto: ./yolov3.prototxt caffemodel: ./yolov3_608.caffemodel
Begin parsing model…
ERROR: Parameter check failed at: …/builder/Network.cpp::addScale::120, condition: hasChannelDimension == true
error parsing layer type BatchNorm index 264
ERROR: ssd_error_log: Fail to parse
Segmentation fault (core dumped)

There must be something wrong with the TensorRT Caffe parser.
It should be work well on tensorrt version is on PX2).

Any idea?

Hi SivaRamaKrishna,

Do you mean that we can’t load a model with custom layers into the object detector sample, but we can use dwImageCUDA from the GMSL camera as input to TensorRT to perform inference?

thank u~

Dear Allenchen,
To use a DNN model with the DW APIs, it has to be generated using the tensorRT_optimization tool. Passing a plugin layer as a parameter to that tool is supported only from DW 1.5 onwards, whereas the last release for DRIVE PX2 has DW 1.2.