Failed to alloc CUDA mapped memory for tensor input when loading tiny_yolov2.onnx

Hi, all,
I am having difficulty converting my custom-trained tiny_yolov2 model into TensorRT format, so I plan to first test the stock tiny_yolov2.onnx model from
https://github.com/onnx/models/tree/master/vision/object_detection_segmentation/tiny_yolov2

I created a "test" folder at "jetson-inference/data/networks/test/" and put the ONNX model inside together with the labels file.
Then, from the folder "/jetson-inference/python/examples", I ran the following commands:

$ NET=/home/***/jetson-inference/data/networks/test
$ python3 detectnet-console.py --model=$NET/Model.onnx --label=$NET/labels.txt /home/***/jetson-inference/data/images/peds_1.jpg /home/***/output.jpg --input_blob=data --output_bbox=bboxes

But I get the following error:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
jetson.inference.__init__.py
jetson.inference -- initializing Python 3.6 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 3.6 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 3.6 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 3.6 binding initialization
[image] loaded '/home/***/jetson-inference/data/images/peds_1.jpg' (1920 x 1080, 3 channels)
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyDetectNet_Init()
jetson.inference -- detectNet loading network using argv command line params
jetson.inference -- detectNet.init() argv[0] = 'detectnet-console.py'
jetson.inference -- detectNet.init() argv[1] = '--model=/home/***/jetson-inference/data/networks/test/Model.onnx'
jetson.inference -- detectNet.init() argv[2] = '--label=/home/***/jetson-inference/data/networks/test/labels.txt'
jetson.inference -- detectNet.init() argv[3] = '/home/***/jetson-inference/data/images/peds_1.jpg'
jetson.inference -- detectNet.init() argv[4] = '/home/***/output.jpg'
jetson.inference -- detectNet.init() argv[5] = '--input_blob=data'
jetson.inference -- detectNet.init() argv[6] = '--output_bbox=bboxes'

detectNet -- loading detection network model from:
-- prototxt NULL
-- model /home/***/jetson-inference/data/networks/test/Model.onnx
-- input_blob 'data'
-- output_cvg 'coverage'
-- output_bbox 'bboxes'
-- mean_pixel 0.000000
-- mean_binary NULL
-- class_labels NULL
-- threshold 0.500000
-- batch_size 1

[TRT] TensorRT version 5.1.6
[TRT] loading NVIDIA plugins...
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - ONNX (extension '.onnx')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file /home/***/jetson-inference/data/networks/test/Model.onnx.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading /usr/bin/ /home/***/jetson-inference/data/networks/test/Model.onnx

Input filename: /home/***/jetson-inference/data/networks/test/Model.onnx
ONNX IR version: 0.0.5
Opset version: 8
Producer name: OnnxMLTools
Producer version: 1.5.2
Domain: onnxconverter-common
Model version: 0
Doc string: The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242

WARNING: ONNX model has a newer ir_version (0.0.5) than this parser was built against (0.0.3).
[TRT] scalerPreprocessor_scaled:Mul -> (3, 416, 416)
[TRT] image2:Add -> (3, 416, 416)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (3, 416, 416)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (3, 3), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 16
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (16, 416, 416)
[TRT] convolution2d_1_output:Conv -> (16, 416, 416)
[TRT] batchnormalization_1_output:BatchNormalization -> (16, 416, 416)
[TRT] leakyrelu_1_output:LeakyRelu -> (16, 416, 416)
[TRT] maxpooling2d_1_output:MaxPool -> (16, 208, 208)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (16, 208, 208)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (3, 3), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 32
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (32, 208, 208)
[TRT] convolution2d_2_output:Conv -> (32, 208, 208)
[TRT] batchnormalization_2_output:BatchNormalization -> (32, 208, 208)
[TRT] leakyrelu_2_output:LeakyRelu -> (32, 208, 208)
[TRT] maxpooling2d_2_output:MaxPool -> (32, 104, 104)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (32, 104, 104)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (3, 3), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 64
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (64, 104, 104)
[TRT] convolution2d_3_output:Conv -> (64, 104, 104)
[TRT] batchnormalization_3_output:BatchNormalization -> (64, 104, 104)
[TRT] leakyrelu_3_output:LeakyRelu -> (64, 104, 104)
[TRT] maxpooling2d_3_output:MaxPool -> (64, 52, 52)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (64, 52, 52)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (3, 3), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 128
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (128, 52, 52)
[TRT] convolution2d_4_output:Conv -> (128, 52, 52)
[TRT] batchnormalization_4_output:BatchNormalization -> (128, 52, 52)
[TRT] leakyrelu_4_output:LeakyRelu -> (128, 52, 52)
[TRT] maxpooling2d_4_output:MaxPool -> (128, 26, 26)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (128, 26, 26)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (3, 3), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 256
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (256, 26, 26)
[TRT] convolution2d_5_output:Conv -> (256, 26, 26)
[TRT] batchnormalization_5_output:BatchNormalization -> (256, 26, 26)
[TRT] leakyrelu_5_output:LeakyRelu -> (256, 26, 26)
[TRT] maxpooling2d_5_output:MaxPool -> (256, 13, 13)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (256, 13, 13)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (3, 3), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 512
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (512, 13, 13)
[TRT] convolution2d_6_output:Conv -> (512, 13, 13)
[TRT] batchnormalization_6_output:BatchNormalization -> (512, 13, 13)
[TRT] leakyrelu_6_output:LeakyRelu -> (512, 13, 13)
[TRT] maxpooling2d_6_output:MaxPool -> (512, 13, 13)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (512, 13, 13)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (3, 3), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 1024
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (1024, 13, 13)
[TRT] convolution2d_7_output:Conv -> (1024, 13, 13)
[TRT] batchnormalization_7_output:BatchNormalization -> (1024, 13, 13)
[TRT] leakyrelu_7_output:LeakyRelu -> (1024, 13, 13)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (1024, 13, 13)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (3, 3), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 1024
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (1024, 13, 13)
[TRT] convolution2d_8_output:Conv -> (1024, 13, 13)
[TRT] batchnormalization_8_output:BatchNormalization -> (1024, 13, 13)
[TRT] leakyrelu_8_output:LeakyRelu -> (1024, 13, 13)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:771: Convolution input dimensions: (1024, 13, 13)
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:835: Using kernel: (1, 1), strides: (1, 1), padding: (0, 0), dilations: (1, 1), numOutputs: 125
[TRT] /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.1/parsers/onnxOpenSource/builtin_op_importers.cpp:836: Convolution output dimensions: (125, 13, 13)
[TRT] grid:Conv -> (125, 13, 13)
[TRT] retrieved Input tensor "image": 3x416x416
[TRT] device GPU, configuring CUDA engine
[TRT] device GPU, building FP16: ON
[TRT] device GPU, building INT8: OFF
[TRT] device GPU, building CUDA engine (this may take a few minutes the first time a network is loaded)
[TRT] device GPU, completed building CUDA engine
[TRT] network profiling complete, writing engine cache to /home/***/jetson-inference/data/networks/test/Model.onnx.1.1.GPU.FP16.engine
[TRT] device GPU, completed writing engine cache to /home/***/jetson-inference/data/networks/test/Model.onnx.1.1.GPU.FP16.engine
[TRT] device GPU, /home/***/jetson-inference/data/networks/test/Model.onnx loaded
[TRT] device GPU, CUDA engine context initialized with 2 bindings
[TRT] binding -- index 0
-- name 'image'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 416 (SPATIAL)
-- dim #2 416 (SPATIAL)
[TRT] binding -- index 1
-- name 'grid'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 125 (CHANNEL)
-- dim #1 13 (SPATIAL)
-- dim #2 13 (SPATIAL)
[TRT] binding to input 0 data binding index: -1
[TRT] binding to input 0 data dims (b=1 c=0 h=0 w=0) size=0
[TRT] failed to alloc CUDA mapped memory for tensor input, 0 bytes
detectNet -- failed to initialize.
jetson.inference -- detectNet failed to load built-in network 'ssd-mobilenet-v2'
PyTensorNet_Dealloc()
Traceback (most recent call last):
File "detectnet-console.py", line 51, in <module>
net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)
Exception: jetson.inference -- detectNet failed to load network
jetson.utils -- freeing CUDA mapped memory
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I flashed the Jetson TX2 with the latest SDK Manager, i.e. "sdkmanager_0.9.14-4964_amd64.deb".
Jetson: Jetson TX2, Pascal GPU with 256 CUDA cores; 64-bit NVIDIA Denver and ARM Cortex-A57 CPUs; 8 GB LPDDR4 memory; 32 GB eMMC 5.1 flash storage; graphics: NVIDIA Tegra X2 (nvgpu), integrated.

OS: Ubuntu 18.04 LTS, 64-bit

TensorRT version: 5.1.6.1-1+cuda10.0

Python version: Python 3.6.8

Please help me figure out this issue.
Thank you!

Moved to Jetson TX2 forum. Someone here should be able to help you.

Thanks,
NVIDIA Enterprise Support

Hi tairen, it appears from this part of the log that the correct names of the input/output layers should be --input_blob=image --output_bbox=grid:

[TRT] binding -- index 0
-- name 'image'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 416 (SPATIAL)
-- dim #2 416 (SPATIAL)
[TRT] binding -- index 1
-- name 'grid'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 125 (CHANNEL)
-- dim #1 13 (SPATIAL)
-- dim #2 13 (SPATIAL)
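
If you want to confirm those tensor names straight from the model file rather than from the TensorRT log, a quick check with the onnx Python package works (assuming you have it installed with pip; this is just a sketch, not part of jetson-inference):

import onnx

# Print the graph's real input/output tensor names.
# Path assumed from the original post ('***' = redacted username).
model = onnx.load("/home/***/jetson-inference/data/networks/test/Model.onnx")

# Older exporters also list weight initializers under graph.input,
# so filter those out to leave only the true network inputs.
weights = {t.name for t in model.graph.initializer}
print("inputs: ", [i.name for i in model.graph.input if i.name not in weights])
print("outputs:", [o.name for o in model.graph.output])

With those names, your original command would become:

$ python3 detectnet-console.py --model=$NET/Model.onnx --label=$NET/labels.txt /home/***/jetson-inference/data/images/peds_1.jpg /home/***/output.jpg --input_blob=image --output_bbox=grid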

That may allow you to load the network at least. However, I have not implemented the pre/post-processing for this YOLO ONNX model. That code would go here in the detectNet.cpp source code:

The ONNX pre/post-processing code that is there now was just a test I was doing with a simpler regression model from PyTorch, as I have not been able to get a full detection model working through ONNX with PyTorch yet. However, I am interested in testing this YOLO ONNX model you pointed to, since TensorRT is able to parse it. It would probably be a couple of weeks before I'm able to work on it though, sorry. In the meantime, you are welcome to modify the code.
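
For reference, the missing post-processing amounts to decoding that 125x13x13 'grid' tensor: 5 anchor boxes per cell, each carrying 4 box coordinates, an objectness score, and 20 VOC class scores (5 x 25 = 125 channels). Below is a minimal NumPy sketch of that decoding, assuming the standard tiny-YOLOv2 VOC anchors; it is only an illustration of the math, not the code that would eventually go into detectNet.cpp:

import numpy as np

# Standard tiny-YOLOv2 VOC anchor sizes, in units of 13x13 grid cells (assumed).
ANCHORS = [(1.08, 1.19), (3.42, 4.41), (6.63, 11.38), (9.42, 5.11), (16.62, 10.52)]
NUM_CLASSES = 20
CELL = 416 // 13  # pixels per grid cell

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_grid(grid, conf_thresh=0.5):
    """grid: float32 array of shape (125, 13, 13) from the 'grid' output binding.
    Returns a list of (x1, y1, x2, y2, class_id, confidence) in 416x416 pixels."""
    grid = grid.reshape(len(ANCHORS), 5 + NUM_CLASSES, 13, 13)
    detections = []
    for a, (aw, ah) in enumerate(ANCHORS):
        for cy in range(13):
            for cx in range(13):
                tx, ty, tw, th, to = grid[a, :5, cy, cx]
                # softmax over the 20 class scores
                scores = grid[a, 5:, cy, cx]
                probs = np.exp(scores - scores.max())
                probs /= probs.sum()
                cls = int(probs.argmax())
                conf = sigmoid(to) * probs[cls]
                if conf < conf_thresh:
                    continue
                # box center/size: cell offset + sigmoid(t), anchor size * exp(t)
                x = (cx + sigmoid(tx)) * CELL
                y = (cy + sigmoid(ty)) * CELL
                w = np.exp(tw) * aw * CELL
                h = np.exp(th) * ah * CELL
                detections.append((x - w/2, y - h/2, x + w/2, y + h/2, cls, conf))
    return detections

A real implementation would also apply non-maximum suppression and scale the boxes back to the source image resolution.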