ONNX model with Jetson-Inference using GPU

Hi,

I retrained a model (SSD-Mobilenet-v1) using jetson-inference and PyTorch (https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-ssd.md), and then I generated an ONNX file (for person detection).
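For reference, the export step from that tutorial looks roughly like this (run from jetson-inference/python/training/detection/ssd; the model directory here is just an example path from my setup):

python3 onnx_export.py --model-dir=models/Person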

But when trying to run this model with jetson.inference.detectNet in Python (I made some changes in the source code to use the GPU with FP16, which works well with the original ssd_mobilenet_v2_coco.uff), TensorRT refuses to run inference with the ONNX model (I also tried INT8 and FP32 without success):

[TRT] device GPU, completed writing engine cache to /usr/local/bin/networks/SSD-Mobilenet-v1-ONNX/ssd-mobilenet.onnx.1.0.7100.GPU.FP16.engine
[TRT] device GPU, loaded /usr/local/bin/networks/SSD-Mobilenet-v1-ONNX/ssd-mobilenet.onnx
[TRT] Deserialize required 123757 microseconds.
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT]    -- layers       97
[TRT]    -- maxBatchSize 1
[TRT]    -- workspace    0
[TRT]    -- deviceMemory 20092416
[TRT]    -- bindings     3
[TRT]    binding 0
                -- index   0
                -- name    'input_0'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  4
                -- dim #0  1 (SPATIAL)
                -- dim #1  3 (SPATIAL)
                -- dim #2  300 (SPATIAL)
                -- dim #3  300 (SPATIAL)
[TRT]    binding 1
                -- index   1
                -- name    'scores'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1 (SPATIAL)
                -- dim #1  3000 (SPATIAL)
                -- dim #2  2 (SPATIAL)
[TRT]    binding 2
                -- index   2
                -- name    'boxes'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1 (SPATIAL)
                -- dim #1  3000 (SPATIAL)
                -- dim #2  4 (SPATIAL)
[TRT]
[TRT] INVALID_ARGUMENT: Cannot find binding of given name: Input
[TRT] failed to find requested input layer Input in network
[TRT] device GPU, failed to create resources for CUDA engine
[TRT] failed to create TensorRT engine for /usr/local/bin/networks/SSD-Mobilenet-v1-ONNX/ssd-mobilenet.onnx, device GPU
[TRT] detectNet -- failed to initialize.

Any idea what is wrong? The model runs successfully with detectnet in the C++ version, but it uses the CPU instead of the GPU:

detectnet --model=models/Person/ssd-mobilenet.onnx --labels=models/Person/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          "images/*.jpg" test_Person
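For what it's worth, the binding names baked into the ONNX file can be double-checked with a quick sketch like this (assuming the onnx Python package is installed; the path is from my setup):

import onnx  # third-party package: pip install onnx

model = onnx.load("models/Person/ssd-mobilenet.onnx")
print("inputs: ", [i.name for i in model.graph.input])   # should list 'input_0'
print("outputs:", [o.name for o in model.graph.output])  # should list 'scores' and 'boxes'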

Files available at:

Hmm, it doesn't seem to be receiving/parsing your custom command-line arguments.

When you run that command line in the terminal, are there line breaks? Can you try running it all on one line?

I didn't run the Python script; I called the function from another script with no command-line args (the values to pass are hard-coded in the script).

I also made some changes in the library; I will upload the files soon.

Now I call the functions with:
labels = open("jetson-inference/data/networks/SSD-Mobilenet-v1-ONNX/labels.txt").readlines()
net = jetson.inference.detectNet("ssd-mobilenet-v1-onnx", threshold=0.7, precision="FP16", device="GPU", allowGPUFallback=True)

These are the changes I made in the library:

Changes in PyDetectNet.cpp:

// Init
static int PyDetectNet_Init( PyDetectNet_Object* self, PyObject *args, PyObject *kwds )
{
	LogDebug(LOG_PY_INFERENCE "PyDetectNet_Init()\n");

	// parse arguments
	PyObject* argList = NULL;
	const char* network = "ssd-mobilenet-v2";
	float threshold = DETECTNET_DEFAULT_THRESHOLD;

	const char* precision = "FP16";
	// precisionType PrecisionType = TYPE_FP32;

	const char* device = "GPU";
	// deviceType DeviceType = DEVICE_GPU;

	int allowGPUFallback = false;

	static char* kwlist[] = {"network", "threshold", "precision", "device", "allowGPUFallback", NULL};
	// |sOf

	if( !PyArg_ParseTupleAndKeywords(args, kwds, "|sfssp", kwlist, &network, &threshold, &precision, &device, &allowGPUFallback) )
	{
		PyErr_SetString(PyExc_Exception, LOG_PY_INFERENCE "detectNet.__init__() failed to parse args tuple");
		printf("%s\n", network);
		printf("%f\n", threshold);
		printf("%s\n", precision);
		printf("%s\n", device);
		// printf("%b\n", allowGPUFallback);
		return -1;
	}

	LogVerbose(LOG_PY_INFERENCE "detectNet loading built-in network '%s'\n", network);

	// parse the selected built-in network
	detectNet::NetworkType networkType = detectNet::NetworkTypeFromStr(network);

	uint32_t maxBatchSize = DEFAULT_MAX_BATCH_SIZE;
	precisionType precision_type = precisionTypeFromStr(precision);
	deviceType device_type = deviceTypeFromStr(device);
	// bool allowGPUFallback = true;

	if( networkType == detectNet::CUSTOM )
	{
		PyErr_SetString(PyExc_Exception, LOG_PY_INFERENCE "detectNet invalid built-in network was requested");
		printf(LOG_PY_INFERENCE "detectNet invalid built-in network was requested ('%s')\n", network);
		return -1;
	}

	// load the built-in network
	// self->net = detectNet::Create(networkType, threshold, maxBatchSize, precision_type, device_type, allowGPUFallback);
	self->net = detectNet::Create(networkType, threshold, maxBatchSize, precision_type, device_type, allowGPUFallback);

	// confirm the network loaded
	if( !self->net )
	{
		PyErr_SetString(PyExc_Exception, LOG_PY_INFERENCE "detectNet failed to load network");
		LogError(LOG_PY_INFERENCE "detectNet failed to load network\n");
		return -1;
	}

	self->base.net = self->net;
	return 0;
}

Changes in detectNet.cpp:

detectNet* detectNet::Create( NetworkType networkType, float threshold, uint32_t maxBatchSize,
                              precisionType precision, deviceType device, bool allowGPUFallback )
{
#if 1
	if( networkType == PEDNET_MULTI )
		return Create("networks/multiped-500/deploy.prototxt", "networks/multiped-500/snapshot_iter_178000.caffemodel", 117.0f, "networks/multiped-500/class_labels.txt", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == FACENET )
		return Create("networks/facenet-120/deploy.prototxt", "networks/facenet-120/snapshot_iter_24000.caffemodel", 0.0f, "networks/facenet-120/class_labels.txt", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == PEDNET )
		return Create("networks/ped-100/deploy.prototxt", "networks/ped-100/snapshot_iter_70800.caffemodel", 0.0f, "networks/ped-100/class_labels.txt", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == COCO_AIRPLANE )
		return Create("networks/DetectNet-COCO-Airplane/deploy.prototxt", "networks/DetectNet-COCO-Airplane/snapshot_iter_22500.caffemodel", 0.0f, "networks/DetectNet-COCO-Airplane/class_labels.txt", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == COCO_BOTTLE )
		return Create("networks/DetectNet-COCO-Bottle/deploy.prototxt", "networks/DetectNet-COCO-Bottle/snapshot_iter_59700.caffemodel", 0.0f, "networks/DetectNet-COCO-Bottle/class_labels.txt", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == COCO_CHAIR )
		return Create("networks/DetectNet-COCO-Chair/deploy.prototxt", "networks/DetectNet-COCO-Chair/snapshot_iter_89500.caffemodel", 0.0f, "networks/DetectNet-COCO-Chair/class_labels.txt", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == COCO_DOG )
		return Create("networks/DetectNet-COCO-Dog/deploy.prototxt", "networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel", 0.0f, "networks/DetectNet-COCO-Dog/class_labels.txt", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
#if NV_TENSORRT_MAJOR > 4
	else if( networkType == SSD_INCEPTION_V2 )
		return Create("networks/SSD-Inception-v2/ssd_inception_v2_coco.uff", "networks/SSD-Inception-v2/ssd_coco_labels.txt", threshold, "Input", Dims3(3,300,300), "NMS", "NMS_1", maxBatchSize, precision, device, allowGPUFallback);
	else if( networkType == SSD_MOBILENET_V1_ONNX )
		return Create("networks/SSD-Mobilenet-v1-ONNX/ssd-mobilenet.onnx", "networks/SSD-Mobilenet-v1-ONNX/labels.txt", threshold, "Input", Dims3(3,300,300), "NMS", "NMS_1", maxBatchSize, precision, device, allowGPUFallback);
	else if( networkType == SSD_MOBILENET_V1 )
		return Create("networks/SSD-Mobilenet-v1/ssd_mobilenet_v1_coco.uff", "networks/SSD-Mobilenet-v1/ssd_coco_labels.txt", threshold, "Input", Dims3(3,300,300), "Postprocessor", "Postprocessor_1", maxBatchSize, precision, device, allowGPUFallback);
	else if( networkType == SSD_MOBILENET_V2 )
		return Create("networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff", "networks/SSD-Mobilenet-v2/ssd_coco_labels.txt", threshold, "Input", Dims3(3,300,300), "NMS", "NMS_1", maxBatchSize, precision, device, allowGPUFallback);
#endif
	else
		return NULL;
#else
	if( networkType == PEDNET_MULTI )
		return Create("networks/multiped-500/deploy.prototxt", "networks/multiped-500/snapshot_iter_178000.caffemodel", "networks/multiped-500/mean.binaryproto", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == FACENET )
		return Create("networks/facenet-120/deploy.prototxt", "networks/facenet-120/snapshot_iter_24000.caffemodel", NULL, threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == PEDNET )
		return Create("networks/ped-100/deploy.prototxt", "networks/ped-100/snapshot_iter_70800.caffemodel", "networks/ped-100/mean.binaryproto", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == COCO_AIRPLANE )
		return Create("networks/DetectNet-COCO-Airplane/deploy.prototxt", "networks/DetectNet-COCO-Airplane/snapshot_iter_22500.caffemodel", "networks/DetectNet-COCO-Airplane/mean.binaryproto", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == COCO_BOTTLE )
		return Create("networks/DetectNet-COCO-Bottle/deploy.prototxt", "networks/DetectNet-COCO-Bottle/snapshot_iter_59700.caffemodel", "networks/DetectNet-COCO-Bottle/mean.binaryproto", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == COCO_CHAIR )
		return Create("networks/DetectNet-COCO-Chair/deploy.prototxt", "networks/DetectNet-COCO-Chair/snapshot_iter_89500.caffemodel", "networks/DetectNet-COCO-Chair/mean.binaryproto", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else if( networkType == COCO_DOG )
		return Create("networks/DetectNet-COCO-Dog/deploy.prototxt", "networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel", "networks/DetectNet-COCO-Dog/mean.binaryproto", threshold, DETECTNET_DEFAULT_INPUT, DETECTNET_DEFAULT_COVERAGE, DETECTNET_DEFAULT_BBOX, maxBatchSize, precision, device, allowGPUFallback );
	else
		return NULL;
#endif
}

// Create
detectNet* detectNet::Create( const commandLine& cmdLine )
{
	detectNet* net = NULL;

	// parse command line parameters
	const char* modelName = cmdLine.GetString("network");

	if( !modelName )
		modelName = cmdLine.GetString("model", "ssd-mobilenet-v2");

	float threshold = cmdLine.GetFloat("threshold");

	if( threshold == 0.0f )
		threshold = DETECTNET_DEFAULT_THRESHOLD;

	int maxBatchSize = cmdLine.GetInt("batch_size");

	if( maxBatchSize < 1 )
		maxBatchSize = DEFAULT_MAX_BATCH_SIZE;

	const char* precisionName = cmdLine.GetString("precision");

	if( !precisionName )
		precisionName = cmdLine.GetString("precision", "FP16");

	// parse the precision type
	const precisionType type_precision = precisionTypeFromStr(precisionName);

	const char* deviceName = cmdLine.GetString("device");

	if( !deviceName )
		deviceName = cmdLine.GetString("device", "GPU");

	// parse the device type
	const deviceType type_device = deviceTypeFromStr(deviceName);

	bool allowGPUFallback_value = cmdLine.GetBool("allowGPUFallback");

	if( !allowGPUFallback_value )
		allowGPUFallback_value = cmdLine.GetBool("allowGPUFallback", false);

	// parse the model type
	const detectNet::NetworkType type = NetworkTypeFromStr(modelName);

	if( type == detectNet::CUSTOM )
	{
		const char* prototxt     = cmdLine.GetString("prototxt");
		const char* input        = cmdLine.GetString("input_blob");
		const char* out_blob     = cmdLine.GetString("output_blob");
		const char* out_cvg      = cmdLine.GetString("output_cvg");
		const char* out_bbox     = cmdLine.GetString("output_bbox");
		const char* class_labels = cmdLine.GetString("class_labels");

		if( !input )
			input = DETECTNET_DEFAULT_INPUT;

		if( !out_blob )
		{
			if( !out_cvg )  out_cvg  = DETECTNET_DEFAULT_COVERAGE;
			if( !out_bbox ) out_bbox = DETECTNET_DEFAULT_BBOX;
		}

		if( !class_labels )
			class_labels = cmdLine.GetString("labels");

		float meanPixel = cmdLine.GetFloat("mean_pixel");

		net = detectNet::Create(prototxt, modelName, meanPixel, class_labels, threshold, input,
		                        out_blob ? NULL : out_cvg, out_blob ? out_blob : out_bbox, maxBatchSize);
	}
	else
	{
		// create detectNet from pretrained model
		// net = detectNet::Create(type, threshold, maxBatchSize);
		net = detectNet::Create(type, threshold, maxBatchSize, type_precision, type_device, allowGPUFallback_value);
	}

	if( !net )
		return NULL;

	// enable layer profiling if desired
	if( cmdLine.GetFlag("profile") )
		net->EnableLayerProfiler();

	// set overlay alpha value
	net->SetOverlayAlpha(cmdLine.GetFloat("alpha", DETECTNET_DEFAULT_ALPHA));

	return net;
}

Changes in detectNet.h, at line 193:

#if NV_TENSORRT_MAJOR > 4
	SSD_MOBILENET_V1,      /**< SSD Mobilenet-v1 UFF model, trained on MS-COCO */
	SSD_MOBILENET_V1_ONNX, /**< SSD Mobilenet-v1 ONNX model */
	SSD_MOBILENET_V2,      /**< SSD Mobilenet-v2 UFF model, trained on MS-COCO */
	SSD_INCEPTION_V2       /**< SSD Inception-v2 UFF model, trained on MS-COCO */

Changes in commandLine.h:

bool GetBool( const char* argName, bool defaultValue=false, bool allowOtherDelimiters=true ) const;

Changes in commandLine.cpp:

// GetBool
bool commandLine::GetBool( const char* string_ref, bool default_value, bool allowOtherDelimiters ) const
{
	if( argc < 1 )
		return 0;

	bool bFound = false;
	bool value  = false;

	for( int i=ARGC_START; i < argc; i++ )
	{
		const int string_start = strFindDelimiter('-', argv[i]);

		if( string_start == 0 )
			continue;

		const char* string_argv = &argv[i][string_start];
		const int length = (int)strlen(string_ref);

		if( !strncasecmp(string_argv, string_ref, length) )
		{
			if( length+1 <= (int)strlen(string_argv) )
			{
				int auto_inc = (string_argv[length] == '=') ? 1 : 0;
				value = atoi(&string_argv[length + auto_inc]);
			}
			else
			{
				value = false;
			}

			bFound = true;
			continue;
		}
	}

	if( bFound )
		return value;

	if( !allowOtherDelimiters )
		return default_value;

	// try looking for the argument with delimiters swapped
	char* swapped_ref = strSwapDelimiter(string_ref);

	if( !swapped_ref )
		return default_value;

	value = GetInt(swapped_ref, default_value, false);
	free(swapped_ref);
	return value;
}

The modified library is available here:

To call the network:
net = jetson.inference.detectNet("ssd-mobilenet-v1-onnx", threshold=0.7, precision="FP16", device="GPU", allowGPUFallback=True)

@Pelepicier, I am unable to debug all the changes you made. I recommend going back to the original jetson-inference code and creating your model like this:

net = jetson.inference.detectNet(argv=['--model=my_model_path/ssd-mobilenet.onnx',
                                       '--labels=my_model_path/labels.txt',
                                       '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes'],
                                 threshold=0.5)

This will use the argument parsing already in detectNet and should work.
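To sanity-check the loaded model, a minimal detection loop could look like this (the image path is just an example, and this assumes a build where jetson.utils.loadImage is available):

import jetson.utils

img = jetson.utils.loadImage('images/humans_0.jpg')  # example image path
detections = net.Detect(img)

for d in detections:
    print(d.ClassID, d.Confidence, d.Left, d.Top, d.Right, d.Bottom)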

Hi @dusty_nv,

Thank you, this is exactly what I needed.
Now I can get 250 FPS with my custom retrained ONNX model with only the "Person" label (thanks to your scripts in jetson-inference). I use GPU + DLA_0 + DLA_1 with multiprocessing (in Python), and I only needed to change 8 lines in c/detectNet.cpp to make it work (to pass the device and precision in args).
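Roughly, the setup looks like this (a sketch, not my exact script; the --device and --precision flags are my custom additions to detectNet.cpp, not part of stock jetson-inference):

import multiprocessing as mp
import jetson.inference

def worker(device):
    # each process builds its own TensorRT engine on its assigned device
    net = jetson.inference.detectNet(argv=['--model=models/Person/ssd-mobilenet.onnx',
                                           '--labels=models/Person/labels.txt',
                                           '--input-blob=input_0', '--output-cvg=scores',
                                           '--output-bbox=boxes',
                                           '--device=' + device,   # custom flag
                                           '--precision=FP16'],    # custom flag
                                     threshold=0.7)
    # ... grab frames and call net.Detect() here ...

if __name__ == '__main__':
    for dev in ('GPU', 'DLA_0', 'DLA_1'):
        mp.Process(target=worker, args=(dev,)).start()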
I'm trying to get a bit more FPS, and I was wondering: what does the batch_size parameter change in the engine? It does not seem to affect the inference time…

Currently it only changes the maximum batch size that the TensorRT engine can support. It doesn't actually do multi-image batching, as that would require additional pre/post-processing code and changes to the input streaming. I would recommend DeepStream for applications using multi-stream batching.

I was missing a bracket when I used this: the argv list needs the closing ] before the threshold argument, as shown above.