PwgenException when building a CUDA engine

Hello,

I am currently attempting to build a CUDA engine from a network in ONNX format and need some help. The network contains 3D convolutions, 2D convolutions, residual connections, dropout, and ELU activations; it is about 16 MB serialized. It runs in TensorFlow and exports to ONNX format without issue.

However, when I call buildEngineWithConfig() I encounter an error. Graph construction and optimization complete successfully, but after a few minutes of autotuning the program crashes with this error:

terminate called after throwing an instance of 'pwgen::PwgenException'
  what():  Driver error:

There is nothing else in the logs except normal timing info showing the fastest tactics.

Here is the code I’m using to build the engine:

// Create the builder and an explicit-batch network definition.
nvinfer1::IBuilder* nvbuilder = nvinfer1::createInferBuilder(logger);
nvinfer1::INetworkDefinition* nvnetwork = nvbuilder->createNetworkV2(
    1U << static_cast<int>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
nvbuilder->setMaxBatchSize(2);  // ignored for explicit-batch networks, but harmless

// Parse the ONNX model and check the result so parser errors are not silently dropped.
nvonnxparser::IParser* nvparser = nvonnxparser::createParser(*nvnetwork, logger);
if (!nvparser->parseFromFile(onnxFilename.c_str(),
                             static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
    // handle/report parse failure here
}

nvinfer1::IBuilderConfig* config = nvbuilder->createBuilderConfig();
config->setFlags(1U << static_cast<int>(nvinfer1::BuilderFlag::kFP16)
               | 1U << static_cast<int>(nvinfer1::BuilderFlag::kDISABLE_TIMING_CACHE));
config->setMaxWorkspaceSize(1 << 30);  // 1 GiB
config->setProfilingVerbosity(nvinfer1::ProfilingVerbosity::kVERBOSE);

nvinfer1::ICudaEngine* nvengine = nvbuilder->buildEngineWithConfig(*nvnetwork, *config);

I have tried various workspace sizes, toggling FP16/FP32, and several network architectures. Some architectures build, but most fail with the error above.

Am I doing something wrong? Am I maybe missing a kernel module? How can I further debug the driver error?

Thanks for any help.

Environment info:

  • Jetson Nano Developer Kit (945-13450-0000-100)
  • JetPack 4.4
  • TensorFlow 2.2.0
  • tf2onnx 1.6.3 (opset 8)

After some trial and error I narrowed the apparent cause down to a division-by-scalar operation after a Conv2D layer with an activation. When I remove the division, the error goes away.
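Since the trigger seems to be the scalar division, one possible workaround (an untested sketch of my own, not something from the TensorRT docs) is to multiply by the precomputed reciprocal instead of dividing, so the exported ONNX graph gets a Mul node rather than a Div node. For a nonzero constant the two forms compute the same thing:

```python
# Sketch: replace "x / c" with "x * inv", where inv = 1.0 / c.
# In the Keras model below that would mean writing "x2 * (1.0 / min)"
# instead of "x2 / min".
c = 0.5
inv = 1.0 / c

def divide_by_constant(x):
    return x / c

def multiply_by_reciprocal(x):
    return x * inv

# The two forms agree (exactly here, since 0.5 is a power of two):
print(divide_by_constant(3.0), multiply_by_reciprocal(3.0))  # 6.0 6.0
```

Whether this actually avoids the failing tactic depends on which ONNX op the converter emits, so treat it as an experiment rather than a fix.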

For reference, here’s a TF-Keras model that is consistently producing the error for me:

import tensorflow as tf

batch_size = 1      # example values
input_height = 224
input_width = 224

input_layer = tf.keras.layers.Input(batch_shape=[batch_size, 3, input_height, input_width], dtype=tf.float32)

min = 0.5
x1 = tf.keras.layers.Conv2D(16, 5, padding="same", data_format="channels_first")(input_layer)
x2 = tf.keras.layers.Conv2D(1, 3, padding="same", data_format="channels_first", activation="sigmoid")(x1) / min
concat = tf.concat([x1, x2], axis=1)
x3 = tf.keras.layers.Conv2D(16, 5, padding="same", data_format="channels_first")(concat)

model = tf.keras.Model([input_layer], [x2, x3])

Is this a bug?

Hi,

May I know which JetPack 4.4 you are using?
Is it DP (developer preview) or GA (production release)?

A simple way to tell is to check the TensorRT version with this command:

$ cat /usr/include/aarch64-linux-gnu/NvInferVersion.h

GA includes TensorRT v7.1.3.
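If you prefer to do this programmatically, here is a small helper of my own (not an official tool) that pulls the version out of the header contents. It relies on the NV_TENSORRT_MAJOR/MINOR/PATCH macros that NvInferVersion.h defines:

```python
import re

def parse_nvinfer_version(header_text):
    """Extract (major, minor, patch) from the contents of NvInferVersion.h."""
    fields = {}
    for name in ("NV_TENSORRT_MAJOR", "NV_TENSORRT_MINOR", "NV_TENSORRT_PATCH"):
        m = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if m is None:
            raise ValueError("missing #define for " + name)
        fields[name] = int(m.group(1))
    return (fields["NV_TENSORRT_MAJOR"],
            fields["NV_TENSORRT_MINOR"],
            fields["NV_TENSORRT_PATCH"])

# Example with a snippet shaped like the real header:
sample = """
#define NV_TENSORRT_MAJOR 7
#define NV_TENSORRT_MINOR 1
#define NV_TENSORRT_PATCH 3
"""
print(parse_nvinfer_version(sample))  # (7, 1, 3)
```

On the Nano you would read /usr/include/aarch64-linux-gnu/NvInferVersion.h and pass its contents to the function.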

There is a known issue in pwgen that can cause the error you shared, and the fix is already included in the GA version.

Thanks.

Hello, thank you for your reply. I am using the GA release with TensorRT version 7.1.3.

Hi,

We tried to reproduce this issue, but the ONNX model works fine in our environment.
Here are our steps for your reference:

1. test.py

import tensorflow as tf
import keras2onnx
import onnx

batch_size = 1
input_height = 224
input_width  = 224

input_layer = tf.keras.layers.Input(batch_shape=[batch_size, 3, input_height, input_width], dtype=tf.float32)

min = 0.5
x1 = tf.keras.layers.Conv2D(16, 5, padding="same", data_format="channels_first")(input_layer)
x2 = tf.keras.layers.Conv2D(1, 3, padding="same", data_format="channels_first", activation="sigmoid")(x1) / min
concat = tf.concat([x1, x2], axis=1)
x3 = tf.keras.layers.Conv2D(16, 5, padding="same", data_format="channels_first")(concat)

model = tf.keras.Model([input_layer], [x2, x3])
onnx_model = keras2onnx.convert_keras(model, model.name)
onnx.save_model(onnx_model, "output.onnx")

2. Test with trtexec:

$ python3 test.py
$ /usr/src/tensorrt/bin/trtexec --onnx=output.onnx

The model works fine with trtexec.
We recommend trying your model with trtexec as well.

3. Here is the library version of our environment:

$ pip3 freeze

Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
keras2onnx==1.7.0
onnx==1.7.0
onnxconverter-common==1.7.0
tensorflow==1.15.3+nv20.7
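To mirror the builder settings from the original post (FP16, a 1 GiB workspace, verbose output) in the trtexec step above, here is a small helper of my own that assembles the command line; --onnx, --fp16, --workspace (in MB), and --verbose are standard trtexec options in TensorRT 7.x:

```python
# Sketch of a helper (my own, not part of TensorRT) that builds a trtexec
# command line matching the builder flags used in the original post.
def build_trtexec_cmd(onnx_path, fp16=False, workspace_mb=1024, verbose=False):
    cmd = ["/usr/src/tensorrt/bin/trtexec", "--onnx=%s" % onnx_path]
    if fp16:
        cmd.append("--fp16")            # allow FP16 kernels, like BuilderFlag::kFP16
    cmd.append("--workspace=%d" % workspace_mb)  # workspace size in MB
    if verbose:
        cmd.append("--verbose")         # verbose logging, like kVERBOSE profiling
    return cmd

print(" ".join(build_trtexec_cmd("output.onnx", fp16=True, verbose=True)))
# /usr/src/tensorrt/bin/trtexec --onnx=output.onnx --fp16 --workspace=1024 --verbose
```

Running the FP16 path under trtexec this way may reproduce the failing autotuning tactic outside your own build code.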

Thanks.


Great. Everything works perfectly when I boot up Ubuntu and build the engine in a desktop environment. I am not typically using the Nano this way. I think I’ll get an extra dev kit to use as a build server going forward. Thank you very much for your help.

FYI: I experienced a similar problem when parsing HRNet exported from PyTorch (it involves conv, ReLU, and up/down-sampling layers).
I was running CUDA 10.2 on NVIDIA driver 450 (I also have CUDA 11 installed on my computer).
The problem was solved by downgrading my driver to 440, the version bundled with the CUDA 10.2 .run local install file.
