I am currently trying to build a CUDA engine from a network in ONNX format and need some help. The network contains 3D convolutions, 2D convolutions, residual connections, dropout, and ELU activations. It is about 16 MB serialized. It executes in TensorFlow and exports to ONNX without issue.
However, when I call buildEngineWithConfig() I encounter an error. Graph construction and optimization complete successfully, but after a few minutes of autotuning the program crashes with this error:
terminate called after throwing an instance of 'pwgen::PwgenException'
what(): Driver error:
There is nothing else in the logs except the normal timing info showing the fastest tactics.
I have tried various workspace sizes, toggling FP16/FP32, and several network architectures. Some architectures build, but most fail with the above error.
Am I doing something wrong? Am I maybe missing a kernel module? How can I further debug the driver error?
After some trial and error I was able to narrow the apparent cause down to a division-by-scalar operation following a Conv2D layer with an activation. When I remove the division, the error goes away.
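If removing the division outright would change the model's math, one workaround worth trying (an assumption on my part, not something I have verified against your exact model) is to express the scaling as a multiplication by the reciprocal before export, so the ONNX graph carries a Mul node after the Conv2D instead of a Div node. For a scalar constant the two forms are numerically equivalent; plain Python stands in for the tensor math here:

```python
# Hypothetical sketch: rewrite "x / scale" as "x * (1.0 / scale)" before
# exporting to ONNX, so the graph contains Mul instead of Div.
# The values below are a made-up stand-in for a conv layer's output.
scale = 255.0
feature_map = [0.0, 63.75, 127.5, 191.25, 255.0]

divided = [v / scale for v in feature_map]             # what a Div node computes
multiplied = [v * (1.0 / scale) for v in feature_map]  # what a Mul node computes

for a, b in zip(divided, multiplied):
    assert abs(a - b) < 1e-12  # equivalent up to float rounding
print("Div and Mul forms match")
```

In Keras this would just mean changing the layer or Lambda that performs the division to multiply by the precomputed reciprocal instead.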
For reference, here’s a TF-Keras model that consistently reproduces the error for me:
Great. Everything works perfectly when I boot up Ubuntu and build the engine in a desktop environment. I don't typically use the Nano this way, so I think I'll get an extra dev kit to use as a build server going forward. Thank you very much for your help.
FYI: I experienced a similar problem when parsing HRNet exported from PyTorch (the model involves conv, ReLU, and up/down-sampling layers).
I was using CUDA 10.2 running on NVIDIA driver 450 (I have CUDA 11 installed on my computer).
The problem was solved by downgrading my driver to 440, the version bundled with the CUDA 10.2 .run local installer.