builder->buildCudaEngine(*network) fails with "...Trying to find reduced divisor for 0"

Description

Hi,
I created a project with my own plugins based on gridAnchor_TRT and NMS_TRT. Everything works as expected on Windows but not on Ubuntu. While building the engine, this error appears:
terminate called after throwing an instance of 'std::invalid_argument'
what(): Trying to find reduced divisor for 0

The same code works fine on Windows (with the same TRT and CUDA versions).
I thought the error could be caused by one of the CUDA kernels that I implemented for my plugins. But as far as I know, a plugin's enqueue() method is only called during execution of context->enqueueV2() (or executeV2()), not while building the engine.
Do you have any suggestions? I have no idea how to locate the source of this error.
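
A possible way to narrow this down, as a minimal sketch only: std::invalid_argument is a C++ exception, so it is thrown from host code and not from inside a CUDA kernel. Wrapping each stage (plugin creation, network construction, engine build) in its own try/catch can show which stage throws. The names checkedDivisor and runStep below are hypothetical helpers made up for illustration; they are not part of TensorRT, and the zero check only stands in for whatever host-side computation receives a 0.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for host-side plugin code that needs a non-zero
// value (for example a tensor dimension). Not TensorRT code.
int checkedDivisor(int denom, const std::string& where)
{
    if (denom == 0)
    {
        // Include the call site in the message so the failing value can be traced.
        throw std::invalid_argument("zero divisor in " + where);
    }
    return denom;
}

// Runs one stage and, if it throws std::invalid_argument, records which
// stage failed together with the exception message.
bool runStep(const std::string& name, const std::function<void()>& step, std::string& report)
{
    try
    {
        step();
        return true;
    }
    catch (const std::invalid_argument& e)
    {
        report = name + ": " + e.what();
        return false;
    }
}
```

In a real project the lambda passed to runStep would contain the actual call for that stage, such as the builder->buildCudaEngine(*network) call from the title. Running the program under gdb with "catch throw" set is another way to get a backtrace at the point where the exception is raised.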

Environment

TensorRT Version: 7.0.0.11
GPU Type: RTX 2080
Nvidia Driver Version: 440
CUDA Version: 10.2
CUDNN Version: 7.6.1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.2
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Can you please share the sample script and model file to reproduce the issue so we can help better?

Thanks

Hi,
The project was tested in the nvcr.io/nvidia/pytorch 20.03-py3 Docker container (see PyTorch Release Notes :: NVIDIA Deep Learning Frameworks Documentation).
No additional dependencies are required beyond those already included in the project.
Link to the GitHub repository:
GitHub - freetown113/faceDetection: Face detector
Thank you

It seems the liballClassNMS.so library is missing from the GitHub repo. I am getting the following error:
…/bin/libil …/image
…/bin/libil: error while loading shared libraries: liballClassNMS.so: cannot open shared object file: No such file or directory

Thanks

Please check in "faceDetection/Plugins/cuda/lib"; it should be there. I apologize, you need to add the path to these libraries to the LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/faceDetection/Plugins/cuda/lib

We will analyze the code and update you accordingly.

However, the model seems to work fine with the "trtexec" command, so we recommend using "trtexec" for now:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
Using trtexec:
trtexec --onnx=model.onnx --explicitBatch --verbose --saveEngine=test.trt
[05/26/2020-10:06:09] [I] percentile: 1.97314 ms at 99%
[05/26/2020-10:06:09] [I] total compute time: 2.9944 s
&&&& PASSED TensorRT.trtexec # trtexec --onnx=model.onnx --explicitBatch --verbose --saveEngine=test.trt

Thanks

Absolutely, the model works fine by itself. However, it is useless without the plugins, and the problem appears when I try to add them.
Thanks