TensorRT: Error[3]: [topKLayer.h::setK::22] Error Code 3: API Usage Error

Description

I’m trying to convert an ONNX model exported from TensorFlow 2.11 to a TensorRT engine.
I followed this Colab notebook.

I then downloaded and used tf2onnx to create an .onnx file as follows:

python -m tf2onnx.convert --saved-model content/exported_model --output test.onnx --verbose
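
Before involving TensorRT, the export itself can be sanity-checked from Python (a minimal sketch, assuming the onnx package is installed; the file name matches the command above):

# Sanity-check the exported model (assumes the onnx package is installed).
import onnx

model = onnx.load("test.onnx")
onnx.checker.check_model(model)  # raises if the graph is structurally invalid
print("opsets:", [(op.domain or "ai.onnx", op.version) for op in model.opset_import])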

I then attempted to convert test.onnx to a TensorRT engine using the following command:

/usr/src/tensorrt/bin/trtexec --onnx='/workspaces/scratch_ai/onnx/test.onnx' --saveEngine='/workspaces/scratch_ai/trt/test.engine' --exportProfile='/workspaces/scratch_ai/trt/test.json' --allowGPUFallback --useSpinWait --separateProfileRun > '/workspaces/scratch_ai/trt/test.log'

and got the following output in the terminal:

[03/20/2023-09:42:32] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/20/2023-09:42:32] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[03/20/2023-09:42:33] [E] Error[3]: [topKLayer.h::setK::22] Error Code 3: API Usage Error (Parameter check failed at: /_src/build/aarch64-gnu/release/optimizer/api/layers/topKLayer.h::setK::22, condition: k > 0 && k <= kMAX_TOPK_K
)
[03/20/2023-09:42:33] [E] Error[2]: [topKLayer.cpp::TopKLayer::20] Error Code 2: Internal Error (Assertion ThreadContext::getThreadResources().getErrorRecorder().getNbErrors() == prevNbErrors failed. )
[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:726: While parsing node number 310 [TopK -> "StatefulPartitionedCall/generate_detections/TopKV2:0"]:
[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:727: --- Begin node ---
[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:728: input: "StatefulPartitionedCall/generate_detections/Reshape:0"
input: "const_fold_opt__1545"
output: "StatefulPartitionedCall/generate_detections/TopKV2:0"
output: "StatefulPartitionedCall/generate_detections/TopKV2:1"
name: "StatefulPartitionedCall/generate_detections/TopKV2"
op_type: "TopK"
attribute {
  name: "sorted"
  i: 1
  type: INT
}

[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:729: --- End node ---
[03/20/2023-09:42:33] [E] [TRT] ModelImporter.cpp:731: ERROR: builtin_op_importers.cpp:4931 In function importTopK:
[8] Assertion failed: layer && "Failed to add TopK layer."
[03/20/2023-09:42:33] [E] Failed to parse onnx file
[03/20/2023-09:42:33] [E] Parsing model failed
[03/20/2023-09:42:33] [E] Failed to create engine from model or file.
[03/20/2023-09:42:33] [E] Engine set up failed
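
For what it’s worth, the failing condition in the first error (k > 0 && k <= kMAX_TOPK_K) suggests the K value feeding this TopK node is out of range; the ITopKLayer documentation lists a maximum K of 3840. Per the node dump above, K comes from the constant const_fold_opt__1545, so it can be inspected, and if a smaller detection count is acceptable, clamped, with a sketch like this (assuming the onnx and numpy packages; clamping changes how many candidates the layer keeps, so accuracy should be re-validated afterwards):

# Inspect (and optionally clamp) the K constant feeding the failing TopK node.
import onnx
import numpy as np
from onnx import numpy_helper

model = onnx.load("test.onnx")
for init in model.graph.initializer:
    if init.name == "const_fold_opt__1545":  # K input name taken from the error log
        k = numpy_helper.to_array(init)
        print("TopK k =", k)
        if k.max() > 3840:  # TensorRT's documented TopK upper bound
            patched = np.minimum(k, 3840).astype(k.dtype)
            init.CopyFrom(numpy_helper.from_array(patched, init.name))
            onnx.save(model, "test_patched.onnx")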

Environment

TensorRT Version: 8.5
GPU Type: Orin AGX
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable): 2.11
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Container, nvcr.io/nvidia/l4t-tensorrt:r8.5.2.2-devel

Relevant Files


test.onnx

Steps To Reproduce

I ran the following command in the terminal:

/usr/src/tensorrt/bin/trtexec --onnx='/workspaces/scratch_ai/onnx/test.onnx' --saveEngine='/workspaces/scratch_ai/trt/test.engine' --exportProfile='/workspaces/scratch_ai/trt/test.json' --allowGPUFallback --useSpinWait --separateProfileRun > '/workspaces/scratch_ai/trt/test.log'
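
Note that > only redirects stdout, which is why the parser errors above still printed to the terminal; appending 2>&1 to the command would capture them in test.log as well.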


Hi,

The link below might be useful for you.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html

For multi-threading/streaming, we suggest using DeepStream or Triton.

For more details, we recommend raising the query in the DeepStream forum, or filing an issue in the Triton Inference Server GitHub repository.

Thanks!