Inference of model using tensorflow/onnxruntime and TensorRT gives different result

Hi,

Sorry for missing the array validation you shared above.

Based on your implementation:

context->execute(1, buffers);
output.download(result);

There is no synchronization mechanism between the GPU tasks and the CPU tasks, so the CPU may try to copy the buffer back before the inference job has finished.

Would you mind adding a synchronization call in between to see if that helps first?

context->execute(1, buffers);
cudaDeviceSynchronize();  // block the CPU until all pending GPU work is done
output.download(result);
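
If you later switch to asynchronous execution for throughput, the same rule applies: the copy back to the host must be ordered after the inference on the same stream. Below is a minimal sketch of that pattern, assuming `context` is your IExecutionContext, `buffers` holds the device bindings, and `hostOutput`, `outputIndex`, and `outputSize` are hypothetical names for your output buffer:

#include <cuda_runtime.h>

cudaStream_t stream;
cudaStreamCreate(&stream);

// Enqueue inference asynchronously on the stream.
context->enqueueV2(buffers, stream, nullptr);

// Queue the device-to-host copy on the same stream,
// so it runs only after the inference has completed.
cudaMemcpyAsync(hostOutput, buffers[outputIndex], outputSize,
                cudaMemcpyDeviceToHost, stream);

// Block the CPU once, after all GPU work on this stream is queued.
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);

With this approach you only need to synchronize on the one stream you used, rather than the whole device.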

Thanks.