Error occurred while running the TensorRT samples: [reformat.cpp::executeCutensor::385]

When I run the TensorRT samples, the following error occurs:

&&&& RUNNING TensorRT.sample_onnx_mnist [TensorRT v8203] # ./sample_onnx_mnist
[11/15/2022-03:08:27] [I] Building and running a GPU inference engine for Onnx MNIST
[11/15/2022-03:08:28] [I] [TRT] [MemUsageChange] Init CUDA: CPU +12, GPU +0, now: CPU 22, GPU 219 (MiB)
[11/15/2022-03:08:28] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 22 MiB, GPU 219 MiB
[11/15/2022-03:08:28] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 94 MiB, GPU 219 MiB
[11/15/2022-03:08:28] [I] [TRT] ----------------------------------------------------------------
[11/15/2022-03:08:28] [I] [TRT] Input filename: …/data/mnist/mnist.onnx
[11/15/2022-03:08:28] [I] [TRT] ONNX IR version: 0.0.3
[11/15/2022-03:08:28] [I] [TRT] Opset version: 8
[11/15/2022-03:08:28] [I] [TRT] Producer name: CNTK
[11/15/2022-03:08:28] [I] [TRT] Producer version: 2.5.1
[11/15/2022-03:08:28] [I] [TRT] Domain: ai.cntk
[11/15/2022-03:08:28] [I] [TRT] Model version: 1
[11/15/2022-03:08:28] [I] [TRT] Doc string:
[11/15/2022-03:08:28] [I] [TRT] ----------------------------------------------------------------
[11/15/2022-03:08:28] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/15/2022-03:08:30] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +270, GPU +110, now: CPU 374, GPU 337 (MiB)
[11/15/2022-03:08:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +111, GPU +54, now: CPU 485, GPU 391 (MiB)
[11/15/2022-03:08:30] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/15/2022-03:08:30] [E] [TRT] 1: [reformat.cpp::executeCutensor::385] Error Code 1: CuTensor (Internal cuTensor permutate execute failed)
[11/15/2022-03:08:30] [E] [TRT] 1: [checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (no kernel image is available for execution on the device)
[11/15/2022-03:08:30] [E] [TRT] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
&&&& FAILED TensorRT.sample_onnx_mnist [TensorRT v8203] # ./sample_onnx_mnist

I have run the samples with multiple versions of the driver, CUDA, and TensorRT, and they all produce the above error.


TensorRT Version:
CPU Architecture: aarch64
GPU Type: GTX 1060
Nvidia Driver Version: 510
Operating System + Version: Ubuntu 20.04

Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

1. Validate your model with the snippet below:

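import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid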
2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
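As a rough sketch of the second step, the trtexec run and log capture could be scripted as below. It assumes trtexec is on your PATH (it usually ships in the TensorRT bin directory) and uses only the `--onnx` and `--verbose` options. The function names and the default log file name are hypothetical, chosen for illustration.

```python
import shutil
import subprocess


def build_trtexec_command(onnx_path, trtexec="trtexec"):
    """Return the argument list for a verbose trtexec run on an ONNX file."""
    return [trtexec, f"--onnx={onnx_path}", "--verbose"]


def run_and_save_log(onnx_path, log_path="trtexec_verbose.log"):
    """Run trtexec and write its combined stdout/stderr to log_path.

    log_path is a hypothetical default; pick any file name you like.
    """
    cmd = build_trtexec_command(onnx_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("trtexec was not found on PATH")
    # Merge stderr into stdout so errors appear in order in the log.
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    with open(log_path, "w") as log_file:
        log_file.write(result.stdout)
    return result.returncode
```

Calling `run_and_save_log("model.onnx")` with the path to your own model writes the full verbose output to the log file, which can then be attached to the thread.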