Description
$ ./main image
ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (460) - Cuda Error in loadKernel: -1 (TensorRT internal error)
ERROR: INVALID_STATE: std::exception
ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
Segmentation fault
The same ONNX file works fine with CUDA 10.0 + cuDNN 7.5 + TensorRT-5.1.5.0, and the PyTorch-to-ONNX export completes with no errors or warnings.
Value of `engine->getNbLayers()`:
- in TensorRT-5.1.5.0: 40
- in TensorRT-7.1.3.4: 35
Environment
**TensorRT Version**: 7.1.3.4
**GPU Type**: Tesla T4
**Nvidia Driver Version**: 450.51.05
**CUDA Version**: 11.0
**CUDNN Version**: cudnn-11.0-linux-x64-v8.0.2.39
**Operating System + Version**: Ubuntu 16.04
**Python Version (if applicable)**: 3.7
**TensorFlow Version (if applicable)**:
**PyTorch Version (if applicable)**: 1.6.0
**Baremetal or Container (if container which image + tag)**:
Relevant Files
Steps To Reproduce
Logs:
ubuntu @ ~/tools/TensorRT-7.1.3.4/bin
$ ./trtexec --onnx=/tmp/test.onnx --shapes=input:32x3x160x96 --explicitBatch --workspace=1024 --fp16 --saveEngine=/tmp/test.engine
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=/tmp/test.onnx --shapes=input:32x3x160x96 --explicitBatch --workspace=1024 --fp16 --saveEngine=/tmp/test.engine
[10/14/2020-10:21:48] [I] === Model Options ===
[10/14/2020-10:21:48] [I] Format: ONNX
[10/14/2020-10:21:48] [I] Model: /tmp/test.onnx
[10/14/2020-10:21:48] [I] Output:
[10/14/2020-10:21:48] [I] === Build Options ===
[10/14/2020-10:21:48] [I] Max batch: explicit
[10/14/2020-10:21:48] [I] Workspace: 1024 MB
[10/14/2020-10:21:48] [I] minTiming: 1
[10/14/2020-10:21:48] [I] avgTiming: 8
[10/14/2020-10:21:48] [I] Precision: FP32+FP16
[10/14/2020-10:21:48] [I] Calibration:
[10/14/2020-10:21:48] [I] Safe mode: Disabled
[10/14/2020-10:21:48] [I] Save engine: /tmp/test.engine
[10/14/2020-10:21:48] [I] Load engine:
[10/14/2020-10:21:48] [I] Builder Cache: Enabled
[10/14/2020-10:21:48] [I] NVTX verbosity: 0
[10/14/2020-10:21:48] [I] Inputs format: fp32:CHW
[10/14/2020-10:21:48] [I] Outputs format: fp32:CHW
[10/14/2020-10:21:48] [I] Input build shape: input=32x3x160x96+32x3x160x96+32x3x160x96
[10/14/2020-10:21:48] [I] Input calibration shapes: model
[10/14/2020-10:21:48] [I] === System Options ===
[10/14/2020-10:21:48] [I] Device: 0
[10/14/2020-10:21:48] [I] DLACore:
[10/14/2020-10:21:48] [I] Plugins:
[10/14/2020-10:21:48] [I] === Inference Options ===
[10/14/2020-10:21:48] [I] Batch: Explicit
[10/14/2020-10:21:48] [I] Input inference shape: input=32x3x160x96
[10/14/2020-10:21:48] [I] Iterations: 10
[10/14/2020-10:21:48] [I] Duration: 3s (+ 200ms warm up)
[10/14/2020-10:21:48] [I] Sleep time: 0ms
[10/14/2020-10:21:48] [I] Streams: 1
[10/14/2020-10:21:48] [I] ExposeDMA: Disabled
[10/14/2020-10:21:48] [I] Spin-wait: Disabled
[10/14/2020-10:21:48] [I] Multithreading: Disabled
[10/14/2020-10:21:48] [I] CUDA Graph: Disabled
[10/14/2020-10:21:48] [I] Skip inference: Disabled
[10/14/2020-10:21:48] [I] Inputs:
[10/14/2020-10:21:48] [I] === Reporting Options ===
[10/14/2020-10:21:48] [I] Verbose: Disabled
[10/14/2020-10:21:48] [I] Averages: 10 inferences
[10/14/2020-10:21:48] [I] Percentile: 99
[10/14/2020-10:21:48] [I] Dump output: Disabled
[10/14/2020-10:21:48] [I] Profile: Disabled
[10/14/2020-10:21:48] [I] Export timing to JSON file:
[10/14/2020-10:21:48] [I] Export output to JSON file:
[10/14/2020-10:21:48] [I] Export profile to JSON file:
[10/14/2020-10:21:48] [I]
Input filename: /tmp/test.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.6
Domain:
Model version: 0
Doc string:
[10/14/2020-10:23:31] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[10/14/2020-10:23:31] [I] Starting inference threads
[10/14/2020-10:23:34] [I] Warmup completed 0 queries over 200 ms
[10/14/2020-10:23:34] [I] Timing trace has 0 queries over 3.00063 s
[10/14/2020-10:23:34] [I] Trace averages of 10 runs:
[10/14/2020-10:23:34] [I] Average on 10 runs - GPU latency: 0.24994 ms - Host latency: 0.416225 ms (end to end 0.478523 ms, enqueue 0.0757492 ms)
…
0.3979 ms, enqueue 0.0746338 ms)
[10/14/2020-10:23:34] [I] Average on 10 runs - GPU latency: 0.209033 ms - Host latency: 0.37102 ms (end to end 0.403516 ms, enqueue 0.0744385 ms)
[10/14/2020-10:23:34] [I] Host Latency
[10/14/2020-10:23:34] [I] min: 0.357666 ms (end to end 0.368225 ms)
[10/14/2020-10:23:34] [I] max: 0.459991 ms (end to end 0.510849 ms)
[10/14/2020-10:23:34] [I] mean: 0.371668 ms (end to end 0.401351 ms)
[10/14/2020-10:23:34] [I] median: 0.369873 ms (end to end 0.401855 ms)
[10/14/2020-10:23:34] [I] percentile: 0.408752 ms at 99% (end to end 0.467224 ms at 99%)
[10/14/2020-10:23:34] [I] throughput: 0 qps
[10/14/2020-10:23:34] [I] walltime: 3.00063 s
[10/14/2020-10:23:34] [I] Enqueue Time
[10/14/2020-10:23:34] [I] min: 0.0690918 ms
[10/14/2020-10:23:34] [I] max: 0.124512 ms
[10/14/2020-10:23:34] [I] median: 0.0737915 ms
[10/14/2020-10:23:34] [I] GPU Compute
[10/14/2020-10:23:34] [I] min: 0.196289 ms
[10/14/2020-10:23:34] [I] max: 0.274429 ms
[10/14/2020-10:23:34] [I] mean: 0.208811 ms
[10/14/2020-10:23:34] [I] median: 0.207275 ms
[10/14/2020-10:23:34] [I] percentile: 0.246078 ms at 99%
[10/14/2020-10:23:34] [I] total compute time: 2.94236 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=/tmp/test.onnx --shapes=input:32x3x160x96 --explicitBatch --workspace=1024 --fp16 --saveEngine=/tmp/test.engine
ubuntu @ ~/tools/TensorRT-7.1.3.4/bin
{
    int nbInput = network->getNbInputs();
    auto inDim = network->getInput(0)->getDimensions();
    int nbOutput = network->getNbOutputs();
    auto outDim = network->getOutput(0)->getDimensions();
    // Note: smart quotes replaced with straight quotes and the format string
    // split with standard string-literal concatenation; counts are cast to
    // long to match the %ld specifiers.
    printf("%s\n inputs %d, inputDims %d, inputCount %ld\n"
           " outputs %d, outputDims %d, outputCount %ld,\n"
           " engine_size %zu, engine_nbLayers %d\n",
           output_file.c_str(),
           nbInput, inDim.nbDims, (long)samplesCommon::volume(inDim),
           nbOutput, outDim.nbDims, (long)samplesCommon::volume(outDim),
           data->size(), engine->getNbLayers());
}
inputs 1, inputDims 4, inputCount 491520
outputs 1, outputDims 2, outputCount 96, engine_size 316916, engine_nbLayers 35
ubuntu @ ~/work
$ ./main image
ERROR: /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (460) - Cuda Error in loadKernel: -1 (TensorRT internal error)
ERROR: INVALID_STATE: std::exception
ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.