Envs
• Hardware Platform (Jetson / GPU) GeForce GTX 1070
• DeepStream Version 5.0
• JetPack Version (valid for Jetson only)
• TensorRT Version 7.0.0.11
• NVIDIA GPU Driver Version (valid for GPU only) 440.33.01
Problem Description
I used tensorrtx to generate a YOLOv4 engine file. The engine works well when tested with the tensorrtx yolov4 sample.
I then added the engine file to DeepStream 5.0, following deepstream-app. I changed the config files and rewrote nvdsparsebbox_Yolo.cpp, nvdsinfer_yolo_engine.cpp, etc. I can read the inference results from std::vector<NvDsInferLayerInfo> const &outputLayersInfo, but they are different from the tensorrtx results and look wrong.
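For context, my parser reads the single output layer assuming the tensorrtx layout: one float holding the detection count, followed by 7 floats per detection (bbox cx/cy/w/h, detection confidence, class id, class confidence). A minimal sketch of such a parser (names are illustrative, not my exact code):

#include <vector>
#include "nvdsinfer_custom_impl.h"

// Assumed tensorrtx output layout: output[0] = detection count,
// then 7 floats per detection: cx, cy, w, h, det_conf, class_id, class_conf.
static constexpr int kFloatsPerDetection = 7;

extern "C" bool NvDsInferParseCustomYoloV4(
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferParseObjectInfo> &objectList)
{
    const float *output = static_cast<const float *>(outputLayersInfo[0].buffer);
    const int numDets = static_cast<int>(output[0]);

    for (int i = 0; i < numDets; ++i) {
        const float *det = output + 1 + i * kFloatsPerDetection;

        NvDsInferParseObjectInfo obj{};
        obj.detectionConfidence = det[4];
        obj.classId = static_cast<unsigned int>(det[5]);
        // tensorrtx boxes are center-format; convert to left/top for DeepStream.
        obj.left   = det[0] - det[2] / 2.0f;
        obj.top    = det[1] - det[3] / 2.0f;
        obj.width  = det[2];
        obj.height = det[3];
        objectList.push_back(obj);
    }
    return true;
}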
Error Output
I print the results (before NMS) obtained from DeepStream as follows:
...
x: 72 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 80 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 88 y: 272 w: inf h: inf det Confidence: 1 id: 3 class Confidence: 1
x: 96 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 136 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 144 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 152 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 160 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 200 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 208 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 216 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 224 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 264 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 272 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 280 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 288 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 104 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 112 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 120 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 128 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 168 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
x: 176 y: 272 w: inf h: inf det Confidence: 1 id: 1 class Confidence: 1
...
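These values were printed with a loop roughly like the following (a sketch assuming the same count-plus-7-floats layout described above, and <iostream> included), so the inf values are read directly from the w/h slots of each detection:

// Debug print over the raw output buffer (sketch; variable names illustrative).
const float *output = static_cast<const float *>(outputLayersInfo[0].buffer);
const int numDets = static_cast<int>(output[0]);
for (int i = 0; i < numDets; ++i) {
    const float *det = output + 1 + i * 7;
    std::cout << "x: " << det[0] << " y: " << det[1]
              << " w: " << det[2] << " h: " << det[3]
              << " det Confidence: " << det[4]
              << " id: " << static_cast<int>(det[5])
              << " class Confidence: " << det[6] << std::endl;
}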
Implementation
I suspect that the inference pipeline in DeepStream produces the wrong results, since the tensorrtx inference results are correct. For reference, the tensorrtx doInference function is:
void doInference(IExecutionContext &context, float *input, float *output, int batchSize)
{
    const ICudaEngine &engine = context.getEngine();

    // Pointers to input and output device buffers to pass to engine.
    // Engine requires exactly IEngine::getNbBindings() number of buffers.
    assert(engine.getNbBindings() == 2);
    void *buffers[2];

    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);

    // Create GPU buffers on device
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));

    // Create stream
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);

    // Release stream and buffers
    cudaStreamDestroy(stream);
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex]));
}
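For completeness, this is roughly how doInference is driven in the tensorrtx test (a sketch; BATCH_SIZE, INPUT_H, INPUT_W, OUTPUT_SIZE and the preprocessing/NMS helpers come from the tensorrtx yolov4 sample):

// Host-side buffers sized like the device buffers inside doInference.
static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];  // preprocessed CHW input
static float prob[BATCH_SIZE * OUTPUT_SIZE];             // raw detections out

// ... fill `data` with the resized and normalized image ...
doInference(*context, data, prob, BATCH_SIZE);
// `prob` now holds [count, det0, det1, ...] per batch item; running the
// sample's NMS on it gives boxes that look correct, unlike the DeepStream path.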
Additional Information
I also followed Iplugin tensorrt engine error for ds5.0 #5, but I cannot generate an engine file when running $ sudo /usr/local/TensorRT-7.0.0.11/bin/trtexec --onnx=yolov4_4_3_608_608.onnx --workspace=4096 --saveEngine=yolov4.engine --fp16 --explicitBatch. The errors are:
(yolov4) dreamdeck@mjj:~/Documents/code/test/yolov4/pytorch-YOLOv4$ sudo /usr/local/TensorRT-7.0.0.11/bin/trtexec --onnx=yolov4_4_3_608_608.onnx --workspace=4096 --saveEngine=yolov4.engine --fp16 --explicitBatch
&&&& RUNNING TensorRT.trtexec # /usr/local/TensorRT-7.0.0.11/bin/trtexec --onnx=yolov4_4_3_608_608.onnx --workspace=4096 --saveEngine=yolov4.engine --fp16 --explicitBatch
[05/30/2020-18:16:23] [I] === Model Options ===
[05/30/2020-18:16:23] [I] Format: ONNX
[05/30/2020-18:16:23] [I] Model: yolov4_4_3_608_608.onnx
[05/30/2020-18:16:23] [I] Output:
[05/30/2020-18:16:23] [I] === Build Options ===
[05/30/2020-18:16:23] [I] Max batch: explicit
[05/30/2020-18:16:23] [I] Workspace: 4096 MB
[05/30/2020-18:16:23] [I] minTiming: 1
[05/30/2020-18:16:23] [I] avgTiming: 8
[05/30/2020-18:16:23] [I] Precision: FP16
[05/30/2020-18:16:23] [I] Calibration:
[05/30/2020-18:16:23] [I] Safe mode: Disabled
[05/30/2020-18:16:23] [I] Save engine: yolov4.engine
[05/30/2020-18:16:23] [I] Load engine:
[05/30/2020-18:16:23] [I] Inputs format: fp32:CHW
[05/30/2020-18:16:23] [I] Outputs format: fp32:CHW
[05/30/2020-18:16:23] [I] Input build shapes: model
[05/30/2020-18:16:23] [I] === System Options ===
[05/30/2020-18:16:23] [I] Device: 0
[05/30/2020-18:16:23] [I] DLACore:
[05/30/2020-18:16:23] [I] Plugins:
[05/30/2020-18:16:23] [I] === Inference Options ===
[05/30/2020-18:16:23] [I] Batch: Explicit
[05/30/2020-18:16:23] [I] Iterations: 10
[05/30/2020-18:16:23] [I] Duration: 3s (+ 200ms warm up)
[05/30/2020-18:16:23] [I] Sleep time: 0ms
[05/30/2020-18:16:23] [I] Streams: 1
[05/30/2020-18:16:23] [I] ExposeDMA: Disabled
[05/30/2020-18:16:23] [I] Spin-wait: Disabled
[05/30/2020-18:16:23] [I] Multithreading: Disabled
[05/30/2020-18:16:23] [I] CUDA Graph: Disabled
[05/30/2020-18:16:23] [I] Skip inference: Disabled
[05/30/2020-18:16:23] [I] Inputs:
[05/30/2020-18:16:23] [I] === Reporting Options ===
[05/30/2020-18:16:23] [I] Verbose: Disabled
[05/30/2020-18:16:23] [I] Averages: 10 inferences
[05/30/2020-18:16:23] [I] Percentile: 99
[05/30/2020-18:16:23] [I] Dump output: Disabled
[05/30/2020-18:16:23] [I] Profile: Disabled
[05/30/2020-18:16:23] [I] Export timing to JSON file:
[05/30/2020-18:16:23] [I] Export output to JSON file:
[05/30/2020-18:16:23] [I] Export profile to JSON file:
[05/30/2020-18:16:23] [I]
----------------------------------------------------------------
Input filename: yolov4_4_3_608_608.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: pytorch
Producer version: 1.5
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/30/2020-18:16:24] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[05/30/2020-18:16:24] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[05/30/2020-18:16:24] [E] [TRT] Layer: (Unnamed Layer* 426)[Select]'s output can not be used as shape tensor.
[05/30/2020-18:16:24] [E] [TRT] Network validation failed.
[05/30/2020-18:16:24] [E] Engine creation failed
[05/30/2020-18:16:24] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/local/TensorRT-7.0.0.11/bin/trtexec --onnx=yolov4_4_3_608_608.onnx --workspace=4096 --saveEngine=yolov4.engine --fp16 --explicitBatch
I have no idea how to solve this.
Thanks.