Hi.
I have been testing the TensorRT Softmax operation converted from an ONNX model.
I built a single-layer Softmax model with a (3, 4, 5) input/output shape using the code below.
However, TensorRT run via trtexec reports the output with shape (1, 4, 5), whereas I expect the output shape of Softmax to be the same as the input shape.
Could you please tell me what is wrong with my test?
import onnx
import onnx.helper as oh
from onnx import checker

out_path = "softmax_test.onnx"

def main():
    # Declare a (3, 4, 5) input and an identically shaped output.
    in_tensor = [
        oh.make_tensor_value_info("Input", onnx.TensorProto.FLOAT, [3, 4, 5]),
    ]
    out_tensor = [
        oh.make_tensor_value_info("Output", onnx.TensorProto.FLOAT, [3, 4, 5]),
    ]
    # Single Softmax node over axis 1.
    nodes = []
    nodes.append(oh.make_node("Softmax", axis=1, inputs=["Input"], outputs=["Output"]))
    graph = oh.make_graph(nodes, "Test Graph", in_tensor, out_tensor)
    checker.check_graph(graph)
    model = oh.make_model(graph, producer_name="TFURU2", producer_version="0.1")
    checker.check_model(model)
    # Write the serialized model and a human-readable dump of it.
    with open(out_path, "wb") as f:
        f.write(model.SerializeToString())
    with open(out_path + ".txt", "w") as f:
        print(model, file=f)

if __name__ == "__main__":
    main()
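For reference, this is a minimal sketch of how I sanity-check the generated model outside of TensorRT (it assumes the onnxruntime package is installed; it is not part of the original test above):

import numpy as np
import onnxruntime as ort

# Load the model file written by the script above.
sess = ort.InferenceSession("softmax_test.onnx")

# Any (3, 4, 5) input will do; the point here is only the output shape.
x = np.random.rand(3, 4, 5).astype(np.float32)
(y,) = sess.run(None, {"Input": x})

print(y.shape)  # I expect (3, 4, 5), the same as the input shape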
Here is the trtexec output.
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=softmax_test.onnx --verbose --dumpOutput --batch=1 --safe
[09/01/2019-08:44:04] [I] === Model Options ===
[09/01/2019-08:44:04] [I] Format: ONNX
[09/01/2019-08:44:04] [I] Model: softmax_test.onnx
[09/01/2019-08:44:04] [I] Output:
[09/01/2019-08:44:04] [I] === Build Options ===
[09/01/2019-08:44:04] [I] Max batch: 1
[09/01/2019-08:44:04] [I] Workspace: 16 MB
[09/01/2019-08:44:04] [I] minTiming: 1
[09/01/2019-08:44:04] [I] avgTiming: 8
[09/01/2019-08:44:04] [I] Precision: FP32
[09/01/2019-08:44:04] [I] Calibration:
[09/01/2019-08:44:04] [I] Safe mode: Enabled
[09/01/2019-08:44:04] [I] Save engine:
[09/01/2019-08:44:04] [I] Load engine:
[09/01/2019-08:44:04] [I] Inputs format: fp32:CHW
[09/01/2019-08:44:04] [I] Outputs format: fp32:CHW
[09/01/2019-08:44:04] [I] Input build shapes: model
[09/01/2019-08:44:04] [I] === System Options ===
[09/01/2019-08:44:04] [I] Device: 0
[09/01/2019-08:44:04] [I] DLACore:
[09/01/2019-08:44:04] [I] Plugins:
[09/01/2019-08:44:04] [I] === Inference Options ===
[09/01/2019-08:44:04] [I] Batch: 1
[09/01/2019-08:44:04] [I] Iterations: 10 (200 ms warm up)
[09/01/2019-08:44:04] [I] Duration: 10s
[09/01/2019-08:44:04] [I] Sleep time: 0ms
[09/01/2019-08:44:04] [I] Streams: 1
[09/01/2019-08:44:04] [I] Spin-wait: Disabled
[09/01/2019-08:44:04] [I] Multithreading: Enabled
[09/01/2019-08:44:04] [I] CUDA Graph: Disabled
[09/01/2019-08:44:04] [I] Skip inference: Disabled
[09/01/2019-08:44:04] [I] Input inference shapes: model
[09/01/2019-08:44:04] [I] === Reporting Options ===
[09/01/2019-08:44:04] [I] Verbose: Enabled
[09/01/2019-08:44:04] [I] Averages: 10 inferences
[09/01/2019-08:44:04] [I] Percentile: 99
[09/01/2019-08:44:04] [I] Dump output: Enabled
[09/01/2019-08:44:04] [I] Profile: Disabled
[09/01/2019-08:44:04] [I] Export timing to JSON file:
[09/01/2019-08:44:04] [I] Export profile to JSON file:
[09/01/2019-08:44:04] [I]
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - NMS_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - Reorg_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - Region_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - Clip_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - LReLU_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - PriorBox_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - Normalize_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - RPROI_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[09/01/2019-08:44:04] [V] [TRT] Plugin Creator registration succeeded - FlattenConcat_TRT
----------------------------------------------------------------
Input filename: softmax_test.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: MACNICA
Producer version: 0.1
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[09/01/2019-08:44:04] [V] [TRT] Output:Softmax -> (4, 5)
----- Parsing of ONNX model softmax_test.onnx is Done ----
[09/01/2019-08:44:04] [V] [TRT] Applying generic optimizations to the graph for inference.
[09/01/2019-08:44:04] [V] [TRT] Original: 1 layers
[09/01/2019-08:44:04] [V] [TRT] After dead-layer removal: 1 layers
[09/01/2019-08:44:04] [V] [TRT] After scale fusion: 1 layers
[09/01/2019-08:44:04] [V] [TRT] After vertical fusions: 1 layers
[09/01/2019-08:44:04] [V] [TRT] After final dead-layer removal: 1 layers
[09/01/2019-08:44:04] [V] [TRT] After tensor merging: 1 layers
[09/01/2019-08:44:04] [V] [TRT] After concat removal: 1 layers
[09/01/2019-08:44:04] [V] [TRT] Graph construction and optimization completed in 0.000163059 seconds.
[09/01/2019-08:44:06] [V] [TRT] Constructing optimization profile number 0 out of 1
*************** Autotuning format combination: Float(1,5,20) -> Float(1,5,20) ***************
[09/01/2019-08:44:06] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 0) [Softmax] (SoftMax)
[09/01/2019-08:44:06] [V] [TRT] Tactic: 1001 time 0.007168
[09/01/2019-08:44:06] [V] [TRT] Fastest Tactic: 1001 Time: 0.007168
[09/01/2019-08:44:06] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: SoftMax Tactic: 1001
[09/01/2019-08:44:06] [V] [TRT]
[09/01/2019-08:44:06] [V] [TRT] Formats and tactics selection completed in 0.00245976 seconds.
[09/01/2019-08:44:06] [V] [TRT] After reformat layers: 1 layers
[09/01/2019-08:44:06] [V] [TRT] Block size 16777216
[09/01/2019-08:44:06] [V] [TRT] Total Activation Memory: 16777216
[09/01/2019-08:44:06] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[09/01/2019-08:44:06] [V] [TRT] Engine generation completed in 1.42676 seconds.
[09/01/2019-08:44:06] [V] [TRT] Engine Layer Information:
[09/01/2019-08:44:06] [V] [TRT] Layer: (Unnamed Layer* 0) [Softmax] (SoftMax), Tactic: 1001, Input[Float(4,5)] -> Output[Float(4,5)]
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0111616 ms (host walltime is 0.0464016 ms, 99% percentile time is 0.024576).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0091136 ms (host walltime is 0.0346638 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0093216 ms (host walltime is 0.0341824 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0095232 ms (host walltime is 0.0343413 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0091104 ms (host walltime is 0.0345789 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0091136 ms (host walltime is 0.0344361 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.009312 ms (host walltime is 0.0344415 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0091136 ms (host walltime is 0.0343255 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0090144 ms (host walltime is 0.0343258 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Average over 10 runs is 0.0090144 ms (host walltime is 0.0344853 ms, 99% percentile time is 0.011264).
[09/01/2019-08:44:06] [I] Dumping output tensor Output:
[09/01/2019-08:44:06] [I] [1, 4, 5]
[09/01/2019-08:44:06] [I] 0.25 0.25 0.25 0.25 0.25
[09/01/2019-08:44:06] [I] 0.25 0.25 0.25 0.25 0.25
[09/01/2019-08:44:06] [I] 0.25 0.25 0.25 0.25 0.25
[09/01/2019-08:44:06] [I] 0.25 0.25 0.25 0.25 0.25
&&&& PASSED TensorRT.trtexec # trtexec --onnx=softmax_test.onnx --verbose --dumpOutput --batch=1 --safe
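For what it is worth, a hand-rolled NumPy reference (just a sketch of the behaviour I expect, independent of the TensorRT implementation) also keeps the (3, 4, 5) shape:

import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.random.rand(3, 4, 5).astype(np.float32)
y = softmax(x, axis=1)
print(y.shape)  # (3, 4, 5): softmax should not change the tensor shape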
Thanks.