Environment:
Hardware: V100
CUDA: 11.7
TensorRT: 8.4
OS: Ubuntu
I ran some tests.
I used a YOLO ONNX model to build FP16 and INT8 TensorRT engines, then ran inference with both the nvdsparse_yolo.cpp and the nvdsparse_yolo.cu code.
I got some results, but I don't understand them. Could you give me some ideas or an explanation? Thank you so much.
Results:
Test 1: FP16 engine + nvdsparse_yolo.cpp; usage: CPU 43%, GPU 11%, SM 34%
Test 2: FP16 engine + nvdsparse_yolo.cu; usage: CPU 46%, GPU 11%, SM 34%
Test 3: INT8 engine + nvdsparse_yolo.cpp; usage: CPU 19%, GPU 68%, SM 77%
Test 4: INT8 engine + nvdsparse_yolo.cu; usage: CPU 20%, GPU 63%, SM 66%
I don't understand why the INT8 engine uses so much more GPU and SM but less CPU than the FP16 engine, and why the .cpp and .cu code show similar usage.