We are using TensorRT on an RTX 3080 GPU to run inference on a UNet.
At the beginning, the TensorRT UNet takes only 1 ms per inference. However, after some iterations, inference becomes slower, eventually reaching about 3 ms.
The GPU temperature is 35°C, so it is not overheating. GPU utilization stays below 30%, and GPU memory usage is under 1500 MB. Could someone provide some insights regarding this issue? Thanks in advance.
We use:
cuDNN 8.2.3
TensorRT 8.4
Windows 10
CUDA 11.6
RTX 3080
The code is as follows:
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));
// Copy the input to the device, run inference, and copy the output back, all on one stream.
CHECK(cudaMemcpyAsync(buffers[inputIndex], input, input_size * sizeof(float), cudaMemcpyHostToDevice, stream));
context->enqueueV2(buffers, stream, nullptr);  // context is the nvinfer1::IExecutionContext*
CHECK(cudaMemcpyAsync(output, buffers[outputIndex], output_size * sizeof(float), cudaMemcpyDeviceToHost, stream));
CHECK(cudaStreamSynchronize(stream));  // wait for the copies and the inference to finish
CHECK(cudaStreamDestroy(stream));
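For comparison, here is the same sequence with the stream created once and reused across inferences, so that stream creation/destruction overhead is not part of the measured time. This is only a sketch: the CHECK macro, context, and buffers are assumed to be set up elsewhere as in the snippet above, and kNumInferences is a hypothetical loop count.

cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));  // created once, outside the per-inference path
for (int i = 0; i < kNumInferences; ++i) {
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, input_size * sizeof(float), cudaMemcpyHostToDevice, stream));
    context->enqueueV2(buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], output_size * sizeof(float), cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));  // one full inference finished
}
CHECK(cudaStreamDestroy(stream));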
NVES
March 2, 2022, 10:07am
#2
Hi,
Could you please share the model, script, profiler, and performance output (if not shared already) so that we can help you better?
Alternatively, you can try running your model with trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
While measuring model performance, make sure you measure the latency and throughput of the network inference itself, excluding the data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#model-accuracy
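For example, one way to time only the network inference is to bracket the enqueueV2() call with CUDA events. This is a minimal sketch, assuming the CHECK macro, context, buffers, and stream from your snippet:

cudaEvent_t start, stop;
CHECK(cudaEventCreate(&start));
CHECK(cudaEventCreate(&stop));
CHECK(cudaEventRecord(start, stream));           // marks the point just before inference
context->enqueueV2(buffers, stream, nullptr);
CHECK(cudaEventRecord(stop, stream));            // marks the point just after inference
CHECK(cudaEventSynchronize(stop));               // wait until the stop event has completed
float ms = 0.0f;
CHECK(cudaEventElapsedTime(&ms, start, stop));   // GPU-side elapsed time in milliseconds
printf("network-only latency: %.3f ms\n", ms);
CHECK(cudaEventDestroy(start));
CHECK(cudaEventDestroy(stop));

This excludes the host-to-device and device-to-host copies, as well as any host-side overhead, from the measurement.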
Thanks!
We can reproduce the strange result with MobileNet.
The ONNX model is as follows:
We use trtexec to convert the model and run inference.
To convert:
trtexec --saveEngine=mobile.trt --onnx=mobilenetv2-7.onnx
To run inference:
trtexec --loadEngine=mobile.trt --dumpProfile --duration=0 --warmUp=0 --sleepTime=20 --idleTime=20 --verbose --iterations=N
When N=10, inference takes about 2.1 ms.
When N=100, inference takes about 2.1 ms.
When N=1000, inference takes about 3.6 ms.
As N becomes large, the model slows down. Really strange.
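One thing we are not sure about: with --sleepTime=20 --idleTime=20, the GPU sits idle between inferences, and an idle GPU can drop out of its boost clocks, which would make later iterations slower. Below is a minimal sketch (our assumption, not something verified in this thread) that logs the SM clock via NVML; nvml.h ships with the CUDA toolkit, and on Windows this links against nvml.lib.

#include <nvml.h>
#include <cstdio>

int main() {
    // Sketch: print the current SM clock; calling this alongside each inference
    // would show whether the clock drops as the run goes on.
    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) { nvmlShutdown(); return 1; }
    unsigned int smClockMHz = 0;
    if (nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &smClockMHz) == NVML_SUCCESS)
        printf("SM clock: %u MHz\n", smClockMHz);
    nvmlShutdown();
    return 0;
}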
Hi,
We couldn't reproduce this behavior. Could you please share the verbose logs for all of the above?
Thank you.