TensorRT on RTX 3080 slow down

r05943077 · March 2, 2022, 9:34am

We are using tensorRT on RTX3080 GPU to inference unet.

At the beginning, TensorRT unet takes only 1ms. However, after some iterations, the inference becomes slower. It reaches 3ms afterwards, which is slower.

The GPU temperature is 35C degree, not too hot. The GPU usage is not more than 30%, and the GPU memory is less than 1500MB. Could someone provide some insights regarding this issue? Thanks in advance.

We use:
Cudnn 8.2.3
TensorRT 8.4
Windows 10
CUDA 11.6
RTX 3080

the code is as below:

cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));
CHECK(cudaMemcpyAsync(buffers[inputIndex], input, input_size * sizeof(float), cudaMemcpyHostToDevice, stream));
enqueueV2(buffers, stream, nullptr);
CHECK(cudaMemcpyAsync(output, buffers[outputIndex], output_size * sizeof(float), cudaMemcpyDeviceToHost, stream));
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);

NVES · March 2, 2022, 10:07am

Hi,
Request you to share the model, script, profiler and performance output if not shared already so that we can help you better.
Alternatively, you can try running your model with trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre and post-processing overhead.
Please refer below link for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#model-accuracy

Thanks!

r05943077 · March 9, 2022, 9:34am

We can use mobilenet to get the strange result.
The onnx is as follows:

github.com

onnx/models/blob/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx

version https://git-lfs.github.com/spec/v1
oid sha256:0e7c0aa4bc74650386fa1d2c84705753de7c2bdb21909ada5c59154bb429e092
size 13963115

We use trtexec to convert and run inference.
To convert:
trtexec --saveEngine=mobile.trt --onnx=mobilenetv2-7.onnx

To run inference:
trtexec --loadEngine=mobile.trt --dumpProfile --duration=0 --warmUp=0 --sleepTime=20 --idleTime=20 --verbose --iterations=N

When N=10, inference takes about 2.1 ms.
When N=100, inference takes about 2.1 ms.
When N=1000, inference takes about 3.6 ms.

As N become large, the model slow down. Really strange.

spolisetty · March 24, 2022, 8:17am

Hi,

We couldn’t get similar behavior. Could you please share with us verbose logs for all of the above.

Thank you.

780437510 · August 3, 2022, 4:10am

it maybe a tensorrt bug ，i try all model in those card like (3080 3080ti) ，infer will slower down ( all model i can get, tensorrt 7 ~ 8.4 )
infer speed will not slower down if the image in host memory in advance
but speed will slower down if image need load to host memory from disk
and cudnn infer will not slower down

code like :

       startTime = std::chrono::high_resolution_clock::now();

        context_->enqueue(input_shape_[0], buffers_, stream_, nullptr);
        // context_->enqueueV2(buffers_, stream_, nullptr);
        cudaStreamSynchronize(stream_);

    if(count ){ 
        auto endTime = std::chrono::high_resolution_clock::now();
        totol_time_ += std::chrono::duration<float, std::milli>(endTime - startTime).count();
        // totol_time_ += totalTime;
    }
    if(count % 10== 0){
        AIDI_LOG(info)<<"averge time---------------------" <<  time.average_time(Timer::Unit::MilliSecond);
        time.clear();
        // AIDI_LOG(info)<<"averge time---------------------" <<  totol_time_ / count; 
        }
    count++;

速度变化

[2022-08-02 12:18:22.244586] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.63985

[2022-08-02 12:18:22.729808] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.55214

[2022-08-02 12:18:23.159126] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.51793

[2022-08-02 12:18:23.623376] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.51454

[2022-08-02 12:18:24.032722] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.64723

[2022-08-02 12:18:24.511941] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------2.23529

[2022-08-02 12:18:24.980748] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------2.72391

[2022-08-02 12:18:25.494937] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------3.3516

[2022-08-02 12:18:25.962188] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------4.00212

[2022-08-02 12:18:26.529281] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------4.57732

[2022-08-02 12:18:27.690370] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------5.48994

[2022-08-02 12:18:28.284459] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------5.83345

r05943077 · September 16, 2022, 8:31am

Finally I found the solution.
The clock rate drops after running the program for 6 seconds, causing the inference to slow down. We can fix the issue by setting clock rate to some frequency. Use nvidia-smi to lock the freq.

I think this is some kind of strange bug.

780437510 · September 16, 2022, 8:42am

it maybe driver set temperature and power walls， you can set memory clock as max fixed frequency ( max fre ，max fre to fixed frequency) to solute power walls but temperature wall is another question.

and you need set a lot diver setting to have gpu maximum performance, like power mode . and so on.

it is weired for us to set nv driver setting .

i will set this parm by nvml interface in our code(aqrose)

Topic		Replies	Views
TensorRt inference is taking 1.5 sec to inference a single frame.i want to speed up my inference.How can i do that TensorRT tensorrt , cuda , jetson-nano	3	750	March 13, 2023
TensorRT inference take too much time than expected TensorRT tensorrt	2	1026	December 22, 2020
TensorRt inference is taking 1.5 sec to inference a single frame.i want to speed up my inference TensorRT tensorrt , jetson-inference , jetson-nano	1	906	March 13, 2023
ONNX runtime prediction using GPU and with different intervals TensorRT	4	1920	January 19, 2022
Strange CNN inference latency behavior with CUDA and TensorRT TensorRT cuda	13	1425	January 24, 2024
Why different input size causes different performance? TensorRT	4	772	October 12, 2021
TensorRT waiting after inference seemingly for no reason TensorRT tensorrt , cuda , performance , python	12	1508	October 20, 2022
Inference Speed Spikes When Running FP16 Converted ONNX Model with TensorRT TensorRT cudnn	1	29	January 31, 2025
TensorRT --- non-int8 fallback when trying to calibrate ONNX model DeepStream SDK tensorrt , deepstream	11	424	July 1, 2024
Tensorrt is slower than pytorch TensorRT	2	2208	September 15, 2021

TensorRT on RTX 3080 slow down

Related topics