TensorRT on RTX 3080 slows down

We are using TensorRT on an RTX 3080 GPU to run inference on a UNet model.

At the beginning, a TensorRT UNet inference takes only about 1 ms. However, after some iterations the inference becomes slower, eventually reaching about 3 ms.

The GPU temperature is 35 °C, so it is not overheating. GPU utilization is no more than 30%, and GPU memory usage is less than 1500 MB. Could someone provide some insight into this issue? Thanks in advance.

We use:
cuDNN 8.2.3
TensorRT 8.4
Windows 10
CUDA 11.6
RTX 3080

The code is as follows:

cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));
// Copy input to the device, run inference, and copy the output back, all on the same stream.
CHECK(cudaMemcpyAsync(buffers[inputIndex], input, input_size * sizeof(float), cudaMemcpyHostToDevice, stream));
context->enqueueV2(buffers, stream, nullptr);
CHECK(cudaMemcpyAsync(output, buffers[outputIndex], output_size * sizeof(float), cudaMemcpyDeviceToHost, stream));
CHECK(cudaStreamSynchronize(stream));
CHECK(cudaStreamDestroy(stream));

Hi,
Could you please share the model, script, profiler output, and performance logs (if not already shared) so that we can help you better?
Alternatively, you can try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

While measuring model performance, make sure you measure the latency and throughput of the network inference itself, excluding the data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#model-accuracy
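
For illustration, here is a minimal sketch of timing only the network execution with CUDA events, excluding the H2D/D2H copies. It assumes the execution context, device buffers, and stream are already created; the names context, buffers, and stream are placeholders, not part of the original post.

// Time only enqueueV2 with CUDA events; pre/post-processing and copies are excluded.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, stream);
context->enqueueV2(buffers, stream, nullptr);
cudaEventRecord(stop, stream);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("network inference: %.3f ms\n", ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);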

Thanks!

We can reproduce the strange result with MobileNet.
The ONNX model is as follows:

We use trtexec to convert the model and to run inference.
To convert:
trtexec --saveEngine=mobile.trt --onnx=mobilenetv2-7.onnx

To run inference:
trtexec --loadEngine=mobile.trt --dumpProfile --duration=0 --warmUp=0 --sleepTime=20 --idleTime=20 --verbose --iterations=N

When N=10, inference takes about 2.1 ms.
When N=100, inference takes about 2.1 ms.
When N=1000, inference takes about 3.6 ms.

As N becomes large, the model slows down. Really strange.

Hi,

We couldn't reproduce this behavior. Could you please share the verbose logs for all of the above runs?

Thank you.

It may be a TensorRT bug. I tried every model I could get on these cards (3080, 3080 Ti) with TensorRT 7 through 8.4, and inference slows down in all cases.
Inference speed does not slow down if the images are already in host memory in advance,
but it does slow down if the images have to be loaded into host memory from disk,
and cuDNN inference does not slow down.

The code looks like this:

        auto startTime = std::chrono::high_resolution_clock::now();

        context_->enqueue(input_shape_[0], buffers_, stream_, nullptr);
        // context_->enqueueV2(buffers_, stream_, nullptr);
        cudaStreamSynchronize(stream_);

        if (count) {  // skip the very first (warm-up) iteration
            auto endTime = std::chrono::high_resolution_clock::now();
            totol_time_ += std::chrono::duration<float, std::milli>(endTime - startTime).count();
        }
        if (count % 10 == 0) {
            AIDI_LOG(info) << "averge time---------------------" << time.average_time(Timer::Unit::MilliSecond);
            time.clear();
            // AIDI_LOG(info) << "averge time---------------------" << totol_time_ / count;
        }
        count++;

The speed changes like this:

[2022-08-02 12:18:22.244586] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.63985

[2022-08-02 12:18:22.729808] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.55214

[2022-08-02 12:18:23.159126] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.51793

[2022-08-02 12:18:23.623376] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.51454

[2022-08-02 12:18:24.032722] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------1.64723

[2022-08-02 12:18:24.511941] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------2.23529

[2022-08-02 12:18:24.980748] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------2.72391

[2022-08-02 12:18:25.494937] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------3.3516

[2022-08-02 12:18:25.962188] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------4.00212

[2022-08-02 12:18:26.529281] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------4.57732

[2022-08-02 12:18:27.690370] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------5.48994

[2022-08-02 12:18:28.284459] [0x0000a5a0] [info] [location_client.cpp(116)]: averge time---------------------5.83345

Finally, I found the solution.
The GPU clock rate drops after the program has been running for about 6 seconds, which causes the inference to slow down. We can fix the issue by locking the clock rate to a fixed frequency; use nvidia-smi to lock it.
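
For example, something along these lines (run as administrator; the 1700 MHz value is only an illustration, pick a frequency supported by your own card, e.g. from the output of the first command):

To list the supported clock frequencies:
nvidia-smi -q -d SUPPORTED_CLOCKS

To lock the GPU clock to a fixed frequency:
nvidia-smi --lock-gpu-clocks=1700,1700

To restore the default clock behavior afterwards:
nvidia-smi --reset-gpu-clocks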

I think this is some kind of strange bug.

It is probably the driver enforcing temperature and power limits. You can lock the memory clock to a fixed maximum frequency (set both the minimum and the maximum to the same value) to work around the power limit, but the temperature limit is another question.

You also need to adjust a lot of driver settings to get maximum GPU performance, such as the power-management mode and so on.

It is awkward for us to have to change NVIDIA driver settings by hand.

I will set these parameters through the NVML interface in our code (Aqrose), as sketched below.
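
For anyone who wants to do the same from code, here is a rough sketch of what the NVML calls could look like. This is only my illustration, not the actual Aqrose implementation: error handling is trimmed, it locks the graphics clock rather than the memory clock, and it requires administrator/root privileges plus linking against the NVML library (nvml.lib on Windows, libnvidia-ml on Linux).

#include <cstdio>
#include <nvml.h>

// Lock GPU 0's graphics clock to its maximum supported frequency via NVML
// so the driver cannot ramp the clock down between inferences.
int main() {
    if (nvmlInit() != NVML_SUCCESS) {
        printf("failed to initialize NVML\n");
        return 1;
    }

    nvmlDevice_t device;
    nvmlDeviceGetHandleByIndex(0, &device);

    // Query the maximum graphics clock and pin the allowed range to it.
    unsigned int maxGraphicsMHz = 0;
    nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_GRAPHICS, &maxGraphicsMHz);
    nvmlReturn_t ret = nvmlDeviceSetGpuLockedClocks(device, maxGraphicsMHz, maxGraphicsMHz);
    if (ret != NVML_SUCCESS) {
        printf("failed to lock GPU clocks: %s\n", nvmlErrorString(ret));
    }

    // ... run TensorRT inference here ...

    // Restore the default clock behavior before exiting.
    nvmlDeviceResetGpuLockedClocks(device);
    nvmlShutdown();
    return 0;
}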
