I found that using TensorRT for inference takes more time than using TensorFlow directly on GPU

Hi,

I found that, for a big model, inference with TensorRT takes more time than running TensorFlow directly on the GPU.

When I use TensorRT to run inference on a 720p video, it takes 600 ms per frame. The memory usage is 5185 MiB / 24478 MiB.

When I use TensorFlow directly on the GPU to run inference on the same 720p video, it takes 236 ms per frame. The memory usage is 23370 MiB / 24478 MiB.

Does this mean TensorRT did not make full use of the GPU memory?
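
For context, the engine is built elsewhere with the implicit-batch builder, and the workspace size is set at build time, roughly like this (a sketch only; gLogger and the sizes are placeholders, not my exact build code):

// Build-time configuration (sketch, TRT 5/6-style implicit-batch API).
// gLogger stands in for my nvinfer1::ILogger implementation.
nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
builder->setMaxBatchSize(1);
builder->setMaxWorkspaceSize(1ULL << 30);   // scratch memory TensorRT may use when picking layer tactics
// ... define/parse the network, then build:
// nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

As far as I understand, the workspace size set here bounds the scratch memory TensorRT will use, so the low Memory-Usage number by itself may not be the problem.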

#include "tveModelSR.h"

namespace TVE {

tveModelSR::tveModelSR(const std::string& tve_name, nvinfer1::DataType type,
		int n, int c, int h, int w, int s):tveModelBase(tve_name, type, n, c, h, w), scale(s) {
	// TODO Auto-generated constructor stub
	use_times=0;
	n_frames=0;
}

tveModelSR::~tveModelSR() {
	// TODO Auto-generated destructor stub
	// report the average inference time per frame (guard against division by zero)
	if (n_frames > 0) {
		printf("sr_model: times:%.2fms\n", use_times / n_frames);
	}
}


int tveModelSR::execute(int batch_n, void* input_cuda_buffer, void* output_cuda_buffer, cudaStream_t stream) {
	if(!is_init) {
		log_ctx.append("ctx is not inited,");
		return -1;
	}
	
	// set buffer
	void* buffers[2];
	int y_input_index=engine->getBindingIndex(info.input_name.c_str());
	int y_output_index=engine->getBindingIndex(info.output_name.c_str());
	buffers[y_input_index]=input_cuda_buffer; 
	buffers[y_output_index]=output_cuda_buffer;
	
	// start Infer
	auto time_start = std::chrono::high_resolution_clock::now();
	if(!context->enqueue(batch_n, &buffers[0], stream, nullptr)) {
		log_ctx.append("execute context fail,");
		return -5;
	}
	cudaStreamSynchronize(stream);
	
	// sum times
	use_times += std::chrono::duration<float, std::milli>(std::chrono::high_resolution_clock::now() - time_start).count();
	++n_frames;

	return 0;
}

} /* namespace TVE */

The context is initialized outside this class, and the buffers and the stream are public variables that also live outside.
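
To rule out host-side overhead in the std::chrono measurement, the same enqueue could also be timed with CUDA events; a minimal sketch, assuming the same context, buffers, batch_n and stream as above:

// Device-side timing of the enqueue with CUDA events (sketch).
cudaEvent_t ev_start, ev_stop;
cudaEventCreate(&ev_start);
cudaEventCreate(&ev_stop);

cudaEventRecord(ev_start, stream);
context->enqueue(batch_n, buffers, stream, nullptr);
cudaEventRecord(ev_stop, stream);

cudaEventSynchronize(ev_stop);
float ms = 0.0f;
cudaEventElapsedTime(&ms, ev_start, ev_stop);   // elapsed GPU time in milliseconds
printf("enqueue: %.2f ms\n", ms);

cudaEventDestroy(ev_start);
cudaEventDestroy(ev_stop);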

I don’t know how to solve this problem.

PS: This model is a super-resolution model with 20 conv layers.

Maybe the problem is caused by the op tf.layers.conv2d_transpose.

I tried modifying the output channels of tf.layers.conv2d_transpose and the inference time went down.
Why is that?
Is TensorRT not well suited to the conv2d_transpose op?
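
Would attaching an nvinfer1::IProfiler to the execution context be the right way to check whether the conv2d_transpose layer dominates the per-layer times? A minimal sketch of what I mean (the class name is mine; I use the synchronous execute() here since, as far as I know, older TensorRT versions only report profiling for synchronous execution):

// Per-layer timing via IProfiler (sketch; LayerProfiler is my own name).
class LayerProfiler : public nvinfer1::IProfiler {
public:
	void reportLayerTime(const char* layerName, float ms) override {
		printf("%s: %.3f ms\n", layerName, ms);
	}
};

// ... after creating the execution context:
LayerProfiler profiler;
context->setProfiler(&profiler);
context->execute(batch_n, buffers);   // synchronous run so per-layer times get reported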