Is there an efficient way to move a cv::Mat into TensorRT's buffer?

I have image data that is first converted to a C++ float buffer and then sent to a GPU buffer:

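#include <opencv2/opencv.hpp>
#include <cuda_runtime.h>
#include <vector>

cv::Mat image = cv::imread("./1.jpg");
std::vector<float> src_data_img;
src_data_img.reserve(256 * 512 * 3);
for (int i = 0; i < image.rows; i++) {
	for (int j = 0; j < image.cols; j++) {
		// imread yields uint8 BGR pixels, so read cv::Vec3b and widen to float
		const cv::Vec3b& px = image.at<cv::Vec3b>(i, j);
		src_data_img.push_back(px[0]);
		src_data_img.push_back(px[1]);
		src_data_img.push_back(px[2]);
	}
}

void* buffers[1];
cudaMalloc(&buffers[0], 256 * 512 * 3 * sizeof(float));
cudaMemcpy(buffers[0], src_data_img.data(), 256 * 512 * 3 * sizeof(float), cudaMemcpyHostToDevice);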

As shown here, two large host buffers (the cv::Mat image and the float buffer src_data_img with 256 * 512 * 3 elements) are needed to send the image data to the GPU. Is there an efficient way to avoid one of them? Two things are done here:
1, convert the image's uint8 data to the float format required by TensorRT
2, send the converted data to the CUDA buffer
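
One way to avoid the element-by-element push_back loop is to convert each row in a single pass and write the floats directly into the buffer that will be handed to cudaMemcpy. The sketch below uses only the standard library so it stands alone; u8_to_float_hwc is a hypothetical helper, not an OpenCV or TensorRT function. In an OpenCV pipeline the assumption is that src would be image.data, src_step would be image.step, and dst would be a staging buffer (for example pinned memory from cudaMallocHost). It keeps the interleaved HWC order used in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert an interleaved 8-bit image to float in one pass,
// writing straight into `dst` (rows * cols * channels floats, tightly packed).
// `src_step` is the source row stride in bytes (the role of cv::Mat::step),
// which may be larger than cols * channels when rows are padded.
void u8_to_float_hwc(const std::uint8_t* src, std::size_t src_step,
                     int rows, int cols, int channels, float* dst) {
    assert(src != nullptr && dst != nullptr);
    const std::size_t row_elems = static_cast<std::size_t>(cols) * channels;
    assert(src_step >= row_elems);
    for (int r = 0; r < rows; ++r) {
        const std::uint8_t* row = src + static_cast<std::size_t>(r) * src_step;
        float* out = dst + static_cast<std::size_t>(r) * row_elems;
        for (std::size_t k = 0; k < row_elems; ++k) {
            out[k] = static_cast<float>(row[k]);  // widen uint8 -> float
        }
    }
}
```

With OpenCV itself, a similar effect can probably be had by building a cv::Mat header over the staging buffer, cv::Mat(rows, cols, CV_32FC3, ptr), and calling image.convertTo on it, so the floats land in that buffer without a second host copy. If the host float buffer must go entirely, another option is to upload the uint8 pixels and do the conversion in a small CUDA kernel on the device.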

Meanwhile, after inference, the output data has this format:
float output[256 * 512 * 4]

How can I convert output to cv::Mat format quickly for post-processing?
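
For the output side, the usual idea is to wrap the existing float memory rather than copy it. The sketch below illustrates that with a hypothetical non-owning PlaneView type standing in for a cv::Mat header; wrap_chw is likewise an illustrative name. It assumes the network writes planar CHW data, which depends on the model and is not stated in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of one rows x cols float plane inside a larger
// buffer; it plays the role of a cv::Mat header built over external memory.
struct PlaneView {
    float* data;
    int rows;
    int cols;
    float& at(int r, int c) const {
        assert(r >= 0 && r < rows && c >= 0 && c < cols);
        return data[static_cast<std::size_t>(r) * cols + c];
    }
};

// Split a planar (CHW) buffer into per-channel views without copying pixels.
std::vector<PlaneView> wrap_chw(float* output, int channels, int rows, int cols) {
    assert(output != nullptr);
    const std::size_t plane = static_cast<std::size_t>(rows) * cols;
    std::vector<PlaneView> views;
    views.reserve(channels);
    for (int c = 0; c < channels; ++c) {
        views.push_back(PlaneView{output + c * plane, rows, cols});
    }
    return views;
}
```

In OpenCV the counterpart is the cv::Mat constructor that takes a user data pointer, for example cv::Mat(rows, cols, CV_32FC1, output + c * rows * cols) per plane, or a single cv::Mat(rows, cols, CV_32FC(channels), output) if the output is interleaved HWC. Such a header does not own the memory, so output has to outlive it.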


Hope the following doc may help you.

Thank you.