Description
I referred to wang-xinyu/tensorrtx to create the TensorRT network, and tried to implement weizheliu/People-Flows with the TensorRT API. After fine-tuning some operators, the program runs fine, but the result is partially weird.
I used two (1, 3, 540, 960) images for testing under PyTorch and TensorRT respectively. Both runtimes produce the same output shape (1, 10, 67, 120). I then split the output by channel and checked the counting result with a sum over each channel.
- In PyTorch, the results are
[2.58848, 12.08908, 13.08177, 11.02262, 0.03745,
0.03239, 0.02941, 0.09200, 0.02342, 7.01662]
- In TensorRT, the results are
[2.31062, 13.39710, 12.2439, 8.91591, 0.01363,
0.03388, 0.02145, 0.04163, 2.37593, -nan]
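The per-channel sum check above can be sketched on the host side directly over the flat CHW output buffer (plain C++; `channelSums` is a name introduced here for illustration, not from the original code):

```cpp
#include <cstddef>
#include <vector>

// Sum each channel of a flat CHW float buffer, which is what the
// per-channel counting check does conceptually.
std::vector<double> channelSums(const float* data, int C, int H, int W) {
    std::vector<double> sums(C, 0.0);
    for (int c = 0; c < C; ++c) {
        const float* ch = data + (std::size_t)c * H * W;  // channel c starts at c*H*W floats
        for (int i = 0; i < H * W; ++i) sums[c] += ch[i];
    }
    return sums;
}
```

For the (1, 10, 67, 120) output, C = 10, H = 67, W = 120.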
The last number from TensorRT changes on every inference; it can be very large, very small, or NaN. It does not seem to be a problem with the network definition via the TensorRT API. Could the problem be in how I bind the result pointer to the vector of matrices?
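One simple thing to rule out first is a size mismatch in outputSize: a (1, 10, 67, 120) float tensor needs 1 × 10 × 67 × 120 × 4 = 321,600 bytes on the host; if the buffer or the copy is smaller, the last channel would read uninitialized memory. A minimal sketch of that check (`tensorBytes` is a hypothetical helper, with the dims hard-coded from the shape above):

```cpp
#include <cstddef>

// Bytes required for a float tensor with the given dimensions.
std::size_t tensorBytes(const int* dims, int nbDims) {
    std::size_t v = 1;
    for (int i = 0; i < nbDims; ++i) v *= (std::size_t)dims[i];
    return v * sizeof(float);
}
```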
In TensorRT, I use OpenCV cv::Mat to store the result per channel, like this:
void* result_blob = malloc(outputSize);
cudaMemcpyAsync(result_blob, buffers[2], outputSize, cudaMemcpyDeviceToHost, stream);
...
std::vector<cv::Mat> chw_output;
for (int i = 0; i < 10; i++)
{
    // Parenthesize the per-channel size. In the original expression,
    // i * image.size().height/8 * image.size().width/8, the integer
    // divisions happen after the multiplications, so the offset drifts
    // upward from i >= 2 and the last channel reads past the end of
    // result_blob (matching the shifted sums and the -nan above).
    chw_output.emplace_back(cv::Mat(image.rows / 8, image.cols / 8, CV_32FC1,
        (float*)result_blob + i * (image.size().height / 8) * (image.size().width / 8)));
}
I assign each cv::Mat a slice of the result memory via the result pointer plus an offset of one channel size per index.
What should I use to get the right cv::Mat?
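Two things worth checking, both unconfirmed guesses: (a) cudaMemcpyAsync returns before the copy finishes, so a cudaStreamSynchronize(stream) is likely needed before result_blob is read on the host; (b) the cv::Mat constructor used above wraps result_blob without copying, so the Mats go stale once the buffer is freed or reused for the next inference. A minimal sketch of copying each channel into storage it owns (plain std::vector standing in for cv::Mat; `splitChannels` is a hypothetical helper):

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copy each channel of a flat CHW float buffer into owned storage,
// so the per-channel views stay valid after the source buffer is
// freed or overwritten by the next inference.
std::vector<std::vector<float>> splitChannels(const float* src, int C, int H, int W) {
    std::vector<std::vector<float>> out(C, std::vector<float>((std::size_t)H * W));
    for (int c = 0; c < C; ++c)
        std::memcpy(out[c].data(), src + (std::size_t)c * H * W,
                    (std::size_t)H * W * sizeof(float));
    return out;
}
```

With cv::Mat, the same effect comes from calling .clone() on each wrapped header so the Mat owns its pixels.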
Environment
TensorRT Version: 7.2.3.4
GPU Type: RTX 2070 8G
Nvidia Driver Version: 510.06
CUDA Version: 11.1
CUDNN Version: 8.1
Operating System + Version: Windows 11 22000.282
Python Version (if applicable): 3.9.4
PyTorch Version (if applicable): 1.8.1+cu111
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
Steps To Reproduce
Built with VS2019 x64 Release; no errors, and the program returns 0.