How can I improve my prediction performance in TensorRT 3.0?

joon4141 · February 26, 2018, 9:21am

I am using TensorRT 3.0 on a Tesla V100, and I want to run inference with a converted Caffe (v1) model.
However, in my tests the performance differs a lot from the figures NVIDIA publishes, so I would like to ask a few questions.

Do the benchmark figures reported per batch size include all of the time needed to build the input data and run inference?
With 5000 images, the transfer of input data to the GPU and the inference itself are fast, but memcpy_dtoh_async, which copies the results back, takes a long time. Is there a way to improve this?
To form a batch, I append images to a list until it reaches the batch size.
If that approach is wrong, what is the correct way to run inference with different batch sizes in TensorRT 3.0?

Thank you for reading.

moodie · March 22, 2018, 3:01pm

You should try to overlap compute and memory transfers as much as possible. TensorRT's nvinfer1::IExecutionContext::enqueue method should let you hide the upload and download operations by running them on streams in parallel with compute. Can you post example code?
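To illustrate the overlap idea itself, here is a minimal sketch in plain Python. It is not TensorRT code: fake_upload, fake_infer and fake_download are hypothetical stand-ins that only sleep, and run_pipeline is a helper made up for this example. In a real application the overlap would come from CUDA streams and asynchronous copies rather than Python threads. The sketch is written for Python 3; in Python 2 the queue module is named Queue.

```python
import queue
import threading
import time

_STOP = object()  # sentinel that marks the end of the batch stream


def _stage(fn, src, dst):
    """Worker loop: apply fn to every item from src and forward it to dst."""
    while True:
        item = src.get()
        if item is _STOP:
            dst.put(_STOP)
            return
        dst.put(fn(item))


def run_pipeline(batches, stages, depth=2):
    """Run each stage function in its own thread, preserving batch order.

    At most `depth` batches wait in front of each stage; the final queue is
    unbounded so the caller can feed everything before collecting results.
    """
    queues = [queue.Queue(maxsize=depth) for _ in stages] + [queue.Queue()]
    workers = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()
    for batch in batches:
        queues[0].put(batch)
    queues[0].put(_STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


# Hypothetical stand-ins: each one sleeps to mimic a copy or a kernel launch.
def fake_upload(batch):
    time.sleep(0.01)
    return list(batch)


def fake_infer(batch):
    time.sleep(0.01)
    return [2 * x for x in batch]


def fake_download(batch):
    time.sleep(0.01)
    return batch


if __name__ == "__main__":
    batches = [[i, i + 1] for i in range(0, 20, 2)]
    start = time.time()
    out = run_pipeline(batches, [fake_upload, fake_infer, fake_download])
    print(len(out), "batches in", round(time.time() - start, 3), "s")
```

While batch i is in the "infer" stage, batch i+1 can already be in "upload" and batch i-1 in "download", so the total time approaches that of the slowest stage rather than the sum of all three. Error handling is left out to keep the sketch short.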

joon4141 · April 11, 2018, 7:40am

Thank you for your answer. I don't think I mentioned which language I am using: I'm working in a Python 2 environment.

As you suggested, I used ‘context.enqueue’ to run the inference, and it took about 2.5 seconds for 5000 images.
When I timed the individual steps, the host-to-device copy took about 1.9 sec and copying the results back took 0.001 sec.
If so, does that mean the inference itself took about 0.5 seconds?

Reading the images takes more than 20 seconds, and preprocessing takes about 3 seconds. Using TensorRT (V100, FP16) shortens the inference step by about 3 seconds compared to the GTX 1080 Ti (FP32).
Relative to the total time, the improvement is hard to notice.
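To put these numbers in proportion, here is a quick back-of-the-envelope tally. It is only a sketch using the rounded timings quoted in this post: the 20 seconds for reading is a lower bound, the 3-second saving is taken at face value, and the function names are made up for this example.

```python
# Rounded per-stage timings (seconds) for 5000 images, as quoted in this post.
timings = {
    "read images": 20.0,      # "more than 20 seconds", so a lower bound
    "preprocess": 3.0,
    "TensorRT enqueue": 2.5,  # includes the host-to-device copy
}


def time_shares(stage_seconds):
    """Return each stage's fraction of the total run time."""
    total = sum(stage_seconds.values())
    return {name: secs / total for name, secs in stage_seconds.items()}


def end_to_end_speedup(stage_seconds, saved_seconds):
    """Overall speedup if the run used to take `saved_seconds` longer."""
    after = sum(stage_seconds.values())
    return (after + saved_seconds) / after


if __name__ == "__main__":
    for name, share in time_shares(timings).items():
        print("%-18s %5.1f%%" % (name, 100 * share))
    print("overall speedup: %.2fx" % end_to_end_speedup(timings, 3.0))
```

With these figures, reading the images accounts for roughly 78% of the run, and saving 3 seconds in inference gives an end-to-end speedup of only about 1.12x.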

Below is the sample code I wrote.
Please let me know if something is wrong.

SiddharthSharma_TPM · April 26, 2018, 9:37pm

We created a new “Deep Learning Training and Inference” section in Devtalk to improve the experience for deep learning and accelerated computing, and HPC users:
https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/

We are moving active deep learning threads to the new section.

URLs for topics will not change with the re-categorization. So your bookmarks and links will continue to work as earlier.

-Siddharth