I have an inference engine written with TensorRT 4.0.1.6. Given one file as input, it produces one output file. It supports multi-threading: given multiple input files, it creates one thread per file. To check correctness, I made 10 copies of one input file and ran inference on them. The problem is that sometimes it gives 10 outputs with the same md5, and sometimes it gives 10 outputs of which some have different md5s.
When I dump the outputs with different md5s to text, I find they seem to differ only in precision.
I have checked my code's thread safety several times and found nothing suspicious. I use an independent ExecutionContext and binding buffers for each thread, following the reference (roughly the pattern in the sketch below). So I would appreciate some guidance: does this look like a bug in my code, or is it caused by some internal logic I'm not aware of? Thanks!
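For context, this is a simplified sketch of the per-thread setup, not my actual code; buffer sizes and the binding order are placeholders, and each worker function is launched on its own std::thread per input file:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: one ICudaEngine shared across threads, one IExecutionContext,
// one CUDA stream, and one set of device buffers per thread.
void workerThread(nvinfer1::ICudaEngine* engine,
                  const void* hostInput, void* hostOutput,
                  size_t inSize, size_t outSize)
{
    // Per-thread context and stream; the engine itself is shared read-only.
    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    void* bindings[2];  // [0]=input, [1]=output (placeholder binding order)
    cudaMalloc(&bindings[0], inSize);
    cudaMalloc(&bindings[1], outSize);

    cudaMemcpyAsync(bindings[0], hostInput, inSize,
                    cudaMemcpyHostToDevice, stream);
    // enqueueV2 is the TRT 6/7 API; TRT 4 used enqueue() with an
    // explicit batch size instead.
    ctx->enqueueV2(bindings, stream, nullptr);
    cudaMemcpyAsync(hostOutput, bindings[1], outSize,
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    cudaFree(bindings[0]);
    cudaFree(bindings[1]);
    cudaStreamDestroy(stream);
    ctx->destroy();  // TRT 7-era API; TRT 8+ uses delete instead
}
```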
Environment
TensorRT Version: 4.0.1.6
GPU Type: GeForce GTX TITAN X
Nvidia Driver Version: 430.50
CUDA Version: 10.1
CUDNN Version: 7.0.5
Operating System + Version: CentOS Linux release 7.4.1708
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Does this mean the same input may produce output with slightly different precision? I used to compare md5s of intermediate results to locate bugs. Does this mean I should never do that again? Thanks.
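In the meantime I'm considering replacing the exact md5 check with an element-wise tolerance comparison, something like this sketch (the tolerance values are placeholders to tune, not recommendations):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Compare two float buffers with combined absolute and relative
// tolerance instead of an exact md5 match.
bool nearlyEqual(const float* a, const float* b, size_t n,
                 float absTol = 1e-5f, float relTol = 1e-3f)
{
    for (size_t i = 0; i < n; ++i) {
        float diff  = std::fabs(a[i] - b[i]);
        float scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
        // Fail only if the difference exceeds both tolerances.
        if (diff > absTol && diff > relTol * scale)
            return false;
    }
    return true;
}
```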
I have upgraded my code to TensorRT-7.0.0.11. The issue now seems slightly different. I ran a test with 20 worker threads per round for 100 rounds. Among the 2000 result files, I got 1937 files with the same md5 (fa6d87b1f5b220a30f7647654a60f6c0) and 63 with another md5 (9d661f06c6a3c2f6c117c487227cbb9e).
I guess that should be the reason I get wrong results. So now the situation is:
1. The slight precision differences between parallel runs have disappeared; all valid results are identical.
2. There is some parallelism issue leading to runtime errors and invalid results, which I still need to resolve (I now check for failures per inference, as in the sketch below this list).
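To at least detect the invalid results as they happen, rather than relying on the output md5, each worker checks the return value of every call, roughly like this (a sketch assuming the TRT 7 enqueueV2 path):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstdio>

// Sketch: report a failed inference in the calling thread.
bool runOnce(nvinfer1::IExecutionContext* ctx, void** bindings,
             cudaStream_t stream)
{
    if (!ctx->enqueueV2(bindings, stream, nullptr)) {
        // TensorRT rejected the enqueue; the output buffer is invalid.
        return false;
    }
    cudaError_t err = cudaStreamSynchronize(stream);
    if (err != cudaSuccess) {
        // Asynchronous CUDA errors from the kernels surface here.
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```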
After searching for the runtime error, I was led to this post, which seems similar to my case:
So that's the latest progress on my issue.
Thanks.
Runtime environment:
CentOS 7.5.1804
GPU: TITAN V
Driver Version: 410.48
CUDA Version: 10.0
TensorRT: 7.0.0.11
cuDNN: 7.6.5
It is company product code written in C++, and I'm sorry I can't share it since it is commercial. I have switched the TRT logger to VERBOSE (roughly as in the sketch below). Hope that can give you some clues.
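For reference, the logger is just the standard ILogger subclass with the severity threshold lowered so kVERBOSE messages are kept, roughly like this (the log() signature shown matches the TRT 7 samples; it gains noexcept in TRT 8):

```cpp
#include <NvInfer.h>
#include <iostream>

// Minimal ILogger that prints every message, including VERBOSE ones;
// TensorRT sends all messages and the logger decides what to keep.
class VerboseLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        std::cerr << static_cast<int>(severity) << ": " << msg << std::endl;
    }
};
```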
Thanks.

log.log (2.0 MB)
I have tried 6.0.1.5 and it seems to be a good workaround for me: there are no runtime errors or crashes. Although the 2000 results in total still fall into 2 distinct md5s, they are the same within each round, which looks like some initial-value issue. That is much better than TRT 4, which produced different md5s within a single round. In any case, I have checked that both results are valid, with only precision differences. We will use TRT 6.0.1.5 until you fix the issue in a newer version. Thanks for your help.