Same input, different output with TensorRT 4.0.1.6

Hi,

I have upgraded my code to TensorRT-7.0.0.11. The issue now looks slightly different. I ran a test with 20 worker threads, each running 100 rounds. Among the 2000 result files, 1937 have the same md5 (fa6d87b1f5b220a30f7647654a60f6c0) and 63 have another md5 (9d661f06c6a3c2f6c117c487227cbb9e).
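For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how the result files could be grouped by md5. It assumes all result files sit in one directory; the directory name `results` and the helper names `md5_of_file` and `tally_md5` are hypothetical and not part of my actual test code.

```python
import hashlib
from collections import Counter
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tally_md5(result_dir):
    """Count how many regular files in result_dir share each MD5 digest."""
    return Counter(
        md5_of_file(p) for p in sorted(Path(result_dir).iterdir()) if p.is_file()
    )


if __name__ == "__main__":
    # "results" is a hypothetical directory holding the per-round output files.
    if Path("results").is_dir():
        for digest, count in tally_md5("results").most_common():
            print(count, digest)
```

With fully deterministic output, the tally would contain a single digest; any extra digest points at the rounds worth inspecting.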

I noticed that the different md5 seems to occur together with the following runtime error:

[TRT] FAILED_EXECUTION: std::exception
FAILED_EXECUTION: std::exception
FAILED_EXECUTION: std::exception
FAILED_EXECUTION: std::exception
[05/21/2020-15:04:09] [E] [TRT] FAILED_EXECUTION: std::exception
[05/21/2020-15:04:09] [F] [TRT] Assertion failed: *refCount > 0
../rtSafe/WeightsPtr.cpp:20
Aborting...

I guess that is the reason I get the wrong result. So the current situation is:

1. The slight precision difference between parallel runs has disappeared. All valid results are identical.
2. There is still a concurrency issue that leads to the runtime error and invalid results, which I need to resolve.
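One way to narrow down item 2 would be to serialize the inference call behind a lock and rerun the same 20 x 100 test. This is only a diagnostic sketch, assuming the errors come from unsynchronized access to shared state. `run_inference` is a hypothetical stand-in for the real inference call, not a TensorRT API, and `guarded_inference` and `run_workers` are illustrative names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single lock shared by all worker threads.
_infer_lock = threading.Lock()


def run_inference(data):
    """Hypothetical stand-in for the real inference call."""
    return sum(data)


def guarded_inference(data):
    """Run the inference call while holding the lock, so calls never overlap."""
    with _infer_lock:
        return run_inference(data)


def run_workers(data, workers=20, rounds=100):
    """Submit workers * rounds guarded calls to a thread pool and collect results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(guarded_inference, data) for _ in range(workers * rounds)
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    results = run_workers([1, 2, 3])
    print(len(results), set(results))
```

If the errors vanish with the lock in place, that would suggest some object is being shared across threads without synchronization. The lock removes the parallelism, so it is a way to confirm the cause rather than a fix.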

After searching for the runtime error, I was led to this post, which seems similar to my case:

So that's the latest progress on my issue.

Thanks.

Runtime environment:

CentOS 7.5.1804
GPU: TITAN V
Driver Version: 410.48
CUDA version: 10.0
TensorRT-7.0.0.11
cuDNN: 7.6.5