TensorRT 2.1.2 fails to run int8 model on P40

Unfortunately, the int8 model, which converts successfully, fails to run on a P40.

The model includes a plugin layer, but the same procedure works fine on a GTX 1080 and a 1080 Ti.

Here is the gdb backtrace:

trt-infer-test: customWinogradConvActLayer.cpp:195: virtual void nvinfer1::cudnn::WinogradConvActLayer::allocateResources(const nvinfer1::cudnn::CommonContext&): Assertion 'configIsValid(context)' failed.

Thread 4 "trt-infer-test" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffc0ae2700 (LWP 3338)]
0x00007fffea52d428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007fffea52d428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fffea52f02a in __GI_abort () at abort.c:89
#2  0x00007fffea525bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x7fffec058630 "configIsValid(context)", 
    file=file@entry=0x7fffec066d40 "customWinogradConvActLayer.cpp", line=line@entry=195, 
    function=function@entry=0x7fffec066f00 <nvinfer1::cudnn::WinogradConvActLayer::allocateResources(nvinfer1::cudnn::CommonContext const&)::__PRETTY_FUNCTION__> "virtual void nvinfer1::cudnn::WinogradConvActLayer::allocateResources(const nvinfer1::cudnn::CommonContext&)") at assert.c:92
#3  0x00007fffea525c82 in __GI___assert_fail (assertion=0x7fffec058630 "configIsValid(context)", file=0x7fffec066d40 "customWinogradConvActLayer.cpp", line=195, 
    function=0x7fffec066f00 <nvinfer1::cudnn::WinogradConvActLayer::allocateResources(nvinfer1::cudnn::CommonContext const&)::__PRETTY_FUNCTION__> "virtual void nvinfer1::cudnn::WinogradConvActLayer::allocateResources(const nvinfer1::cudnn::CommonContext&)") at assert.c:101
#4  0x00007fffebf9192f in nvinfer1::cudnn::WinogradConvActLayer::allocateResources(nvinfer1::cudnn::CommonContext const&) () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.3
#5  0x00007fffebf3d9f3 in nvinfer1::cudnn::Engine::initialize() () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.3
#6  0x00007fffebf45fad in nvinfer1::cudnn::Engine::deserialize(void const*, unsigned long, nvinfer1::IPluginFactory*) () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.3
#7  0x00007fffebf97ffe in nvinfer1::Runtime::deserializeCudaEngine(void const*, unsigned long, nvinfer1::IPluginFactory*) () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.3
<deleted>

Hello,

Can you please share your int8 results on a 1080 Ti?

My results on a 1080 Ti:

INT8 run:400 batches of size 100 starting at 100

Top1: 0.9909, Top5: 1
Processing 40000 images averaged 0.00140045 ms/image and 0.140045 ms/batch.

FP32 run:400 batches of size 100 starting at 100

Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.00242989 ms/image and 0.242989 ms/batch.