TRT failed to generate plan on Turing (T4 & RTX 2080)

Hi guys,

We can’t generate a TRT plan from a TensorFlow .pb file.
Error log:

python: ../builder/cudnnConvolutionTraits.cpp:50: static void nvinfer1::cudnn::CudnnConvolutionTraits::createConstants(nvinfer1::rt::cuda::CudnnConvolutionLayer&, nvinfer1::builder::GlobWriter&, const nvinfer1::rt::CommonContext&): Assertion `layer.configIsValid(context)' failed.
Aborted

Versions:
CUDA: 10
TF: 1.7.1
TensorRT: 5.0.2.6
cuDNN: 7.3.1.20
Driver: 410.48

It fails on the T4 and RTX 2080, but works on the 1080 Ti, with all the same versions.

It also works on the V100.

We use the following code to generate the plan:

import numpy as np
import time
import cv2
import uff
import tensorrt as trt
import tensorrt.legacy
import os
import uuid

def getArch():
    # Parse `lspci` output and return the NVIDIA chip name (the token after "Corporation")
    adapters = os.popen("lspci | grep NVIDIA")
    output = adapters.read()
    adapters.close()
    out = output.split(" ")
    index = out.index("Corporation")
    chip = out[index + 1]
    return chip

mem = 11 << 30  # 11 GiB max workspace size

for cropsizeX in [1920 // 8]:            # 240; integer division keeps the dims as ints
    for cropsizeY in [(1080 + 8) // 8]:  # 136
        for nntype in ["col"]:
            if nntype == 'col': 
                tfname = "col.pb"
                nodes = {'pan': (1, cropsizeX, cropsizeY), 'color': (3, cropsizeX, cropsizeY)}
                outnodes = ['out']
                outname = "./col_"+str(mem)+"_"+str(cropsizeX)+"_"+str(cropsizeY)+"_"+getArch()+".engine"
            elif nntype == "sr":
                tfname = "sr.pb"
                nodes = {'Placeholder': (1, cropsizeX, cropsizeY)}
                outnodes = ['out']
                outname = "./sr_"+str(mem)+"_"+str(cropsizeX)+"_"+str(cropsizeY)+"_"+getArch()+".engine"

            if True:  # set to False to load a previously saved plan instead of building
                engine = trt.legacy.lite.Engine(framework = "tf",  # Source framework
                                path = tfname, 
                                max_batch_size = 32, 
                                max_workspace_size = mem, 
                                input_nodes = nodes, 
                                output_nodes = outnodes,
                                data_type = trt.legacy.infer.DataType.HALF,
                                logger_severity = trt.legacy.infer.LogSeverity.WARNING,
                                device = 0)
                engine.save(outname)
                del engine
            else:
                logger = trt.legacy.infer.Logger()
                engine =  trt.legacy.lite.Engine(PLAN=outname, 
                                   data_type = trt.legacy.infer.DataType.HALF)
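
For reference, here is a minimal sketch of the same FP16 build through the standard (non-legacy) TensorRT 5 Python API; the file name, node names, and shapes are copied from the snippet above, so treat them as placeholders rather than the exact graphs we ship:

import tensorrt as trt
import uff

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Convert the frozen TF graph to a UFF buffer in memory
uff_model = uff.from_tensorflow_frozen_model("col.pb", ["out"])

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    parser.register_input("pan", (1, 240, 136))
    parser.register_input("color", (3, 240, 136))
    parser.register_output("out")
    parser.parse_buffer(uff_model, network)

    builder.max_batch_size = 32
    builder.max_workspace_size = 11 << 30
    builder.fp16_mode = True  # same FP16 build that aborts via the legacy API

    engine = builder.build_cuda_engine(network)
    if engine is None:
        raise RuntimeError("engine build failed")
    with open("col_fp16.engine", "wb") as f:
        f.write(engine.serialize())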

Hi dbatischev,

I had a quick try on a T4 and observed the same issue; we will investigate what is happening.
BTW, I was able to run the model in FP32 mode. Could that be a temporary option to get you unblocked?
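
If it helps, the change in your snippet should just be the data_type argument; a minimal sketch, assuming the same legacy lite API and the file/node variables from your code above:

engine = trt.legacy.lite.Engine(framework = "tf",
                path = tfname,
                max_batch_size = 32,
                max_workspace_size = mem,
                input_nodes = nodes,
                output_nodes = outnodes,
                # FP32 build: DataType.FLOAT instead of DataType.HALF
                data_type = trt.legacy.infer.DataType.FLOAT,
                logger_severity = trt.legacy.infer.LogSeverity.WARNING,
                device = 0)
engine.save(outname)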

Thanks for the reply.
I’ll try as soon as I manage to launch a T4 instance on GCP.
Right now it fails with:

[    4.866977] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    4.875308] IP: _nv007834rm+0x14/0xd0 [nvidia]
[    4.879872] PGD 800000042bf0f067 P4D 800000042bf0f067 PUD 42bf0e067 PMD 0 
[    4.879875] Oops: 0000 [#1] SMP PTI
[    4.879877] Modules linked in: nvidia_uvm(POE) input_leds pvpanic serio_raw sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops aesni_intel drm aes_x86_64 crypto_simd glue_helper psmouse cryptd ipmi_devintf ipmi_msghandler virtio_net
[    4.879911] CPU: 2 PID: 524 Comm: nvidia-smi Tainted: P           OE    4.15.0-1028-gcp #29-Ubuntu
[    4.879912] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[    4.880097] RIP: 0010:_nv007834rm+0x14/0xd0 [nvidia]
[    4.880098] RSP: 0018:ffff9f0abfd03dd8 EFLAGS: 00010002
[    4.880098] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    4.880099] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008
[    4.880100] RBP: ffff9f0aa3f25e20 R08: ffff9f0aa3f25e9c R09: ffff9f0aa3f25ea8
[    4.880100] R10: 0000000000000008 R11: ffffffffc0b78660 R12: 0000000000000008
[    4.880101] R13: 0000000000000000 R14: ffff9f0aa4e94008 R15: 0000000000000000
[    4.880102] FS:  00007f6e8321eb80(0000) GS:ffff9f0abfd00000(0000) knlGS:0000000000000000
[    4.880102] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.880103] CR2: 0000000000000008 CR3: 00000004222ea005 CR4: 00000000003606e0
[    4.880107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    4.880108] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    4.880108] Call Trace:
[    4.880110]  <IRQ>
[    4.880254]  ? _nv036628rm+0x65/0x80 [nvidia]
[    4.880370]  ? _nv030847rm+0x84/0x480 [nvidia]
[    4.880493]  ? _nv030871rm+0xaf/0xd0 [nvidia]
[    4.880607]  ? rm_isr+0x7c/0x120 [nvidia]
[    4.880669]  ? nvidia_isr+0x8a/0x120 [nvidia]
[    4.880674]  ? __handle_irq_event_percpu+0x44/0x1a0
[    4.880675]  ? handle_irq_event_percpu+0x32/0x80
[    4.880676]  ? handle_irq_event+0x3b/0x60
[    4.880678]  ? handle_edge_irq+0x7c/0x190
[    4.880681]  ? handle_irq+0x20/0x30
[    4.880686]  ? do_IRQ+0x4e/0xd0
[    4.880688]  ? common_interrupt+0x8e/0x8e
[    4.880689]  </IRQ>
[    4.880814]  ? _nv021698rm+0x60/0x60 [nvidia]
[    4.880819]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    4.880884]  ? os_release_spinlock+0x1a/0x20 [nvidia]
[    4.881007]  ? _nv009189rm+0x265/0x3c0 [nvidia]
[    4.881129]  ? _nv033835rm+0x144/0x1b0 [nvidia]
[    4.881243]  ? _nv030757rm+0x41/0x70 [nvidia]
[    4.881512]  ? _nv006194rm+0x4a/0xa0 [nvidia]
[    4.881646]  ? _nv001136rm+0x24a/0x2d0 [nvidia]
[    4.881842]  ? _nv033609rm+0x93d/0xb80 [nvidia]
[    4.882027]  ? _nv033609rm+0x915/0xb80 [nvidia]
[    4.882146]  ? _nv001116rm+0x49c/0x650 [nvidia]
[    4.882266]  ? rm_init_adapter+0x11a/0x130 [nvidia]
[    4.882271]  ? wake_up_process+0x15/0x20
[    4.882628]  ? nv_open_device+0x2f8/0x790 [nvidia]
[    4.882977]  ? nvidia_open+0x2d0/0x410 [nvidia]
[    4.883227]  ? nvidia_open+0x2d0/0x410 [nvidia]
[    4.883289]  ? nvidia_frontend_open+0x58/0xa0 [nvidia]
[    4.883293]  ? chrdev_open+0xc4/0x1b0
[    4.883295]  ? do_dentry_open+0x1c2/0x310
[    4.883298]  ? __inode_permission+0x5b/0x160
[    4.883299]  ? cdev_put.part.3+0x20/0x20
[    4.883300]  ? vfs_open+0x4f/0x80
[    4.883302]  ? path_openat+0x66e/0x1770
[    4.883303]  ? legitimize_path.isra.28+0x2e/0x60
[    4.883305]  ? do_filp_open+0x9b/0x110
[    4.883307]  ? __check_object_size+0xaf/0x1b0
[    4.883309]  ? do_sys_open+0x1bb/0x2c0
[    4.883310]  ? do_sys_open+0x1bb/0x2c0
[    4.883312]  ? SyS_openat+0x14/0x20
[    4.883316]  ? do_syscall_64+0x7b/0x150
[    4.883318]  ? entry_SYSCALL_64_after_hwframe+0x42/0xb7
[    4.883319] Code: 83 c4 08 48 89 c7 5b 41 5c e9 19 fe 07 00 66 0f 1f 84 00 00 00 00 00 41 55 48 85 ff 49 89 f5 41 54 49 89 fc 53 0f 84 93 00 00 00 <48> 8b 17 48 85 d2 75 07 eb 68 66 90 48 89 c2 48 8b 42 10 48 85 
[    4.883479] RIP: _nv007834rm+0x14/0xd0 [nvidia] RSP: ffff9f0abfd03dd8
[    4.883480] CR2: 0000000000000008
[    4.883484] ---[ end trace cc8a587a35cec3a6 ]---
[    4.883485] Kernel panic - not syncing: Fatal exception in interrupt
[    4.887237] Kernel Offset: 0x18200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    5.291435] Rebooting in 10 seconds..
[   15.288760] ACPI MEMORY or I/O RESET_REG.

during modprobe and boot :D

Hi dbatischev,

Sorry for the late update.
Is this issue still present, or what is the current status?
Could you get a chance to try the latest TensorRT 5.1? I couldn’t find the context to validate the original issue against TensorRT 5.1.
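
A quick way to confirm which TensorRT version the Python bindings pick up after upgrading is a one-line check:

python3 -c "import tensorrt as trt; print(trt.__version__)"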

Hi Nfeng,

We’ve tried one more time just now.
It now seems to be fully broken on the 2080, but it works on the 1080 again.
The new error is:

Converting to UFF graph
No. nodes: 1247
python: ../builder/cudnnConvolutionTraits.cpp:58: static void nvinfer1::cudnn::CudnnConvolutionTraits::createConstants(nvinfer1::rt::cuda::CudnnConvolutionLayer&, nvinfer1::builder::GlobWriter&, const nvinfer1::rt::CommonContext&): Assertion `layer.configIsValid(context)' failed.
Aborted (core dumped)

Software versions: CUDA 10.1.168, cuDNN 7.5.0.56, TensorRT 5.1.5, driver 418.67

Hi dbatischev,

I did observe your issue, and it seems to be caused by a very hidden bug in our builder.
Would you consider running your model in FP32 or INT8 mode?
(INT8 run log shown below; an FP32 command sketch follows the log.)

[I] [TRT] Debug synchronize completed successfully after build for layer: probability_colorize/Conv/Conv2D input reformatter 0 (type=9, tactic=0)
[I] [TRT] Debug synchronize completed successfully after build for layer: probability_colorize/Conv/Conv2D (type=14, tactic=-6980047749615980934)
[I] [TRT] Debug synchronize completed successfully after build for layer: probability_colorize/Conv/Tanh (type=0, tactic=0)
[I] [TRT] Debug synchronize completed successfully after build for layer: probability_colorize/out/Add (type=10, tactic=0)
[I] [TRT] Debug synchronize completed successfully after build for layer: probability_colorize/out/Mul_HL_1804289383 (type=10, tactic=0)
[I] [TRT] Debug synchronize completed successfully after build for layer: probability_colorize/out/Mul (type=19, tactic=0)
[I] [TRT] Data initialization and engine generation completed in 0.938048 seconds.
[I] Average over 10 runs is 9.04309 ms (host walltime is 9.27157 ms, 99% percentile time is 9.06378).
[I] Average over 10 runs is 9.11002 ms (host walltime is 9.34709 ms, 99% percentile time is 9.17792).
[I] Average over 10 runs is 9.20339 ms (host walltime is 9.45544 ms, 99% percentile time is 9.2665).
[I] Average over 10 runs is 9.22274 ms (host walltime is 9.48167 ms, 99% percentile time is 9.28752).
[I] Average over 10 runs is 9.23295 ms (host walltime is 9.49235 ms, 99% percentile time is 9.27085).
[I] Average over 10 runs is 9.20772 ms (host walltime is 9.46407 ms, 99% percentile time is 9.23286).
[I] Average over 10 runs is 9.23249 ms (host walltime is 9.49017 ms, 99% percentile time is 9.33901).
[I] Average over 10 runs is 9.28647 ms (host walltime is 9.54139 ms, 99% percentile time is 9.34672).
[I] Average over 10 runs is 9.23318 ms (host walltime is 9.4913 ms, 99% percentile time is 9.2841).
[I] Average over 10 runs is 9.38855 ms (host walltime is 9.64644 ms, 99% percentile time is 9.48998).
&&&& PASSED TensorRT.trtexec # ./trtexec --uff=iter_color_two_side_disp_160_ema_from_tf1_7_woabs.uff --uffInput=pan,1,136,240 --uffInput=color,3,136,240 --output=probability_colorize/out/Mul --int8 --verbose
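
For an FP32 build, one option is the same trtexec command with the --int8 flag dropped (FP32 is the default precision); the UFF file and node names are copied from the INT8 command above:

./trtexec --uff=iter_color_two_side_disp_160_ema_from_tf1_7_woabs.uff --uffInput=pan,1,136,240 --uffInput=color,3,136,240 --output=probability_colorize/out/Mul --verbose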

Hi dbatischev,

Have you tried our suggestion of running your model in FP32 or INT8 mode?
Are there any results you can share?

Thanks