Invalid device function error when exporting a .tlt file to .etlt

Hi,
When I try the TLT 2.0 SSD examples, I get the error below:
!tlt-export ssd -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
                -k $KEY \
                -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
                -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
                --batch_size 1 \
                --data_type fp32

Using TensorFlow backend.
2020-05-17 23:05:31,179 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace-hekun/mydata/tlt-2.0/examples/ssd/specs/ssd_retrain_resnet18_kitti.txt
2020-05-17 23:05:34,726 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace-hekun/mydata/tlt-2.0/examples/ssd/specs/ssd_retrain_resnet18_kitti.txt
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_5 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_4 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_3 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_2 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_1 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_0 as custom op: BatchTilePlugin_TRT
DEBUG [/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 2 output network tensors.
Cuda error in file src/implicit_gemm.cu at line 648: invalid device function

My GPU is a single P100.
Could anyone help me?
Thanks!!

I get the same error. Could anyone solve it?
Thanks

What is your CUDA version?

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

I need to check further since I have not seen this error before. Can it be reproduced with the default notebook and the KITTI dataset?

BTW, the “Detected 1 inputs and 2 output network tensors.” message actually means the output .etlt file has already been generated.
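For example, a quick check from the notebook could look like this (a sketch only; the path follows the $USER_EXPERIMENT_DIR/export convention used in the export command above, so adjust if yours differs):

# Confirm the exported .etlt file exists and is non-empty
!ls -lh $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt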

@ [821853959] Do you also see this issue on a P100?

I use the default notebook file in the ssd folder and my own dataset in KITTI format.
“‘Detected 1 inputs and 2 output network tensors.’ means the output .etlt file is already generated.” Really?!
When I ran the notebook, I did have a .etlt file generated, but I thought it was wrong, so I deleted it.
Let me try again!

Yes. And my CUDA version is 10.2 too…

Hi all,
Please also share your cuDNN version and TensorRT version.
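If it helps, something like the following should show them from inside the container (a sketch, assuming a deb-based install; adjust to your setup):

# CUDA toolkit version
nvcc --version
# Installed cuDNN and TensorRT packages
dpkg -l | grep -i cudnn
dpkg -l | grep -i tensorrt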

CUDA:10.2
cuDNN: libcudnn7_7.6.5.32-1
TensorRT: trt7.0.0.11-ga-20191216

Hi Morganh,
Although a .etlt file is generated, it appears to be incorrect. When I go on to run tlt-converter to generate the TensorRT inference engine, I still get an error:
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
Cuda error in file src/implicit_gemm.cu at line 648: invalid device function

Although a trt.engine file is generated, when I test it the following error still occurs:
!tlt-infer ssd --trt -p $USER_EXPERIMENT_DIR/export/trt.engine \
               -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
               -i $DATA_DOWNLOAD_DIR/infer_images \
               -o $USER_EXPERIMENT_DIR/ssd_infer_images \
               -t 0.4
Using TensorFlow backend.
2020-05-18 05:23:50,058 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace-hekun/mydata/tlt-2.0/examples/ssd/specs/ssd_retrain_resnet18_kitti.txt
2020-05-18 05:23:50,060 [INFO] iva.ssd.scripts.inference_trt: Loading cached TensorRT engine from /workspace-hekun/mydata/tlt-2.0/examples/ssd/export/trt.engine
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
0%| | 0/8 [00:00<?, ?it/s]#assertion/trt_oss_src/TensorRT/plugin/nmsPlugin/nmsPlugin.cpp,118
Aborted (core dumped)

The TLT container CUDA version is 10.0 and the localhost CUDA version is 10.2.
The container TensorRT version is 7.0.0+cuda10 and the localhost TensorRT version is 7.0.0+cuda10.2.

Please also share the “nvidia-smi” output from your localhost. Sorry to bother you; I am still checking whether there is any culprit.

First of all, great thanks for your help~~
NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2

Hi kenh2nnz6,
I find that it is caused by the OSS plugin. For P100, GPU_ARCHS should be 60. We built the OSS plugins when we built the TLT docker and did not specify -DGPU_ARCHS=60.
So, to unblock your case, it is necessary to build the OSS plugin inside the docker.
I will also sync with the internal team on this issue.
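For reference, GPU_ARCHS is simply the GPU compute capability with the dot removed (the P100 is compute capability 6.0, hence 60). If you want to double-check the value for your own card, something like the following should work (a sketch only; the compute_cap query field needs a reasonably recent nvidia-smi, and deviceQuery must be built from the CUDA samples):

# Query the compute capability directly (newer nvidia-smi only)
nvidia-smi --query-gpu=name,compute_cap --format=csv
# Or, if the CUDA samples are built, use deviceQuery instead:
# ./deviceQuery | grep "CUDA Capability"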

Installing TRT OSS into the base docker:

mkdir trt_oss_src && \
cd trt_oss_src && \
echo "$PWD Building TRT OSS..." && \
wget -O "./cmake-3.16.2-Linux-x86_64.tar.gz" 'https://github.com/Kitware/CMake/releases/download/v3.16.2/cmake-3.16.2-Linux-x86_64.tar.gz' && \
tar xzf "./cmake-3.16.2-Linux-x86_64.tar.gz" && \
git clone -b release/7.0 https://github.com/nvidia/TensorRT TensorRT && \
cd TensorRT && \
git submodule update --init --recursive && \
mkdir -p build && cd build && \
../../cmake-3.16.2-Linux-x86_64/bin/cmake .. -DGPU_ARCHS=60 -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu -DTRT_BIN_DIR=`pwd`/out -DCUDA_VERSION=10.0 -DCUDNN_VERSION=7.6 && \
make -j8 && \
cp out/libnvinfer_plugin.so.7.0.0.1 /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7.0.0 && \
cp out/libnvinfer_plugin_static.a /usr/lib/x86_64-linux-gnu/libnvinfer_plugin_static.a && \
cd ../../../ && \
rm -rf trt_oss_src
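After the build finishes, a quick sanity check (a sketch; file names assume the TRT 7.0 layout used above) is to confirm the replaced plugin library is in place and visible to the dynamic loader:

# Verify the rebuilt plugin overwrote the stock one and is registered
ls -l /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7.0.0
ldconfig -p | grep libnvinfer_plugin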

Thanks very much! It works!!

It works!

Any update? I ran the scripts, but I hit the same issue:

NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_5 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_4 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_3 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_2 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_1 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_0 as custom op: BatchTilePlugin_TRT
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs

BTW, for RTX 8000 and RTX 2080 Ti, what should GPU_ARCHS be?

@zhobin8,
You did not hit the same error “Cuda error in file src/implicit_gemm.cu at line 648: invalid device function” as @kenh2nnz6.
For your result, please check whether the .etlt model has already been generated. There seems to be no error in your log.