Hi,
When I try the TLT 2.0 SSD example, I get the error below:
!tlt-export ssd -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt
-k $KEY
-o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt
-e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt
--batch_size 1
--data_type fp32
Using TensorFlow backend.
2020-05-17 23:05:31,179 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace-hekun/mydata/tlt-2.0/examples/ssd/specs/ssd_retrain_resnet18_kitti.txt
2020-05-17 23:05:34,726 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace-hekun/mydata/tlt-2.0/examples/ssd/specs/ssd_retrain_resnet18_kitti.txt
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_5 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_4 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_3 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_2 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_1 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_0 as custom op: BatchTilePlugin_TRT
DEBUG [/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] INFO: Detected 1 inputs and 2 output network tensors.
Cuda error in file src/implicit_gemm.cu at line 648: invalid device function
My GPU is a single P100.
Could anyone help?
Thanks!
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
I use the default notebook file in the ssd folder and my own dataset in KITTI format.
"'Detected 1 inputs and 2 output network tensors' means the output .etlt file is already generated." Really?!
When I ran the notebook, I did have a .etlt file generated, but I thought that was wrong so I deleted it.
Let me try!
Hi Morganh,
Although a .etlt file is generated, it seems to be incorrect. When I then run tlt-converter to generate the TensorRT inference engine, I still get an error:
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
Cuda error in file src/implicit_gemm.cu at line 648: invalid device function
Although a trt.engine file is generated, when I test it the following error still occurs:
!tlt-infer ssd --trt -p $USER_EXPERIMENT_DIR/export/trt.engine
-e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt
-i $DATA_DOWNLOAD_DIR/infer_images
-o $USER_EXPERIMENT_DIR/ssd_infer_images
-t 0.4
Using TensorFlow backend.
2020-05-18 05:23:50,058 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace-hekun/mydata/tlt-2.0/examples/ssd/specs/ssd_retrain_resnet18_kitti.txt
2020-05-18 05:23:50,060 [INFO] iva.ssd.scripts.inference_trt: Loading cached TensorRT engine from /workspace-hekun/mydata/tlt-2.0/examples/ssd/export/trt.engine
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
0%| | 0/8 [00:00<?, ?it/s]#assertion/trt_oss_src/TensorRT/plugin/nmsPlugin/nmsPlugin.cpp,118
Aborted (core dumped)
The TLT container CUDA version is 10.0 and the localhost CUDA version is 10.2.
The container TensorRT version is 7.0.0+cuda10 and the localhost TensorRT version is 7.0.0+cuda10.2.
Hi kenh2nnz6,
I find that this is caused by the OSS plugins. For the P100, GPU_ARCHS should be 60, but when we built the TLT docker we built the OSS plugins without specifying -DGPU_ARCHS=60.
So, to unblock your case, you need to rebuild the OSS plugins inside the docker.
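For context on why 60 is the right value here: the GPU_ARCHS number is just the GPU's compute capability with the dot dropped. A small illustrative sketch (the table values come from NVIDIA's published compute-capability list, not from this thread):

```python
# Compute-capability lookup for a few common data-center GPUs
# (illustrative table; values from NVIDIA's published CUDA GPU list).
COMPUTE_CAP = {
    "P100": (6, 0),   # Pascal -> GPU_ARCHS=60
    "V100": (7, 0),   # Volta  -> GPU_ARCHS=70
    "T4":   (7, 5),   # Turing -> GPU_ARCHS=75
}

def gpu_archs(gpu: str) -> int:
    """Return the -DGPU_ARCHS value (major*10 + minor) for a known GPU."""
    major, minor = COMPUTE_CAP[gpu]
    return major * 10 + minor

print(gpu_archs("P100"))  # -> 60
```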
I will also sync with the internal team on this issue.
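For reference, the rebuild can be sketched roughly as below. This is a minimal sketch only: the branch, install paths, and plugin soname are assumptions based on the TensorRT 7.0 OSS layout, so verify each against your container before copying anything over.

```shell
# Inside the TLT container: rebuild the TensorRT OSS plugins for SM 60 (P100).
# Branch, install paths, and the library soname below are assumptions; check
# them against your container before replacing the stock library.
git clone -b release/7.0 https://github.com/NVIDIA/TensorRT.git /workspace/trt_oss
cd /workspace/trt_oss
git submodule update --init --recursive
mkdir -p build && cd build
cmake .. -DGPU_ARCHS=60 -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu
make -j"$(nproc)" nvinfer_plugin
# Back up the stock plugin library, then drop in the rebuilt one.
cp /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7.0.0 \
   /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7.0.0.bak
cp libnvinfer_plugin.so.7.0.* /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7.0.0
ldconfig
```

After the copy, rerun tlt-converter and tlt-infer so the engine is rebuilt against the SM 60 plugins.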
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_5 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_4 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_3 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_2 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_1 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_0 as custom op: BatchTilePlugin_TRT
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
@zhobin8 ,
You did not hit the same error "Cuda error in file src/implicit_gemm.cu at line 648: invalid device function" as @kenh2nnz6 .
For your result, please check whether the .etlt model was already generated. There seems to be no error in your log.