Issue while converting MaskRCNN model from etlt to TRT on laptops

I have done exactly that. It even successfully generated the libnvinfer library, and yet I faced the same issue.
I replaced it and ran sudo ldconfig.

All the steps I followed are mentioned here

Actually, the user guide does not mention some of the steps you mentioned, for example (pip install nvidia-tensorrt==8.0.1.6). Could you double check?

Alright, so that was because I installed the pip version of TensorRT. I have now purged everything to start from scratch.

After going for a fresh installation, I ran this command
/usr/local/bin/cmake .. -DGPU_ARCHS=86 -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
as mentioned in the docs, and encountered this error:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDNN_LIB
    linked by target "nvinfer_plugin" in directory /home/alaap/TensorRT/plugin
    linked by target "sample_algorithm_selector" in directory /home/alaap/TensorRT/samples/sampleAlgorithmSelector
    linked by target "sample_char_rnn" in directory /home/alaap/TensorRT/samples/sampleCharRNN
    linked by target "sample_dynamic_reshape" in directory /home/alaap/TensorRT/samples/sampleDynamicReshape
    linked by target "sample_fasterRCNN" in directory /home/alaap/TensorRT/samples/sampleFasterRCNN
    linked by target "sample_googlenet" in directory /home/alaap/TensorRT/samples/sampleGoogleNet
    linked by target "sample_int8" in directory /home/alaap/TensorRT/samples/sampleINT8
    linked by target "sample_int8_api" in directory /home/alaap/TensorRT/samples/sampleINT8API
    linked by target "sample_mlp" in directory /home/alaap/TensorRT/samples/sampleMLP
    linked by target "sample_mnist" in directory /home/alaap/TensorRT/samples/sampleMNIST
    linked by target "sample_mnist_api" in directory /home/alaap/TensorRT/samples/sampleMNISTAPI
    linked by target "sample_nmt" in directory /home/alaap/TensorRT/samples/sampleNMT
    linked by target "sample_onnx_mnist" in directory /home/alaap/TensorRT/samples/sampleOnnxMNIST
    linked by target "sample_reformat_free_io" in directory /home/alaap/TensorRT/samples/sampleReformatFreeIO
    linked by target "sample_ssd" in directory /home/alaap/TensorRT/samples/sampleSSD
    linked by target "sample_uff_fasterRCNN" in directory /home/alaap/TensorRT/samples/sampleUffFasterRCNN
    linked by target "sample_uff_maskRCNN" in directory /home/alaap/TensorRT/samples/sampleUffMaskRCNN
    linked by target "sample_uff_mnist" in directory /home/alaap/TensorRT/samples/sampleUffMNIST
    linked by target "sample_uff_plugin_v2_ext" in directory /home/alaap/TensorRT/samples/sampleUffPluginV2Ext
    linked by target "sample_uff_ssd" in directory /home/alaap/TensorRT/samples/sampleUffSSD
    linked by target "sample_onnx_mnist_coord_conv_ac" in directory /home/alaap/TensorRT/samples/sampleOnnxMnistCoordConvAC
    linked by target "trtexec" in directory /home/alaap/TensorRT/samples/trtexec
TENSORRT_LIBRARY_INFER
    linked by target "nvonnxparser_static" in directory /home/alaap/TensorRT/parsers/onnx
    linked by target "nvonnxparser" in directory /home/alaap/TensorRT/parsers/onnx
TENSORRT_LIBRARY_INFER_PLUGIN
    linked by target "nvonnxparser_static" in directory /home/alaap/TensorRT/parsers/onnx
    linked by target "nvonnxparser" in directory /home/alaap/TensorRT/parsers/onnx

-- Configuring incomplete, errors occurred!
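
A possible fix, as a sketch only: pass the libraries CMake could not find explicitly on the command line. The paths below are assumptions for a default apt layout under /usr/lib/x86_64-linux-gnu; verify them locally (e.g. with ldconfig -p | grep -E "cudnn|nvinfer") before using them.

# Assumed library locations; adjust to wherever cuDNN/TensorRT are actually installed
/usr/local/bin/cmake .. -DGPU_ARCHS=86 \
  -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu/ \
  -DCMAKE_C_COMPILER=/usr/bin/gcc \
  -DTRT_BIN_DIR=`pwd`/out \
  -DCUDNN_LIB=/usr/lib/x86_64-linux-gnu/libcudnn.so \
  -DTENSORRT_LIBRARY_INFER=/usr/lib/x86_64-linux-gnu/libnvinfer.so \
  -DTENSORRT_LIBRARY_INFER_PLUGIN=/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so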

Following up on this: since I was not able to install TensorRT properly on my local machine to actually run tao-converter, I used a Docker image instead.
The image had Ubuntu 20.04, CUDA 11.4, and TensorRT 8.0.1.

I successfully built TRT OSS nvinfer_plugin and then ran the following command:

./tao-converter -k nvidia_tlt -d 3,832,1344 -o generate_detections,mask_fcn_logits/BiasAdd -e export/trt_newpep.fp16.engine -m 1 -t fp16 -i nchw model.step-32400.etlt

Output is still the same.

[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 417 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 848 MiB, GPU 417 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +749, GPU +318, now: CPU 1668, GPU 735 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +191, GPU +324, now: CPU 1859, GPU 1059 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.

It stops after this, much like the TAO docker. No output file is generated.

EDIT:
I managed to run everything on my local machine as well.

I ran the same command and here is the output:

[INFO] [MemUsageChange] Init CUDA: CPU +533, GPU +0, now: CPU 540, GPU 417 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 848 MiB, GPU 417 MiB
[WARNING] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 110.9.2
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +871, GPU +378, now: CPU 1791, GPU 795 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +127, GPU +60, now: CPU 1918, GPU 855 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead

But weirdly enough, my terminal crashes after 2-3 minutes and no output is generated. The same thing happened in docker as well: the docker crashed and no output was generated. It shows nothing in the logs either.

Revisiting the description at the very beginning: everything runs well on the 2 PCs (3090/A6000) but fails on the 2 laptops (3070Ti/3080Ti). Are the laptops running with WSL?
Is there enough GPU memory?

Yes, it works fine on both PCs but fails on the laptops without any useful logs. When I run ./tao-converter it crashes the terminal itself, with only the logs mentioned above.

Here is a detailed description of all the machines I tried it on.

PCs:

  1. 3090 GPU (24 GB VRAM) running Ubuntu 20.04 and CUDA 11.6
  2. A6000 GPU (48 GB VRAM) running Ubuntu 20.04 and CUDA 11.6

Laptops:

  1. 3070Ti GPU (8 GB VRAM) running Ubuntu 22.04 and CUDA 11.6
  2. 3080Ti GPU (16 GB VRAM) running Ubuntu 20.04 and CUDA 11.6

The TAO version is the same across all the platforms.

No, clean Ubuntu (no WSL).

8 GB and 16 GB.

Hopefully, that’s enough.

OK, the laptops' GPU memory is much smaller.

You can try another experiment to check whether the TRT engine can be generated.
Open a terminal and log in to the TAO docker:
$ tao mask_rcnn run /bin/bash

Then, inside the docker, generate the TRT engine:
# converter -k nvidia_tlt -d xxx …

It is smaller, but I guess it is enough. I have converted models in the past on a 2070 PC with 8 GB RAM. Moreover, it only uses 2.5 GB of VRAM.

I tried that and ran:
converter -k nvidia_tlt -d 3,832,1344 -o generate_detections,mask_fcn_logits/BiasAdd -e /workspace/tao-experiments/mask_rcnn/experiments/experiment_dir_retrain/export/trt_newpep.fp16.engine -t fp16 -i nchw -m 1 /workspace/tao-experiments/mask_rcnn/experiments/experiment_dir_retrain/model.step-32400.etlt

Here are the logs. The output is similar: the docker crashes and exits, no engine is generated, and the logs say nothing.

[INFO] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 542, GPU 417 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 848 MiB, GPU 417 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +749, GPU +318, now: CPU 1669, GPU 735 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +619, GPU +268, now: CPU 2288, GPU 1003 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
2022-05-31 22:30:53,402 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

If possible, please try with more GPU memory.

I tried with a laptop with 16 GB of GPU memory, and laptops don't come with more GPU memory than that. The conversion is not even using 2 GB of it. Moreover, I have converted it on a 2070 PC which had 8 GB of memory, and it was successful there as well. Thus, memory does not seem to be the constraint.

Also, I tried with a different MaskRCNN etlt weight file just to be sure the .etlt isn't broken.

Could you please check if there are any differences between the laptops and PCs?

  • NVIDIA driver
  • CUDA/TensorRT/cuDNN version
  • etc.

Everything stays consistent across the machines.

The NVIDIA driver on all machines is 510.47.
The CUDA version is 11.6 (cuda_11.6.r11.6) across all machines.
The TensorRT version is also consistent at 8.0.1.6 across all machines.

The TAO version is also consistent, as mentioned.

I have tried both the TAO docker and building ./tao-converter, as advised.

Could you please try another official MaskRCNN etlt model on your laptops?
Please download the model from PeopleSegNet | NVIDIA NGC.
Please note that the resolution is 960 x 576. The NGC key is nvidia_tlt.

Please check the laptop's CPU memory. For mask_rcnn, during the fp16 conversion, the RAM usage peaks at around 80 GB.

Same thing: this also does not convert, and there are no logs.

It is 16 GB and 32 GB. I don't think laptops have 80 GB of RAM anywhere; they top out at 32 GB or 64 GB at best. Is it possible to convert an etlt model to an engine file on a laptop for deployment? This seems like a pretty standard use case, as engine files are meant to be deployed on Jetson devices or, in some cases, laptops.

Please let me know if there is any possible way to convert etlt to engine on a laptop.

So, the laptops cannot meet the CPU memory requirement during the mask_rcnn engine conversion.

One more experiment: please set "-s" (strict type constraints) in the command line. We found that with "-s", in fp16 mode, the RAM usage peaks at around 40 GB.

We also found that with "-s", in int8 mode, the RAM usage peaks at around 4 GB.

So, there are two workarounds here.

  1. For laptops, please use "-s" and int8 mode (see the command sketch after this list).
  2. For laptops, if all the CUDA/cuDNN/TensorRT versions are the same as the PCs', you can directly copy the .engine file generated by the PCs.
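
As a rough sketch of workaround 1, based on the command used earlier (the .int8.engine filename and calibration cache path below are placeholders; int8 mode normally also needs the calibration cache generated during export, usually passed with -c, so check converter -h on your version):

# Placeholder paths: the .int8.engine name and cal.bin calibration cache are examples only
converter -k nvidia_tlt \
  -d 3,832,1344 \
  -o generate_detections,mask_fcn_logits/BiasAdd \
  -e /workspace/tao-experiments/mask_rcnn/experiments/experiment_dir_retrain/export/trt_newpep.int8.engine \
  -t int8 -s -i nchw -m 1 \
  -c /workspace/tao-experiments/mask_rcnn/experiments/experiment_dir_retrain/export/cal.bin \
  /workspace/tao-experiments/mask_rcnn/experiments/experiment_dir_retrain/model.step-32400.etlt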

The versions are exactly the same, but won't the GPU difference matter? The PCs are 3090/A6000 and the laptops are 3070Ti/3080Ti.
Though, I will give this a try.

Sure, will give it a try as well and reply.

You need to make sure the compute capabilities are the same.

https://developer.nvidia.com/cuda-gpus

I found a solution to this problem.

Since this is a RAM-related issue and we need about 80 GB of RAM, one solution is to temporarily increase the "swap" memory on the Linux system to 80 GB. Once there is enough swap, the conversion's memory usage spills over from main RAM into swap and the conversion completes successfully. Once the model is converted, you can release and delete the swap memory.

This method works fine and doesn’t affect the model’s performance.
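
A minimal sketch of the temporary swap approach (the swap file path and size are placeholders; make sure there is enough free disk space, and note that fallocate may need to be replaced with dd on some filesystems):

# Create and enable a temporary 80 GB swap file (placeholder path)
sudo fallocate -l 80G /swapfile_tao
sudo chmod 600 /swapfile_tao
sudo mkswap /swapfile_tao
sudo swapon /swapfile_tao
# ... run the etlt -> engine conversion here ...
# Remove the temporary swap once the engine is generated
sudo swapoff /swapfile_tao
sudo rm /swapfile_tao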

The other way is this:

You can use int8 with strict mode ("-s"), but this can affect the performance a little bit, so be mindful of that.

So there are mainly two solutions to this issue: int8 with "-s", or temporarily increasing swap for the sake of the conversion.

Copying the .engine file from the PCs doesn't work, as I get this warning even if all the parameters are the same: [TensorRT] WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

Thank you!
