Memory error for TensorRT model on TX2

I converted a TensorFlow YOLOv3 model to TF-TRT on my laptop, transferred the whole folder to the TX2, and tried to run it there. But I face several different errors: each time I run the script (without any changes to the script or the libraries) a different error appears, such as 'Killed', a runtime error, or a memory error. At first I thought the errors were caused by a difference in TensorFlow versions: I used TensorFlow 2.2 to convert the model to TF-TRT, while the TX2 has TensorFlow 2.5. I upgraded my laptop to TF 2.5 and can still run the TF-TRT model there, but the problem on the TX2 remains, so I now think it is not a version issue. I still get memory errors, and sometimes the terminal just freezes without any progress for a long time and I have to quit it.

  1. How can I fix this? Why is it happening? The last error is copied below:
    tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run Identity: Dst tensor is not initialized. [Op:Identity]

BTW I can run the yolov3 original version (without any TRT) on TX2 but the FPS is very low.

  1. How should I install cuDNN and the CUDA toolkit? As far as I remember I installed them while flashing the TX2 via JetPack, but I want to be sure that no further installation is needed. Is that correct?

  2. How can I check whether they are installed, and what their versions are?
    Thank you


A TensorRT engine is not portable.
Please use the original TensorFlow model and apply the TF-TRT conversion on the TX2 directly.
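A minimal sketch of that on-device conversion, assuming a TF2 SavedModel. `convert_on_device` and the directory names are placeholders, and TensorFlow is imported inside the function since it is only needed when the conversion actually runs:

```python
def convert_on_device(saved_model_dir: str, output_dir: str) -> str:
    """Run the TF-TRT conversion on the Jetson itself, not on the laptop."""
    # Imported here: TensorFlow is only required when the conversion runs.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverterV2(input_saved_model_dir=saved_model_dir)
    converter.convert()         # replaces supported subgraphs with TRT ops
    converter.save(output_dir)  # engines are finalized at first inference on-device
    return output_dir
```

Copy the original (non-converted) SavedModel to the TX2 and call, for example, `convert_on_device("yolov3_saved_model", "yolov3_tftrt")` there.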

Yes, CUDA and cuDNN can be installed via JetPack.


$ cat /usr/local/cuda/include/cuda.h | grep CUDA_VERSION


$ cat /usr/include/cudnn_version.h
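If you prefer to check programmatically, the version macros printed by the commands above can be parsed with a short script (a sketch; the function names are made up, and the header paths are the JetPack defaults, which may differ on your setup):

```python
import re

def parse_cuda_version(header_text: str):
    """Extract CUDA_VERSION from cuda.h contents, e.g. 10020 -> '10.2'."""
    m = re.search(r"#define\s+CUDA_VERSION\s+(\d+)", header_text)
    if not m:
        return None
    v = int(m.group(1))
    # CUDA encodes the version as major * 1000 + minor * 10.
    return f"{v // 1000}.{(v % 1000) // 10}"

def parse_cudnn_version(header_text: str):
    """Extract CUDNN_MAJOR/MINOR/PATCHLEVEL from cudnn_version.h contents."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if not m:
            return None
        parts.append(m.group(1))
    return ".".join(parts)

if __name__ == "__main__":
    with open("/usr/local/cuda/include/cuda.h") as f:
        print(parse_cuda_version(f.read()))
```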



Thank you for the answer.
I actually have more questions; let me explain.
I need to convert some models so I can deploy them on Jetson devices. I tried pure TensorRT for YOLOv3 trained on COCO (80 classes), but I wasn't able to run inference with it, so I decided to do the TF-TRT conversion instead. It worked on my laptop: the FPS increased, but the size and the GPU memory usage didn't change. The model was 300 MB and got a bit bigger, and before and after TF-TRT it still uses 16 GB of GPU memory.

  1. Is this usual? I mean, is it OK or is something wrong? I expected a smaller model, lower GPU memory usage, and higher FPS (BTW, the node count is reduced).
  2. The important thing is that the FPS jumps around a lot after TF-TRT. I got around 3 FPS before TF-TRT, but afterwards I get anywhere from 4 to 9 FPS, and it doesn't change smoothly: for example, the first frame runs at 4 FPS and the second at 9 FPS. I can see these jumps in the visualization over the video as well. Why does this happen? How can I fix it?
  3. I have read that TRT has better performance than TF-TRT. Is that true? What is the exact difference between them? I am confused.
  4. I have another model that I need to convert to TRT, but it is a PyTorch model (an HourGlass CNN). Do you know how I can do it? Is there any valid/working repo on GitHub, or tutorial on YouTube, that you can share?
  5. Is TensorFlow to TRT easier, or PyTorch to TRT?
  6. Why do the TF-TRT or TRT conversion and testing have to happen on the same device (computer)? What is the reason I cannot generate the engine on my PC and then use it on Jetson devices?
  7. I need to use TF-TRT models on the Nano, TX2, and AGX Xavier. Should I do the conversion on each of these devices separately, or will a model generated on one of them work on the other Jetson devices as well?

Thank you so much :)


Did you try it with our tutorial?

If you are using the Darknet format, you can find a good example in the folder below.
The sample converts YOLOv3 into pure TensorRT rather than TF-TRT.


Q1: Since TensorFlow is relatively heavy on Jetson, the performance (throughput) and memory usage are not ideal.

Q2: As above, it’s more recommended to use pure TensorRT.

Q3: TF-TRT is TensorFlow with some TensorRT implementations integrated.
So you need to load both the TensorFlow and TensorRT libraries to make it work.
This is not recommended on a resource-limited platform such as the TX2.

Q4: You can convert it into ONNX format first.
Then the model can be deployed with TensorRT via trtexec directly.

/usr/src/tensorrt/bin/trtexec --onnx=[model]
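A sketch of the ONNX export step for a PyTorch model such as the HourGlass CNN mentioned above. The function name, input size, and opset choice are assumptions, and `torch` is imported inside the function since it is only needed at export time:

```python
def export_hourglass_to_onnx(model, height: int, width: int,
                             output_path: str = "hourglass.onnx") -> str:
    """Export a PyTorch model to ONNX so trtexec can build a TRT engine from it.

    `model`, `height`, and `width` are placeholders for your own network
    and its expected input resolution.
    """
    # Imported here: PyTorch is only required when the export runs.
    import torch

    model.eval()
    dummy = torch.randn(1, 3, height, width)  # NCHW dummy input
    torch.onnx.export(
        model,
        dummy,
        output_path,
        opset_version=11,  # pick an opset your TensorRT version supports
        input_names=["input"],
        output_names=["output"],
    )
    return output_path
```

The resulting file can then be passed to trtexec as shown above.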

Q5: PyTorch is better. We do observe some failures on the TF-to-TRT path.

Q6: There are some possible issues.
First, the library versions may differ; desktop users usually have a newer software release than Jetson.
Second, since Jetson is resource-limited, most of the issues are related to OOM (out of memory).

Q7: Unfortunately, yes.
Since TensorRT applies hardware-level optimization, the engine file needs to be generated on the target device directly.



Thank you very much for your detailed answer.

  1. I figured out that while my model was running there were some other processes running in the background at the same time, which is why the FPS jumped (the GPU was busy with other processes). When I restart the system or kill all running processes the FPS becomes more stable. BTW I am not sure this is the real cause; I write it here because it might help someone else searching for this issue.
  2. Yes, TRT has better performance than TF-TRT, but I read somewhere that TRT cannot support some types of layers while TF-TRT can support all layer types. Is that correct?
  3. I tried to convert it, but there are some unsupported layers in the Hourglass model. I searched a lot but didn't find anything useful. How can I make it work?
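For what it's worth, per-frame FPS readings are also noisy by nature; a smoothed estimate such as an exponential moving average over frame times gives a much more stable number to report (a minimal sketch; `FPSMeter` and the sample frame times are made up for illustration):

```python
class FPSMeter:
    """Smooth per-frame FPS with an exponential moving average of frame times."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha  # weight given to the newest frame time
        self.avg_dt = None  # running average frame duration, in seconds

    def update(self, frame_dt: float) -> float:
        # Blend the new frame duration into the running average.
        if self.avg_dt is None:
            self.avg_dt = frame_dt
        else:
            self.avg_dt = self.alpha * frame_dt + (1 - self.alpha) * self.avg_dt
        return 1.0 / self.avg_dt

# Simulated alternating frame times (~4 FPS and ~9 FPS, like the jumps above).
raw = [0.25, 1 / 9, 0.25, 1 / 9, 0.25, 1 / 9]
meter = FPSMeter()
smoothed = [meter.update(dt) for dt in raw]
# The smoothed readings vary far less than the raw instantaneous FPS.
print([round(f, 1) for f in smoothed])
```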

Thanks again


The backend implementations of TensorRT and the TRT part of TF-TRT are identical.
But if a layer cannot be inferenced with TensorRT, TF-TRT can fall back to TensorFlow for it.

You can find the supported matrix of TensorRT below:

For a non-supported layer, you can implement it as a plugin:


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.