Regarding an immediate and straightforward solution to my queries

Hi all, I have been creating new topics to get solutions to the problems I am facing with implementing CUDA graphs, layer fusion, and DLA cores.

I am getting suggestions from you, but they are not working, and I keep running into problem after problem.

I have been trying all your suggestions, but I keep hitting issues with importing libraries/packages and with installations not completing properly. Most of my time is being consumed by this, and I am not making any progress.
You can verify under my name, Nagaraj Trivedi, how many discussions I have opened.
I am a working professional and have little time outside my office work. I need this solution to complete my Master's thesis, and without CUDA graphs and layer fusion it won't be complete.

I need a very straightforward and simple code example implementing CUDA graphs and layer fusion for the sample programs you have provided, in both the C++ and Python examples.

Please don't suggest looking at the trtexec source code; it is complex, hard to understand, and difficult to adapt to the other sample programs.

Regarding layer fusion, I understand that the SDK performs it internally, but there is a way to verify it through the logger. At the very least, please provide some code showing how to use this logger to log whether layer fusion has taken place or not.

I also suggest modifying the sample programs to be simple and straightforward enough that it becomes easy to take them and adapt them to our needs.

Please don't take this the wrong way, but these are the practical problems I have been facing.

Thanks and Regards

Nagaraj Trivedi

Hi,

If you want to use CUDA graphs for inference, unfortunately, trtexec is the simplest source you can check.
You don't need to modify the source code if you just want to compare performance with and without CUDA graphs. The --useCudaGraph flag is enough.
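As a rough illustration, the comparison can be scripted from Python by invoking trtexec twice, once with and once without the flag. This is only a sketch under assumptions: `resnet50.engine` is a hypothetical placeholder for your serialized engine, and trtexec must already be built and available on your PATH.

```python
import shutil
import subprocess


def trtexec_command(engine_path, use_cuda_graph):
    """Build a trtexec command line; --useCudaGraph toggles CUDA graph capture."""
    cmd = ["trtexec", f"--loadEngine={engine_path}"]
    if use_cuda_graph:
        cmd.append("--useCudaGraph")
    return cmd


def run_comparison(engine_path):
    # Only attempt to run if trtexec is actually installed on this machine
    if shutil.which("trtexec") is None:
        print("trtexec not found on PATH; printing the commands instead:")
        for flag in (False, True):
            print(" ".join(trtexec_command(engine_path, flag)))
        return
    # Run both variants; trtexec prints throughput/latency summaries to compare
    for flag in (False, True):
        subprocess.run(trtexec_command(engine_path, flag), check=True)


run_comparison("resnet50.engine")  # hypothetical engine file name
```

Comparing the two throughput summaries printed by trtexec shows the effect of CUDA graphs without touching any source code.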

Layer fusion is not open-sourced and cannot be controlled from user space.
The only information available is the TensorRT log, which can be seen when using trtexec.

If you don't want to use trtexec, you can also configure the logger through the TensorRT API:

For example:
https://github.com/NVIDIA/TensorRT/blob/release/8.6/samples/python/onnx_custom_plugin/sample.py#L50C1-L55C38

import tensorrt as trt

# WARNING hides most build-time messages; raising the severity
# (e.g. trt.Logger.INFO, or trt.Logger.VERBOSE for fusion details)
# surfaces more information during engine building
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Builds a TensorRT engine
def build_engine(model_path):
    builder = trt.Builder(TRT_LOGGER)

Thanks.

Hi, thank you for your response. Yes, the trtexec source code is the right place to look, and I have studied it multiple times.
I tried running inference with a ResNet50 model by feeding a test input (.dat file) via the --loadInputs option, but it could not infer properly.
May I request that you provide at least a few .dat files, so that it becomes clear it works properly?

If you can help me either by providing a few .dat files or by providing correct code to convert an image file to a .dat file, most of my work will be done, because I may not have written proper code for the conversion (it involves resizing, cropping, etc.).
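For reference, the conversion described above can be sketched with NumPy alone. This is a minimal sketch of the preprocessing commonly used for ResNet50 (center-crop to 224x224, scale to [0, 1], ImageNet mean/std normalization, NCHW float32), not the definitive pipeline; the exact steps depend on how the ONNX model was exported, and the mean/std values, file name, and synthetic input image are all assumptions made for illustration.

```python
import numpy as np

# ImageNet statistics commonly used for ResNet50 preprocessing (assumed here)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_hwc_uint8):
    """Center-crop an RGB image to 224x224, normalize, and return a
    (1, 3, 224, 224) float32 array in NCHW order."""
    h, w, _ = image_hwc_uint8.shape
    top, left = (h - 224) // 2, (w - 224) // 2
    crop = image_hwc_uint8[top:top + 224, left:left + 224, :]
    x = crop.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - MEAN) / STD                 # per-channel normalization
    x = x.transpose(2, 0, 1)             # HWC -> CHW
    return x[np.newaxis, ...]            # add batch dimension


# Synthetic stand-in for a decoded RGB image; a real pipeline would first
# load the file and resize its shorter side to 256 with Pillow or OpenCV.
img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
blob = preprocess(img)
blob.tofile("input.dat")  # raw float32 bytes for trtexec --loadInputs
print(blob.shape, blob.dtype)
```

The resulting file is just the raw bytes of the tensor, so its size must match the engine's input binding exactly for --loadInputs to accept it.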

Thanks and Regards

Nagaraj Trivedi

Dear @trivedi.nagaraj ,
The reported issue is already being followed up at The trt exec could not predict the image properly with resNet50.onnx model - #7 by SivaRamaKrishnaNV

Hi SivaRamaKrishna, yes, using trt.Logger.INFO it worked.
And you are addressing the trtexec issue in the other discussion.

Thanks and Regards

Nagaraj Trivedi

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.