Code modified for CUDA Graph in sampleINT8API is throwing an error

Hi all, I tried modifying the sample code file sampleINT8API.cpp from the directory /usr/src/tensorrt/samples/sampleINT8API/

It is throwing the following error

[01/07/2024-03:22:44] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1373 MiB, GPU 7177 MiB
[01/07/2024-03:22:44] [I] Started capturing CUDA graph

[01/07/2024-03:22:44] [E] [TRT] 1: [blobInfo.cpp::getHostScale::803] Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)
[01/07/2024-03:22:44] [F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)
[01/07/2024-03:22:44] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1372, GPU 7177 (MiB)
[01/07/2024-03:22:44] [F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)

I have modified only one function: sample::Logger::TestResult SampleINT8API::infer()

For your reference I have attached a file containing the contents of this entire function. My modifications are guarded by #ifdef TRT_DEBUG ... #endif to differentiate them from the original code. The TRT_DEBUG macro is defined at the beginning of the source file.

Thanks and Regards

Nagaraj Trivedi

code_modification_cuda_graph_capture.txt (2.3 KB)

Dear @trivedi.nagaraj,
From past topics, I am assuming you are looking to perform inference on a ResNet50 model with a sample input, check the layer fusion info, and evaluate the impact of CUDA Graph.
May I know whether the useCudaGraph flag in trtexec did not meet your need? This flag enables CUDA Graph API calls, which can be observed in an nsys trace.
I know you had an issue with trtexec and loadInput in the topic "The trt exec could not predict the image properly with resNet50.onnx model". But I believe it could be an issue with the JetPack version, the input data, or the input parameters to trtexec. Please double check that.
trtexec is the best way to cover all your needs in a single sample with limited modifications.

Hi SivaRamaKrishna, yes, I am looking for ResNet50 model inference with layer fusion, INT8 precision, and CUDA Graph.
But trtexec inference is not working, as it is not predicting correctly.

Also, installation of JetPack 5.0 is giving me errors. Is it possible to have a Google Meet where I share my screen and you help me install JetPack 5.0?

Thanks and Regards

Nagaraj Trivedi

The command I provided works for FP32. Is that actually the issue? Are you using additional INT8 flags with trtexec?

Hi SivaRamaKrishna, no matter whether I use INT8 or FP32, it is not predicting correctly.

Thanks and Regards

Nagaraj Trivedi

Hi SivaRamaKrishna, if you fix the CUDA Graph error which I am getting in sampleINT8API then I am done with what I require, because through this sample I have already achieved INT8 and layer fusion. Now the only pending item is CUDA Graph. If you fix this problem then I may not require trtexec at all.
But before that, please clarify whether CUDA Graph can be used when there is asynchronous copying (between host and device memory) and context->enqueueV2() is used.

Thanks and Regards

Nagaraj Trivedi

Hi SivaRamaKrishna, I have fixed this issue and it is working fine without throwing any errors.
Here is the output from my logs, in which I launched the CUDA graph twice:

[01/08/2024-05:43:26] [I] Started capturing CUDA graph

[01/08/2024-05:43:26] [I] Ending capturing CUDA graph

[01/08/2024-05:43:26] [I] Launching CUDA graph

[01/08/2024-05:43:26] [I] SampleINT8API result: Detected:
[01/08/2024-05:43:26] [I] [1] airliner
[01/08/2024-05:43:26] [I] [2] warplane
[01/08/2024-05:43:26] [I] [3] projectile
[01/08/2024-05:43:26] [I] [4] space shuttle
[01/08/2024-05:43:26] [I] [5] wing
[01/08/2024-05:43:26] [I] SampleINT8API result: Detected:
[01/08/2024-05:43:26] [I] [1] airliner
[01/08/2024-05:43:26] [I] [2] warplane
[01/08/2024-05:43:26] [I] [3] projectile
[01/08/2024-05:43:26] [I] [4] space shuttle
[01/08/2024-05:43:26] [I] [5] wing

The fix is that we need to call context->enqueueV2() once before beginning the CUDA graph capture.

I got this solution from an NVIDIA link I found when I started searching for how to implement CUDA graphs in C++. In it I found this information:
/*
TensorRT: nvinfer1::IExecutionContext Class Reference
Note:
Calling enqueueV2() with a stream in CUDA graph capture mode has a known issue.
If dynamic shapes are used, the first enqueueV2() call after a setInputShapeBinding() call will cause failure in stream capture due to resource allocation.
Please call enqueueV2() once before capturing the graph.
*/
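For anyone hitting the same error, here is a minimal sketch of what the capture sequence can look like in infer(). It assumes stream, context, and the sample's BufferManager (buffers) are already set up as in the sample; error checking is omitted and the variable names are illustrative, not taken from my attached file.

// Warm-up call: run enqueueV2() once outside capture so TensorRT can finish
// its lazy resource allocation (this is the fix described above).
context->enqueueV2(buffers.getDeviceBindings().data(), stream, nullptr);
cudaStreamSynchronize(stream);

// Capture a single enqueueV2() call into a CUDA graph.
cudaGraph_t graph;
cudaGraphExec_t graphExec;
cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
context->enqueueV2(buffers.getDeviceBindings().data(), stream, nullptr);
cudaStreamEndCapture(stream, &graph);
cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);

// Launch the captured graph; this can be repeated for every subsequent inference.
cudaGraphLaunch(graphExec, stream);
cudaStreamSynchronize(stream);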

Thank you for all the support and help you have provided me till now.

I will continue further with the following activities:

  1. Modify the code to perform N inferences, where N can be provided as one of the command-line inputs.
  2. Measure the difference in execution speed with and without CUDA Graph (see the timing sketch after this list).
  3. Use the DLA core option to find the energy efficiency.
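As a rough idea of how item 2 could be measured, the sketch below uses CUDA events; graphExec and stream are the illustrative names from the capture sketch earlier in this thread, and N would come from the new command-line option.

const int N = 1000; // e.g., taken from the new command-line argument

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, stream);
for (int i = 0; i < N; ++i)
{
    cudaGraphLaunch(graphExec, stream); // with CUDA graph
    // context->enqueueV2(buffers.getDeviceBindings().data(), stream, nullptr); // without CUDA graph
}
cudaEventRecord(stop, stream);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
sample::gLogInfo << N << " inferences took " << ms << " ms" << std::endl;

cudaEventDestroy(start);
cudaEventDestroy(stop);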

I may further require your help in modifying this code to:

  1. Make it generic so it can convert and infer models other than ResNet50.onnx.
  2. Make use of the DLA core to find the energy efficiency.

Please help me with that.

Thanks and Regards

Nagaraj Trivedi

Hi SivaRamaKrishna, while executing this binary (sampleINT8API), I found one warning message:
[W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.

Does it mean that the model is not converted to INT8 precision? Or has it converted the model to INT8 precision but the weights are still treated as 32 bits? Please clarify.

If your answer is "it has not converted the model to INT8 precision", then let me know where to find the calibrator, and if it is not present, how to create one.

Thanks and Regards

Nagaraj Trivedi

Dear @trivedi.nagaraj,
The INT8 model is generated either using a calibrator or using dynamic range info provided via an external file. You can check this in the build() function.
If we use a calibrator, it finds the active tensor range over the set of input images used in the calibration process.
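For reference, a calibrator is a class the user implements; a minimal, untested skeleton of an IInt8EntropyCalibrator2 could look like the following (the class name, the cache-file handling, and the loadNextBatchToDevice() placeholder are illustrative, not taken from the sample):

#include "NvInfer.h"
#include <fstream>
#include <string>
#include <vector>

class MyInt8Calibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    MyInt8Calibrator(int batchSize, void* deviceInput, const std::string& cacheFile)
        : mBatchSize(batchSize), mDeviceInput(deviceInput), mCacheFile(cacheFile) {}

    int getBatchSize() const noexcept override { return mBatchSize; }

    bool getBatch(void* bindings[], const char* names[], int nbBindings) noexcept override
    {
        // Copy the next calibration batch into device memory; return false when
        // all calibration images have been consumed.
        if (!loadNextBatchToDevice(mDeviceInput)) { return false; }
        bindings[0] = mDeviceInput;
        return true;
    }

    const void* readCalibrationCache(size_t& length) noexcept override
    {
        // Reuse a previously written cache so the calibration images are only needed once.
        std::ifstream in(mCacheFile, std::ios::binary);
        mCache.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
        length = mCache.size();
        return mCache.empty() ? nullptr : mCache.data();
    }

    void writeCalibrationCache(const void* cache, size_t length) noexcept override
    {
        std::ofstream out(mCacheFile, std::ios::binary);
        out.write(static_cast<const char*>(cache), length);
    }

private:
    bool loadNextBatchToDevice(void*) { return false; } // placeholder: fill with real batch loading
    int mBatchSize;
    void* mDeviceInput;
    std::string mCacheFile;
    std::vector<char> mCache;
};

An instance of such a class would then be passed to config->setInt8Calibrator(&calibrator) in place of nullptr.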

Hi SivaRamaKrishna, thank you for the response. I am more interested in inferencing:

  1. Once with a calibration file

  2. Once with the dynamic range provided. Let me know how it can be done.
    I have read the build() method and this is mentioned there:
    // Enable INT8 model. Required to set custom per-tensor dynamic range or INT8 Calibration
    config->setFlag(BuilderFlag::kINT8);
    // Mark calibrator as null. As user provides dynamic range for each tensor, no calibrator is required
    config->setInt8Calibrator(nullptr);
    // force layer to execute with required precision
    setLayerPrecision(network);

    // set INT8 Per Tensor Dynamic range
    if (!setDynamicRange(network))
    {
        sample::gLogError << "Unable to set per-tensor dynamic range." << std::endl;
        return sample::Logger::TestResult::kFAILED;
    }

And when I ran inference with the --verbose option, I found this information:
[01/08/2024-10:13:13] [I] Layer: node_of_gpu_0/res2_0_branch2b_bn_1. Precision: INT8
[01/08/2024-10:13:13] [I] Tensor: gpu_0/res2_0_branch2b_bn_1. OutputType: INT8

[01/08/2024-10:13:13] [I] If dynamic range for a tensor is missing, TensorRT will run inference assuming dynamic range for the tensor as optional.
[01/08/2024-10:13:13] [I] If dynamic range for a tensor is required then inference will fail. Follow README.md to generate missing per-tensor dynamic range.
[01/08/2024-10:13:13] [V] [TRT] Setting dynamic range for gpu_0/data_0 to [-1.00024,1.00024]

[01/08/2024-10:13:13] [I] Per Tensor Dynamic Range Values for the Network:
[01/08/2024-10:13:13] [I] Tensor: (Unnamed Layer* 180) [Softmax]_output. Max Absolute Dynamic Range: 0.0303731
[01/08/2024-10:13:13] [I] Tensor: (Unnamed Layer* 179) [Shuffle]_output. Max Absolute Dynamic Range: 6.46343
[01/08/2024-10:13:13] [I] Tensor: (Unnamed Layer* 177) [Shuffle]_output. Max Absolute Dynamic Range: 0.0365279
[01/08/2024-10:13:13] [I] Tensor: (Unnamed Layer* 176) [Constant]_output. Max Absolute Dynamic Range: 0.0365279
[01/08/2024-10:13:13] [I] Tensor: (Unnamed Layer* 175) [Fully Connected]_output. Max Absolute Dynamic Range: 6.40009
[01/08/2024-10:13:13] [I] Tensor: (Unnamed Layer* 174) [Constant]_output. Max Absolute Dynamic Range: 0.443716

Does it mean that even though neither a calibration file nor a dynamic range is explicitly mentioned, it is still converted to INT8? Please clarify.

The next clarification I have is how to generate a calibrator to pass to config->setInt8Calibrator() instead of nullptr.

Thanks and Regards

Nagaraj Trivedi

Hi SivaRamaKrishna, in continuation of my previous queries/clarifications, I have a few more listed below.

  1. How do I modify this code to infer a batch of input images (say a batch size of 32, 64, 128, or 256)? I need this information to verify how the batch size impacts model inference speed.

  2. As of now it reads an input image file of type .ppm; how do I modify it to work with .jpg, .jpeg, or .dat files?

Please clarify these doubts.

Thanks and Regards

Nagaraj Trivedi

Hi SivaRamaKrishna, please update me on the queries I have raised. I tried converting a .jpg image to .ppm and ran inference, but it could not infer properly.

  1. I need at least 10000 such .ppm images for experimenting. Let me know from which source the airliner.ppm image was taken.
  2. Otherwise, let me know a procedure for converting .jpg or .jpeg images to .ppm format.

Apart from this, please update me on all the queries I have raised.

Thanks and Regards
Nagaraj Trivedi

Dear @trivedi.nagaraj,

  1. sampleINT8API already uses the dynamic range from /usr/src/tensorrt/data/int8_api/resnet50_per_tensor_dynamic_range.txt and prepares an INT8 model.
  2. If you want to perform calibration, use a set of images (at least 500) and prepare a calibration file. You may find a calibrator implementation at /usr/src/tensorrt/samples/common/sampleEngines.cpp. As you are on a different JetPack release, please run grep -lir "writeCalibrationCache" to find the calibrator implementation.
  3. Currently, the ResNet50 model is a static model with batch size 1 (see the sketch after this list for how a dynamic-batch engine could be built).
  4. You can use any dataset like ImageNet to get input images for your test. You can use the Linux convert tool to convert jpeg → ppm format for your test.
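To expand on point 3: the engine can only be built for multiple batch sizes if the ONNX model is first re-exported with a dynamic batch dimension; after that, a sketch of the extra build() code could look like the following (the input tensor name gpu_0/data_0 is taken from the verbose log earlier in this thread, and the shape range is illustrative):

// Assumes the ONNX model was re-exported with a dynamic (-1) batch dimension.
nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
profile->setDimensions("gpu_0/data_0", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 3, 224, 224});
profile->setDimensions("gpu_0/data_0", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{32, 3, 224, 224});
profile->setDimensions("gpu_0/data_0", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{256, 3, 224, 224});
config->addOptimizationProfile(profile);

// At inference time the actual batch size would be selected before enqueueV2(), e.g.:
// context->setBindingDimensions(0, nvinfer1::Dims4{32, 3, 224, 224});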

Hi SivaRamaKrishna, thank you for your response clarifying my doubts.
Please find my comments below on your points 1 to 4:

  1. Yes, there is a file called resnet50_per_tensor_dynamic_range.txt, but I wanted to get it confirmed with you.
  2. I will look into sampleEngines.cpp to learn how the calibration file is created. I have already looked at it before and will look at it again.
  3. How can I modify it to work with different batch sizes? Please clarify.
    I have read this in the TensorRT documentation:
  • If the input model is in ONNX format, use the --minShapes, --optShapes, and --maxShapes flags to control the range of input shapes including batch size.
    --minShapes=, --optShapes=, and --maxShapes=: Specify the range of the input shapes to build the engine with. Only required if the input model is in ONNX format.
    I believe when we use these options on the command line, it might internally be performing some operation that results in building or inferencing the engine with a different batch size.
  4. I tried converting jpg → ppm but the prediction was wrong. Do I need to resize, normalize, and transpose the jpg or jpeg image before converting it to ppm?

Please clarify all these doubts.

Thanks and Regards

Nagaraj Trivedi

In continuation to this, I tried inferencing with a jpg image and it threw an error. The reason is that the function prepareInput() checks that the file format is .ppm. I even tried commenting out that check and it could infer, but incorrectly. I request you to verify inferring with either a .jpg or .jpeg image.
Please see the code below:
bool SampleINT8API::prepareInput(const samplesCommon::BufferManager& buffers)
{
    sample::gLogInfo << "Image file name is " << mParams.imageFileName << std::endl;
    if (samplesCommon::toLower(samplesCommon::getFileType(mParams.imageFileName)).compare("ppm") != 0)
    {
        sample::gLogError << "Wrong format: " << mParams.imageFileName << " is not a ppm file." << std::endl;
        return false;
    }

Thanks and Regards

Nagaraj Trivedi

Hi,

3. How can I modify it to work with different batch sizes, …

Since the model is static, please modify the model parameter directly.
A sample can be found in the below comment:

4. I tried converting jpg → ppm but the prediction was wrong. …

The image needs to be resized into 3x224x224 (CHW).
But the normalization and scaling are done in the code already:

Thanks.

Hi, thank you for your response. I tried searching the internet but could not find a proper website to download images in .ppm format.
If you have multiple such .ppm images, please provide them. Otherwise, let me know how I can convert .jpg or .jpeg images to .ppm format.

Thanks and Regards

Nagaraj Trivedi

Hi, please also address the issue I have asked about. The reason is that there is only one image file, airliner.ppm, but I need around 10000 images. I searched the internet but could not find them.
This sample takes only .ppm files and doesn't accept others. I tried commenting out the code that checks for the .ppm format and fed it a jpg image, but it could not predict properly.
If you have a collection of .ppm files or know a link from where I can download them, let me know. Otherwise, let me know how to modify this code to work with both jpg and jpeg formats, or a way to convert jpg or jpeg files to ppm format.

Here is the code that checks for the ppm format in the sample:

bool SampleINT8API::prepareInput(const samplesCommon::BufferManager& buffers)
{
    sample::gLogInfo << "Image file name is " << mParams.imageFileName << std::endl;
    if (samplesCommon::toLower(samplesCommon::getFileType(mParams.imageFileName)).compare("ppm") != 0)
    {
        sample::gLogError << "Wrong format: " << mParams.imageFileName << " is not a ppm file." << std::endl;
        return false;
    }
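For what it's worth, one possible way to feed a JPEG directly is a helper like the sketch below. It uses OpenCV (available on Jetson) to decode and resize the image, then fills the host input buffer in CHW order; the function name is hypothetical and the [-1, 1] scaling should be verified against the PPM path in prepareInput() of the installed TensorRT version.

#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <string>

// Hypothetical helper, not part of the sample: decode a JPEG, resize to
// 3x224x224 (CHW) and scale pixels similarly to the PPM path.
bool prepareJpegInput(const std::string& fileName, float* hostInputBuffer)
{
    cv::Mat img = cv::imread(fileName, cv::IMREAD_COLOR); // BGR, HWC, uint8
    if (img.empty())
    {
        return false;
    }
    cv::resize(img, img, cv::Size(224, 224));
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);

    for (int c = 0; c < 3; ++c)
        for (int y = 0; y < 224; ++y)
            for (int x = 0; x < 224; ++x)
            {
                const unsigned char v = img.at<cv::Vec3b>(y, x)[c];
                // Verify this scaling against prepareInput(); the verbose log above
                // shows the input dynamic range set to roughly [-1, 1].
                hostInputBuffer[c * 224 * 224 + y * 224 + x] = (2.0f / 255.0f) * v - 1.0f;
            }
    return true;
}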

Thanks and Regards

Nagaraj Trivedi

Dear @trivedi.nagaraj ,
ImageNet dataset: ImageNet
PASCAL VOC dataset: The PASCAL Visual Object Classes Challenge 2012 (VOC2012)

Convert utility: https://linux.die.net/man/1/convert
Some Stack Overflow references: jpeg - How to convert images from the jpg format to ppm(P3) - Stack Overflow
command line - How to resize an image through the terminal? - Ask Ubuntu
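For example, an ImageMagick command along the lines of convert input.jpg -resize 224x224! output.ppm should produce a 224x224 .ppm; the exact options are worth double-checking against the man page linked above.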

You may use the COCO dataset (COCO - Common Objects in Context) as well.