TensorRT inference process

Description

Hi, I am new to TensorRT C++. I am trying to load a pre-built TensorRT engine and run inference with it. I have managed to deserialize the engine, but I ran into trouble while following the "TensorRT documentation".
For inference, the documentation asks the user to "set up a buffer array pointing to the input and output buffers on the GPU", using this code:

void* buffers[2];
buffers[inputIndex] = inputBuffer;
buffers[outputIndex] = outputBuffer;

However, the variables inputBuffer and outputBuffer are not defined anywhere in the preceding sections.

Could you give me more details about these two variables? I have included my UNFINISHED code below for reference.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include "NvInfer.h"
#include "logger.h"  // sample::gLogger, from the TensorRT samples' common directory

using namespace nvinfer1;

int main(int argc, char** argv){
    std::cout << "Read and load the engine" << std::endl;
    IRuntime* runtime = createInferRuntime(sample::gLogger);
    std::string cached_path = "./itest_8.trt";
    std::ifstream fin(cached_path, std::ios::binary);  // a serialized engine is binary data
    std::string cached_engine = "";
    std::stringstream buffer;
    buffer << fin.rdbuf();
    cached_engine.append(buffer.str());
    fin.close();

    ICudaEngine* loaded_engine = runtime->deserializeCudaEngine(cached_engine.data(), cached_engine.size(), nullptr);
    std::cout << "Loading complete!" << std::endl;

    std::cout << "now let's do some inference" << std::endl;
    IExecutionContext* context = loaded_engine->createExecutionContext();
    int input_node = loaded_engine->getBindingIndex("input_1");
    int output_node = loaded_engine->getBindingIndex("output_1");
    void* buffers[2];

    ////////////////////////////////////////
    buffers[input_node] = inputbuffer;  // Now the problem is here
    ////////////////////////////////////////
    return 0;
}
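The engine-loading part of the code above can also be factored into a small helper. This is a sketch using only the standard library, and the name readEngineFile is hypothetical: it opens the file in binary mode, since a serialized engine is an arbitrary byte blob, and it throws if the file cannot be opened, so a wrong path fails loudly instead of handing an empty buffer to deserializeCudaEngine.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a serialized TensorRT engine file into memory.
// std::ios::binary is required: the file holds raw bytes, not text.
std::vector<char> readEngineFile(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open engine file: " + path);
    }
    // Copy every byte of the stream into the vector.
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

In main, the result would then be passed as runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr), where blob is the returned vector.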

Many Thanks!

Environment

TensorRT Version: 7.2.2
GPU Type: Titan V
Nvidia Driver Version: 450.51.05
CUDA Version: 11.0
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
TensorFlow Version (if applicable): 1.14
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi,
Can you try running your model with the trtexec command and share the --verbose log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the TensorRT list of supported operators; if any operator is not supported, you will need to create a custom plugin to implement that operation.

Also, please share your model and script, if you haven't already, so that we can help you better.

Thanks!

&&&& RUNNING TensorRT.trtexec # ./trtexec --loadEngine=./itest_8.trt --batch=1 --verbose
[04/30/2021-14:14:46] [I] === Model Options ===
[04/30/2021-14:14:46] [I] Format: *
[04/30/2021-14:14:46] [I] Model:
[04/30/2021-14:14:46] [I] Output:
[04/30/2021-14:14:46] [I] === Build Options ===
[04/30/2021-14:14:46] [I] Max batch: 1
[04/30/2021-14:14:46] [I] Workspace: 16 MiB
[04/30/2021-14:14:46] [I] minTiming: 1
[04/30/2021-14:14:46] [I] avgTiming: 8
[04/30/2021-14:14:46] [I] Precision: FP32
[04/30/2021-14:14:46] [I] Calibration:
[04/30/2021-14:14:46] [I] Refit: Disabled
[04/30/2021-14:14:46] [I] Safe mode: Disabled
[04/30/2021-14:14:46] [I] Save engine:
[04/30/2021-14:14:46] [I] Load engine: /home/wlx/new_project/itest_8.trt
[04/30/2021-14:14:46] [I] Builder Cache: Enabled
[04/30/2021-14:14:46] [I] NVTX verbosity: 0
[04/30/2021-14:14:46] [I] Tactic sources: Using default tactic sources
[04/30/2021-14:14:46] [I] Input(s)s format: fp32:CHW
[04/30/2021-14:14:46] [I] Output(s)s format: fp32:CHW
[04/30/2021-14:14:46] [I] Input build shapes: model
[04/30/2021-14:14:46] [I] Input calibration shapes: model
[04/30/2021-14:14:46] [I] === System Options ===
[04/30/2021-14:14:46] [I] Device: 0
[04/30/2021-14:14:46] [I] DLACore:
[04/30/2021-14:14:46] [I] Plugins:
[04/30/2021-14:14:46] [I] === Inference Options ===
[04/30/2021-14:14:46] [I] Batch: 1
[04/30/2021-14:14:46] [I] Input inference shapes: model
[04/30/2021-14:14:46] [I] Iterations: 10
[04/30/2021-14:14:46] [I] Duration: 3s (+ 200ms warm up)
[04/30/2021-14:14:46] [I] Sleep time: 0ms
[04/30/2021-14:14:46] [I] Streams: 1
[04/30/2021-14:14:46] [I] ExposeDMA: Disabled
[04/30/2021-14:14:46] [I] Data transfers: Enabled
[04/30/2021-14:14:46] [I] Spin-wait: Disabled
[04/30/2021-14:14:46] [I] Multithreading: Disabled
[04/30/2021-14:14:46] [I] CUDA Graph: Disabled
[04/30/2021-14:14:46] [I] Separate profiling: Disabled
[04/30/2021-14:14:46] [I] Skip inference: Disabled
[04/30/2021-14:14:46] [I] Inputs:
[04/30/2021-14:14:46] [I] === Reporting Options ===
[04/30/2021-14:14:46] [I] Verbose: Enabled
[04/30/2021-14:14:46] [I] Averages: 10 inferences
[04/30/2021-14:14:46] [I] Percentile: 99
[04/30/2021-14:14:46] [I] Dump refittable s:Disabled
[04/30/2021-14:14:46] [I] Dump output: Disabled
[04/30/2021-14:14:46] [I] Profile: Disabled
[04/30/2021-14:14:46] [I] Export timing to JSON file:
[04/30/2021-14:14:46] [I] Export output to JSON file:
[04/30/2021-14:14:46] [I] Export profile to JSON file:
[04/30/2021-14:14:46] [I]
[04/30/2021-14:14:54] [I] === Device Information ===
[04/30/2021-14:14:54] [I] Selected Device: TITAN V
[04/30/2021-14:14:54] [I] Compute Capability: 7.0
[04/30/2021-14:14:54] [I] SMs: 80
[04/30/2021-14:14:54] [I] Compute Clock Rate: 1.455 GHz
[04/30/2021-14:14:54] [I] Device Global Memory: 12066 MiB
[04/30/2021-14:14:54] [I] Shared Memory per SM: 96 KiB
[04/30/2021-14:14:54] [I] Memory Bus Width: 3072 bits (ECC disabled)
[04/30/2021-14:14:54] [I] Memory Clock Rate: 0.85 GHz
[04/30/2021-14:14:54] [I]
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::Proposal version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::Split version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[04/30/2021-14:14:54] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[04/30/2021-14:14:55] [W] [TRT] TensorRT was linked against cuDNN 8.0.5 but loaded cuDNN 8.0.4
[04/30/2021-14:14:56] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.2.0 but loaded cuBLAS/cuBLAS LT 11.1.0
[04/30/2021-14:14:56] [V] [TRT] Deserialize required 1106093 microseconds.
[04/30/2021-14:14:56] [I] Engine loaded in 1.76435 sec.
[04/30/2021-14:14:56] [W] [TRT] TensorRT was linked against cuDNN 8.0.5 but loaded cuDNN 8.0.4
[04/30/2021-14:14:56] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.2.0 but loaded cuBLAS/cuBLAS LT 11.1.0
[04/30/2021-14:14:56] [V] [TRT] Allocated persistent device memory of size 39212544
[04/30/2021-14:14:56] [V] [TRT] Allocated activation device memory of size 116225536
[04/30/2021-14:14:56] [V] [TRT] Assigning persistent memory blocks for various profiles
[04/30/2021-14:14:56] [I] Starting inference
[04/30/2021-14:14:59] [I] Warmup completed 1 queries over 200 ms
[04/30/2021-14:14:59] [I] Timing trace has 666 queries over 2.75008 s
[04/30/2021-14:14:59] [I] Trace averages of 10 runs:
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.44529 ms - Host latency: 4.8321 ms (end to end 7.86976 ms, enqueue 2.26097 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.43332 ms - Host latency: 4.82082 ms (end to end 8.12551 ms, enqueue 2.17996 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.43042 ms - Host latency: 4.81671 ms (end to end 8.06527 ms, enqueue 1.99925 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.42807 ms - Host latency: 4.81323 ms (end to end 7.92094 ms, enqueue 1.83416 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.42715 ms - Host latency: 4.81287 ms (end to end 8.02704 ms, enqueue 1.76178 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.17424 ms - Host latency: 4.55819 ms (end to end 7.41914 ms, enqueue 1.67219 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.14781 ms - Host latency: 4.53007 ms (end to end 7.6062 ms, enqueue 1.59985 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.12979 ms - Host latency: 4.51404 ms (end to end 7.77371 ms, enqueue 1.88416 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.12723 ms - Host latency: 4.51429 ms (end to end 7.834 ms, enqueue 2.11445 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.10267 ms - Host latency: 4.48977 ms (end to end 7.74086 ms, enqueue 2.04337 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08863 ms - Host latency: 4.47354 ms (end to end 7.09607 ms, enqueue 2.18604 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09067 ms - Host latency: 4.47547 ms (end to end 7.47484 ms, enqueue 2.07023 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08401 ms - Host latency: 4.47104 ms (end to end 7.70482 ms, enqueue 2.06426 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.0829 ms - Host latency: 4.46992 ms (end to end 7.9511 ms, enqueue 2.14661 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08402 ms - Host latency: 4.47418 ms (end to end 7.09452 ms, enqueue 2.28236 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08076 ms - Host latency: 4.46663 ms (end to end 7.33628 ms, enqueue 2.08529 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07993 ms - Host latency: 4.46367 ms (end to end 7.64805 ms, enqueue 2.01919 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08186 ms - Host latency: 4.46851 ms (end to end 7.86574 ms, enqueue 1.85051 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.0844 ms - Host latency: 4.47157 ms (end to end 7.96527 ms, enqueue 1.83258 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08369 ms - Host latency: 4.46936 ms (end to end 7.42255 ms, enqueue 1.93965 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08187 ms - Host latency: 4.46757 ms (end to end 7.46995 ms, enqueue 2.15243 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08207 ms - Host latency: 4.46857 ms (end to end 7.39984 ms, enqueue 2.0499 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07881 ms - Host latency: 4.46538 ms (end to end 7.68507 ms, enqueue 1.95341 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08275 ms - Host latency: 4.47366 ms (end to end 7.89254 ms, enqueue 2.1233 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.0871 ms - Host latency: 4.47847 ms (end to end 7.93748 ms, enqueue 2.11808 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08237 ms - Host latency: 4.46838 ms (end to end 7.48705 ms, enqueue 2.0648 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09088 ms - Host latency: 4.47905 ms (end to end 7.21792 ms, enqueue 2.10291 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09344 ms - Host latency: 4.49016 ms (end to end 7.7327 ms, enqueue 2.13967 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08782 ms - Host latency: 4.47954 ms (end to end 7.90985 ms, enqueue 2.0653 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08718 ms - Host latency: 4.47308 ms (end to end 7.9911 ms, enqueue 1.94706 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07974 ms - Host latency: 4.46454 ms (end to end 7.65446 ms, enqueue 1.84956 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.0792 ms - Host latency: 4.46511 ms (end to end 7.48359 ms, enqueue 1.83281 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.0799 ms - Host latency: 4.46553 ms (end to end 7.84554 ms, enqueue 1.967 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08434 ms - Host latency: 4.47159 ms (end to end 7.81114 ms, enqueue 2.18958 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08157 ms - Host latency: 4.46794 ms (end to end 7.98835 ms, enqueue 2.03068 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08042 ms - Host latency: 4.46547 ms (end to end 7.98448 ms, enqueue 1.86172 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08185 ms - Host latency: 4.46481 ms (end to end 7.31252 ms, enqueue 1.70641 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07881 ms - Host latency: 4.46306 ms (end to end 7.66566 ms, enqueue 1.70571 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08167 ms - Host latency: 4.47 ms (end to end 7.83624 ms, enqueue 1.70654 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09543 ms - Host latency: 4.49028 ms (end to end 7.70867 ms, enqueue 1.71919 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09526 ms - Host latency: 4.49038 ms (end to end 7.6366 ms, enqueue 1.67061 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09509 ms - Host latency: 4.49001 ms (end to end 7.49316 ms, enqueue 1.67634 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09336 ms - Host latency: 4.49277 ms (end to end 7.7749 ms, enqueue 1.69236 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09575 ms - Host latency: 4.48899 ms (end to end 7.98733 ms, enqueue 1.62747 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09229 ms - Host latency: 4.4939 ms (end to end 7.74849 ms, enqueue 1.61987 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09482 ms - Host latency: 4.48926 ms (end to end 7.80413 ms, enqueue 1.75386 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09729 ms - Host latency: 4.49399 ms (end to end 7.74124 ms, enqueue 2.25955 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09705 ms - Host latency: 4.49424 ms (end to end 7.9873 ms, enqueue 2.18704 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.09265 ms - Host latency: 4.48613 ms (end to end 7.98384 ms, enqueue 2.049 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08103 ms - Host latency: 4.46963 ms (end to end 7.5092 ms, enqueue 2.01763 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07854 ms - Host latency: 4.46416 ms (end to end 7.63745 ms, enqueue 2.20237 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07869 ms - Host latency: 4.46348 ms (end to end 7.69004 ms, enqueue 2.04788 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07947 ms - Host latency: 4.4645 ms (end to end 7.7304 ms, enqueue 1.94814 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08359 ms - Host latency: 4.47068 ms (end to end 7.92515 ms, enqueue 1.84417 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07607 ms - Host latency: 4.46248 ms (end to end 8.00276 ms, enqueue 2.20955 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07866 ms - Host latency: 4.46501 ms (end to end 7.52761 ms, enqueue 2.25459 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07788 ms - Host latency: 4.4637 ms (end to end 8.01196 ms, enqueue 2.15667 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08003 ms - Host latency: 4.46489 ms (end to end 7.70994 ms, enqueue 2.02893 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07688 ms - Host latency: 4.46316 ms (end to end 7.75447 ms, enqueue 1.93926 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07954 ms - Host latency: 4.46458 ms (end to end 7.90881 ms, enqueue 1.93975 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07825 ms - Host latency: 4.46692 ms (end to end 7.54536 ms, enqueue 2.21606 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.0791 ms - Host latency: 4.46443 ms (end to end 7.63345 ms, enqueue 2.0592 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08218 ms - Host latency: 4.46775 ms (end to end 7.96272 ms, enqueue 1.94109 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07944 ms - Host latency: 4.46541 ms (end to end 7.48708 ms, enqueue 1.87207 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.08196 ms - Host latency: 4.46794 ms (end to end 7.81377 ms, enqueue 1.84001 ms)
[04/30/2021-14:14:59] [I] Average on 10 runs - GPU latency: 4.07993 ms - Host latency: 4.46682 ms (end to end 7.94912 ms, enqueue 2.06599 ms)
[04/30/2021-14:14:59] [I] Host Latency
[04/30/2021-14:14:59] [I] min: 4.44922 ms (end to end 4.5033 ms)
[04/30/2021-14:14:59] [I] max: 4.93512 ms (end to end 8.72623 ms)
[04/30/2021-14:14:59] [I] mean: 4.50229 ms (end to end 7.72822 ms)
[04/30/2021-14:14:59] [I] median: 4.47177 ms (end to end 7.9743 ms)
[04/30/2021-14:14:59] [I] percentile: 4.8248 ms at 99% (end to end 8.65201 ms at 99%)
[04/30/2021-14:14:59] [I] throughput: 242.175 qps
[04/30/2021-14:14:59] [I] walltime: 2.75008 s
[04/30/2021-14:14:59] [I] Enqueue Time
[04/30/2021-14:14:59] [I] min: 1.52295 ms
[04/30/2021-14:14:59] [I] max: 2.68457 ms
[04/30/2021-14:14:59] [I] median: 1.99921 ms
[04/30/2021-14:14:59] [I] GPU Compute
[04/30/2021-14:14:59] [I] min: 4.06836 ms
[04/30/2021-14:14:59] [I] max: 4.55066 ms
[04/30/2021-14:14:59] [I] mean: 4.11436 ms
[04/30/2021-14:14:59] [I] median: 4.08472 ms
[04/30/2021-14:14:59] [I] percentile: 4.43701 ms at 99%
[04/30/2021-14:14:59] [I] total compute time: 2.74016 s
&&&& PASSED TensorRT.trtexec # ./trtexec --loadEngine=./itest_8.trt --batch=1 --verbose

I ran the engine with trtexec and it passed.
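The summary at the end of this log relates directly to the input and output buffers in the question. Data transfers were enabled for this run, so trtexec's host latency covers copying the input to the GPU, running the engine, and copying the output back, while GPU compute covers only the engine itself. The difference is therefore a rough estimate of the per-query copy cost. A small worked check using the mean values from the log (the function names are illustrative only):

```cpp
// Rough per-query copy cost in ms: host latency (H2D copy + compute + D2H copy)
// minus GPU compute time, both taken from the trtexec summary.
double transferOverheadMs(double hostLatencyMeanMs, double gpuComputeMeanMs) {
    return hostLatencyMeanMs - gpuComputeMeanMs;
}

// Throughput in queries per second: timed queries divided by wall time in seconds.
double throughputQps(int queries, double walltimeSeconds) {
    return queries / walltimeSeconds;
}
```

With the values above (host latency mean 4.50229 ms, GPU compute mean 4.11436 ms, 666 queries over 2.75008 s), this gives about 0.39 ms of transfer time per query and about 242 queries per second, consistent with the throughput line trtexec prints.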

Hi @364083042,

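Sorry for the delayed response. inputBuffer and outputBuffer are buffers in GPU memory: the input data is copied into one before inference, and the results are copied out of the other afterwards.
Please refer to the C++ samples to see how to set up inference: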
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource

Thank you.