# ./imagenet.py images/orange_0.jpg images/test/output_0.jpg 
jetson.inference -- imageNet loading network using argv command line params

imageNet -- loading classification network model from:
         -- prototxt     networks/googlenet.prototxt
         -- model        networks/bvlc_googlenet.caffemodel
         -- class_labels networks/ilsvrc12_synset_words.txt
         -- input_blob   'data'
         -- output_blob  'prob'
         -- batch_size   1

[TRT]    TensorRT version 8.0.1
[TRT]    loading NVIDIA plugins...
[TRT]    Registered plugin creator - ::GridAnchor_TRT version 1
[TRT]    Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT]    Registered plugin creator - ::NMS_TRT version 1
[TRT]    Registered plugin creator - ::Reorg_TRT version 1
[TRT]    Registered plugin creator - ::Region_TRT version 1
[TRT]    Registered plugin creator - ::Clip_TRT version 1
[TRT]    Registered plugin creator - ::LReLU_TRT version 1
[TRT]    Registered plugin creator - ::PriorBox_TRT version 1
[TRT]    Registered plugin creator - ::Normalize_TRT version 1
[TRT]    Registered plugin creator - ::ScatterND version 1
[TRT]    Registered plugin creator - ::RPROI_TRT version 1
[TRT]    Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT]    Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT]    Could not register plugin creator -  ::FlattenConcat_TRT version 1
[TRT]    Registered plugin creator - ::CropAndResize version 1
[TRT]    Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT]    Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT]    Registered plugin creator - ::Proposal version 1
[TRT]    Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT]    Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT]    Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT]    Registered plugin creator - ::Split version 1
[TRT]    Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT]    Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT]    detected model format - caffe  (extension '.caffemodel')
[TRT]    desired precision specified for GPU: FASTEST
[TRT]    requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]    [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 226, GPU 2686 (MiB)
[TRT]    native precisions detected for GPU:  FP32, FP16
[TRT]    selecting fastest native precision for GPU:  FP16
[TRT]    attempting to open engine cache file networks/bvlc_googlenet.caffemodel.1.1.8001.GPU.FP16.engine
[TRT]    loading network plan from engine cache... networks/bvlc_googlenet.caffemodel.1.1.8001.GPU.FP16.engine
[TRT]    device GPU, loaded networks/bvlc_googlenet.caffemodel
[TRT]    [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 247, GPU 2727 (MiB)
[TRT]    Loaded engine size: 20 MB
[TRT]    [MemUsageSnapshot] deserializeCudaEngine begin: CPU 247 MiB, GPU 2727 MiB
[TRT]    Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TRT]    Using cublas a tactic source
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +281, now: CPU 405, GPU 3028 (MiB)
[TRT]    Using cuDNN as a tactic source
[TRT]    [MemUsageChange] Init cuDNN: CPU +240, GPU +405, now: CPU 645, GPU 3433 (MiB)
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 645, GPU 3433 (MiB)
[TRT]    Deserialization required 6391015 microseconds.
[TRT]    [MemUsageSnapshot] deserializeCudaEngine end: CPU 645 MiB, GPU 3433 MiB
[TRT]    [MemUsageSnapshot] ExecutionContext creation begin: CPU 645 MiB, GPU 3433 MiB
[TRT]    Using cublas a tactic source
[TRT]    [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 645, GPU 3433 (MiB)
[TRT]    Using cuDNN as a tactic source
[TRT]    [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 646, GPU 3433 (MiB)
[TRT]    Total per-runner device memory is 14754304
[TRT]    Total per-runner host memory is 89824
[TRT]    Allocated activation device memory of size 3612672
[TRT]    [MemUsageSnapshot] ExecutionContext creation end: CPU 646 MiB, GPU 3451 MiB
[TRT]    
[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       68
[TRT]       -- maxBatchSize 1
[TRT]       -- deviceMemory 3612672
[TRT]       -- bindings     2
[TRT]       binding 0
                -- index   0
                -- name    'data'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  3
                -- dim #0  3
                -- dim #1  224
                -- dim #2  224
[TRT]       binding 1
                -- index   1
                -- name    'prob'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1000
                -- dim #1  1
                -- dim #2  1
[TRT]    
[TRT]    binding to input 0 data  binding index:  0
[TRT]    binding to input 0 data  dims (b=1 c=3 h=224 w=224) size=602112
[TRT]    binding to output 0 prob  binding index:  1
[TRT]    binding to output 0 prob  dims (b=1 c=1000 h=1 w=1) size=4000
[TRT]    
[TRT]    device GPU, networks/bvlc_googlenet.caffemodel initialized.
[TRT]    imageNet -- loaded 1000 class info entries
[TRT]    imageNet -- networks/bvlc_googlenet.caffemodel initialized.
[video]  created imageLoader from file:///jetson-inference/build/aarch64/bin/images/orange_0.jpg
------------------------------------------------
imageLoader video options:
------------------------------------------------
  -- URI: file:///jetson-inference/build/aarch64/bin/images/orange_0.jpg
     - protocol:  file
     - location:  images/orange_0.jpg
     - extension: jpg
  -- deviceType: file
  -- ioType:     input
  -- codec:      unknown
  -- width:      0
  -- height:     0
  -- frameRate:  0.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
  -- loop:       0
  -- rtspLatency 2000
------------------------------------------------
[video]  created imageWriter from file:///jetson-inference/build/aarch64/bin/images/test/output_0.jpg
------------------------------------------------
imageWriter video options:
------------------------------------------------
  -- URI: file:///jetson-inference/build/aarch64/bin/images/test/output_0.jpg
     - protocol:  file
     - location:  images/test/output_0.jpg
     - extension: jpg
  -- deviceType: file
  -- ioType:     output
  -- codec:      unknown
  -- width:      0
  -- height:     0
  -- frameRate:  0.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
  -- loop:       0
  -- rtspLatency 2000
------------------------------------------------
[OpenGL] failed to open X11 server connection.
[OpenGL] failed to create X11 Window.
[image]  loaded 'images/orange_0.jpg'  (1024x683, 3 channels)
class 0950 - 0.966797  (orange)
[image]  saved 'images/test/output_0.jpg'  (1024x683, 3 channels)

[TRT]    ------------------------------------------------
[TRT]    Timing Report networks/bvlc_googlenet.caffemodel
[TRT]    ------------------------------------------------
[TRT]    Pre-Process   CPU   0.07365ms  CUDA   1.76474ms
[TRT]    Network       CPU 122.85263ms  CUDA 120.75562ms
[TRT]    Post-Process  CPU   0.86560ms  CUDA   0.98312ms
[TRT]    Total         CPU 123.79188ms  CUDA 123.50349ms
[TRT]    ------------------------------------------------

[TRT]    note -- when processing a single image, run 'sudo jetson_clocks' before
                to disable DVFS for more accurate profiling/timing measurements
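
The timing report above can be sanity-checked by parsing it and confirming that the per-stage times add up to the reported totals. The following is a minimal sketch, not part of jetson-inference: the `parse_timing` helper and the regular expression are hypothetical and assume the line layout shown in this log (`[TRT]`, a stage name, then `CPU <ms>` and `CUDA <ms>`). The numbers are copied from this single run, which was taken without `sudo jetson_clocks`, so they should be read as indicative only.

```python
import re

# Timing lines copied verbatim from the report in this log.
REPORT = """\
[TRT]    Pre-Process   CPU   0.07365ms  CUDA   1.76474ms
[TRT]    Network       CPU 122.85263ms  CUDA 120.75562ms
[TRT]    Post-Process  CPU   0.86560ms  CUDA   0.98312ms
[TRT]    Total         CPU 123.79188ms  CUDA 123.50349ms
"""

# Assumed line layout: "[TRT]  <stage>  CPU <ms>ms  CUDA <ms>ms".
LINE_RE = re.compile(
    r"\[TRT\]\s+(?P<stage>[\w-]+)\s+"
    r"CPU\s+(?P<cpu>[\d.]+)ms\s+"
    r"CUDA\s+(?P<cuda>[\d.]+)ms"
)


def parse_timing(text):
    """Return {stage: (cpu_ms, cuda_ms)} for each timing line found in text."""
    timings = {}
    for match in LINE_RE.finditer(text):
        timings[match["stage"]] = (float(match["cpu"]), float(match["cuda"]))
    return timings


timings = parse_timing(REPORT)

# Sum every stage except the reported total, then compare against it.
stages = [times for name, times in timings.items() if name != "Total"]
cpu_sum = sum(cpu for cpu, _ in stages)
cuda_sum = sum(cuda for _, cuda in stages)
total_cpu, total_cuda = timings["Total"]

print(f"CPU  sum of stages: {cpu_sum:.5f} ms (reported total {total_cpu:.5f} ms)")
print(f"CUDA sum of stages: {cuda_sum:.5f} ms (reported total {total_cuda:.5f} ms)")

# Share of the CUDA time spent in the network itself.
network_share = timings["Network"][1] / total_cuda
print(f"Network share of CUDA time: {network_share:.1%}")
```

In this run the network stage accounts for nearly all of the measured time, so pre- and post-processing are not the bottleneck for a single image.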