# ./imagenet.py images/orange_0.jpg images/test/output_0.jpg
jetson.inference -- imageNet loading network using argv command line params

imageNet -- loading classification network model from:
         -- prototxt     networks/googlenet.prototxt
         -- model        networks/bvlc_googlenet.caffemodel
         -- class_labels networks/ilsvrc12_synset_words.txt
         -- input_blob   'data'
         -- output_blob  'prob'
         -- batch_size   1

[TRT] TensorRT version 8.0.1
[TRT] loading NVIDIA plugins...
[TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - ::NMS_TRT version 1
[TRT] Registered plugin creator - ::Reorg_TRT version 1
[TRT] Registered plugin creator - ::Region_TRT version 1
[TRT] Registered plugin creator - ::Clip_TRT version 1
[TRT] Registered plugin creator - ::LReLU_TRT version 1
[TRT] Registered plugin creator - ::PriorBox_TRT version 1
[TRT] Registered plugin creator - ::Normalize_TRT version 1
[TRT] Registered plugin creator - ::ScatterND version 1
[TRT] Registered plugin creator - ::RPROI_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - ::CropAndResize version 1
[TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - ::Proposal version 1
[TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - ::Split version 1
[TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 226, GPU 2686 (MiB)
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file networks/bvlc_googlenet.caffemodel.1.1.8001.GPU.FP16.engine
[TRT] loading network plan from engine cache... networks/bvlc_googlenet.caffemodel.1.1.8001.GPU.FP16.engine
[TRT] device GPU, loaded networks/bvlc_googlenet.caffemodel
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 247, GPU 2727 (MiB)
[TRT] Loaded engine size: 20 MB
[TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 247 MiB, GPU 2727 MiB
[TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TRT] Using cublas a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +281, now: CPU 405, GPU 3028 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +240, GPU +405, now: CPU 645, GPU 3433 (MiB)
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 645, GPU 3433 (MiB)
[TRT] Deserialization required 6391015 microseconds.
[TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 645 MiB, GPU 3433 MiB
[TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 645 MiB, GPU 3433 MiB
[TRT] Using cublas a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 645, GPU 3433 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 646, GPU 3433 (MiB)
[TRT] Total per-runner device memory is 14754304
[TRT] Total per-runner host memory is 89824
[TRT] Allocated activation device memory of size 3612672
[TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 646 MiB, GPU 3451 MiB
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT]    -- layers       68
[TRT]    -- maxBatchSize 1
[TRT]    -- deviceMemory 3612672
[TRT]    -- bindings     2
[TRT]    binding 0
                -- index   0
                -- name    'data'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  3
                -- dim #0  3
                -- dim #1  224
                -- dim #2  224
[TRT]    binding 1
                -- index   1
                -- name    'prob'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1000
                -- dim #1  1
                -- dim #2  1
[TRT]
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=224 w=224) size=602112
[TRT] binding to output 0 prob binding index: 1
[TRT] binding to output 0 prob dims (b=1 c=1000 h=1 w=1) size=4000
[TRT]
[TRT] device GPU, networks/bvlc_googlenet.caffemodel initialized.
[TRT] imageNet -- loaded 1000 class info entries
[TRT] imageNet -- networks/bvlc_googlenet.caffemodel initialized.
[video] created imageLoader from file:///jetson-inference/build/aarch64/bin/images/orange_0.jpg
------------------------------------------------
imageLoader video options:
------------------------------------------------
  -- URI: file:///jetson-inference/build/aarch64/bin/images/orange_0.jpg
     - protocol:  file
     - location:  images/orange_0.jpg
     - extension: jpg
  -- deviceType: file
  -- ioType:     input
  -- codec:      unknown
  -- width:      0
  -- height:     0
  -- frameRate:  0.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
  -- loop:       0
  -- rtspLatency 2000
------------------------------------------------
[video] created imageWriter from file:///jetson-inference/build/aarch64/bin/images/test/output_0.jpg
------------------------------------------------
imageWriter video options:
------------------------------------------------
  -- URI: file:///jetson-inference/build/aarch64/bin/images/test/output_0.jpg
     - protocol:  file
     - location:  images/test/output_0.jpg
     - extension: jpg
  -- deviceType: file
  -- ioType:     output
  -- codec:      unknown
  -- width:      0
  -- height:     0
  -- frameRate:  0.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
  -- loop:       0
  -- rtspLatency 2000
------------------------------------------------
[OpenGL] failed to open X11 server connection.
[OpenGL] failed to create X11 Window.
[image] loaded 'images/orange_0.jpg' (1024x683, 3 channels)
class 0950 - 0.966797 (orange)
[image] saved 'images/test/output_0.jpg' (1024x683, 3 channels)

[TRT] ------------------------------------------------
[TRT] Timing Report networks/bvlc_googlenet.caffemodel
[TRT] ------------------------------------------------
[TRT] Pre-Process   CPU   0.07365ms  CUDA   1.76474ms
[TRT] Network       CPU 122.85263ms  CUDA 120.75562ms
[TRT] Post-Process  CPU   0.86560ms  CUDA   0.98312ms
[TRT] Total         CPU 123.79188ms  CUDA 123.50349ms
[TRT] ------------------------------------------------

[TRT] note -- when processing a single image, run 'sudo jetson_clocks' before to disable DVFS for more accurate profiling/timing measurements
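
The timing report above can be checked mechanically. What follows is a minimal Python sketch, not part of jetson-inference: the `REPORT` string, the `LINE_RE` pattern and the `parse_timing` helper are hypothetical names written only for the layout of this particular log. It parses the four stage lines, confirms that the per-stage times add up to the reported totals, and shows that the Network stage accounts for roughly 99% of the CPU total. As the log itself notes, these numbers come from a single image without `sudo jetson_clocks`, so they are probably not representative of steady-state throughput.

```python
import re

# Timing lines copied from the report in the log above.
REPORT = """\
[TRT] Pre-Process CPU 0.07365ms CUDA 1.76474ms
[TRT] Network CPU 122.85263ms CUDA 120.75562ms
[TRT] Post-Process CPU 0.86560ms CUDA 0.98312ms
[TRT] Total CPU 123.79188ms CUDA 123.50349ms
"""

# Hypothetical pattern: assumes the "<stage> CPU <t>ms CUDA <t>ms" layout seen here.
LINE_RE = re.compile(r"\[TRT\]\s+(\S+)\s+CPU\s+([\d.]+)ms\s+CUDA\s+([\d.]+)ms")


def parse_timing(text):
    """Return {stage: (cpu_ms, cuda_ms)} for every timing line found in text."""
    stages = {}
    for match in LINE_RE.finditer(text):
        stage, cpu, cuda = match.groups()
        stages[stage] = (float(cpu), float(cuda))
    return stages


timing = parse_timing(REPORT)

# Sum the individual stages and compare against the reported "Total" line.
cpu_sum = sum(cpu for name, (cpu, _) in timing.items() if name != "Total")
cuda_sum = sum(cuda for name, (_, cuda) in timing.items() if name != "Total")

# Fraction of the CPU-side total spent in the Network stage.
network_share = timing["Network"][0] / timing["Total"][0]

print(f"CPU sum {cpu_sum:.5f} ms, CUDA sum {cuda_sum:.5f} ms")
print(f"Network share of CPU total: {network_share:.1%}")
```

The regular expression tolerates any amount of whitespace between fields, so it should work whether the report is pasted as a single run of text or with its original column alignment.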