I executed the following command, as described in the Hello AI World tutorial:
~/jetson-inference/build/aarch64/bin$ python3 imagenet.py images/orange_0.jpg images/test/output_0.jpg
The console output is shown below:
imageNet -- loading classification network model from:
-- prototxt networks/Googlenet/googlenet.prototxt
-- model networks/Googlenet/bvlc_googlenet.caffemodel
-- class_labels networks/ilsvrc12_synset_words.txt
-- input_blob 'data'
-- output_blob 'prob'
-- batch_size 1
[TRT] TensorRT version 8.2.1
[TRT] loading NVIDIA plugins...
[TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - ::NMS_TRT version 1
[TRT] Registered plugin creator - ::Reorg_TRT version 1
[TRT] Registered plugin creator - ::Region_TRT version 1
[TRT] Registered plugin creator - ::Clip_TRT version 1
[TRT] Registered plugin creator - ::LReLU_TRT version 1
[TRT] Registered plugin creator - ::PriorBox_TRT version 1
[TRT] Registered plugin creator - ::Normalize_TRT version 1
[TRT] Registered plugin creator - ::ScatterND version 1
[TRT] Registered plugin creator - ::RPROI_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - ::CropAndResize version 1
[TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TFTRT_TRT version 1
[TRT] Registered plugin creator - ::Proposal version 1
[TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - ::Split version 1
[TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] [MemUsageChange] Init CUDA: CPU +229, GPU +0, now: CPU 253, GPU 1936 (MiB)
[TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 253 MiB, GPU 1937 MiB
[TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 283 MiB, GPU 1932 MiB
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] found engine cache file networks/Googlenet/bvlc_googlenet.caffemodel.1.1.8201.GPU.FP16.engine
[TRT] found model checksum networks/Googlenet/bvlc_googlenet.caffemodel.sha256sum
[TRT] echo "$(cat networks/Googlenet/bvlc_googlenet.caffemodel.sha256sum) networks/Googlenet/bvlc_googlenet.caffemodel" | sha256sum --check --status
[TRT] model matched checksum networks/Googlenet/bvlc_googlenet.caffemodel.sha256sum
[TRT] loading network plan from engine cache... networks/Googlenet/bvlc_googlenet.caffemodel.1.1.8201.GPU.FP16.engine
[TRT] device GPU, loaded networks/Googlenet/bvlc_googlenet.caffemodel
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 275, GPU 1925 (MiB)
[TRT] Loaded engine size: 20 MiB
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +7, now: CPU 433, GPU 1931 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +240, GPU -6, now: CPU 673, GPU 1925 (MiB)
[TRT] Deserialization required 23140418 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +20, now: CPU 0, GPU 20 (MiB)
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +6, now: CPU 673, GPU 1924 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 673, GPU 1924 (MiB)
[TRT] Total per-runner device persistent memory is 14754304
[TRT] Total per-runner host persistent memory is 85216
[TRT] Allocated activation device memory of size 3612672
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 37 (MiB)
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT] -- layers 68
[TRT] -- maxBatchSize 1
[TRT] -- deviceMemory 3612672
[TRT] -- bindings 2
[TRT] binding 0
-- index 0
-- name 'data'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3
-- dim #1 224
-- dim #2 224
[TRT] binding 1
-- index 1
-- name 'prob'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1000
-- dim #1 1
-- dim #2 1
[TRT]
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=224 w=224) size=602112
[TRT] binding to output 0 prob binding index: 1
[TRT] binding to output 0 prob dims (b=1 c=1000 h=1 w=1) size=4000
[TRT]
[TRT] device GPU, networks/Googlenet/bvlc_googlenet.caffemodel initialized.
[TRT] loaded 1000 class labels
[TRT] imageNet – networks/Googlenet/bvlc_googlenet.caffemodel initialized.
[video] created imageLoader from file:///home/magic1/jetson-inference/build/aarch64/bin/images/orange_0.jpg
imageLoader video options:
-- URI: file:///home/magic1/jetson-inference/build/aarch64/bin/images/orange_0.jpg
- protocol: file
- location: images/orange_0.jpg
- extension: jpg
-- deviceType: file
-- ioType: input
-- codec: unknown
-- codecType: omx
-- frameRate: 0
-- numBuffers: 4
-- zeroCopy: true
-- flipMethod: none
-- loop: 0
[video] created imageWriter from file:///home/magic1/jetson-inference/build/aarch64/bin/images/test/output_0.jpg
imageWriter video options:
-- URI: file:///home/magic1/jetson-inference/build/aarch64/bin/images/test/output_0.jpg
- protocol: file
- location: images/test/output_0.jpg
- extension: jpg
-- deviceType: file
-- ioType: output
-- codec: unknown
-- codecType: omx
-- frameRate: 0
-- bitRate: 0
-- numBuffers: 4
-- zeroCopy: true
[OpenGL] glDisplay -- X screen 0 resolution: 1920x1080
[OpenGL] glDisplay -- X window resolution: 1920x1080
[OpenGL] glDisplay -- display device initialized (1920x1080)
[video] created glDisplay from display://0
glDisplay video options:
-- URI: display://0
- protocol: display
- location: 0
-- deviceType: display
-- ioType: output
-- width: 1920
-- height: 1080
-- frameRate: 0
-- numBuffers: 4
-- zeroCopy: true
[image] loaded 'images/orange_0.jpg' (1024x683, 3 channels)
imagenet: 96.68% class #950 (orange)
[OpenGL] glDisplay -- set the window size to 1024x683
[OpenGL] creating 1024x683 texture (GL_RGB8 format, 2098176 bytes)
[cuda] registered openGL texture for interop access (1024x683, GL_RGB8, 2098176 bytes)
[image] saved 'images/test/output_0.jpg' (1024x683, 3 channels)
[TRT] ------------------------------------------------
[TRT] Timing Report networks/Googlenet/bvlc_googlenet.caffemodel
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.24938ms CUDA 0.47557ms
[TRT] Network CPU 3442.24292ms CUDA 3441.75146ms
[TRT] Post-Process CPU 27.18995ms CUDA 27.38057ms
[TRT] Total CPU 3469.68213ms CUDA 3469.60767ms
[TRT] ------------------------------------------------
[TRT] note -- when processing a single image, run 'sudo jetson_clocks' before
to disable DVFS for more accurate profiling/timing measurements
When I executed the 'sudo jetson_clocks' command, nothing appeared to change. Could you please assist me?
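For reference, this is the check I intend to run to confirm whether the clocks were actually locked (a minimal sketch, assuming the jetson_clocks script on my JetPack release supports the --show flag, which I have not verified):

~/jetson-inference/build/aarch64/bin$ sudo jetson_clocks --show
~/jetson-inference/build/aarch64/bin$ sudo jetson_clocks
~/jetson-inference/build/aarch64/bin$ sudo jetson_clocks --show

If the CPU/GPU/EMC frequencies printed by the second --show report are at their maximums, I would assume the clocks are pinned and would expect the Timing Report of a subsequent run to be more consistent.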
Warmly, Bob