NanoOWL example not working

Having problems with the nanoowl example on JetPack 6. The container builds OK, but when I try to run the example this is what I get. I'm obviously doing something wrong!
Cheers

jetson-containers run $(autotag nanoowl)
Namespace(packages=['nanoowl'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2
-- Finding compatible container image for ['nanoowl']
dustynv/nanoowl:r36.2.0
[sudo] password for paul:
Sorry, try again.
[sudo] password for paul:
localuser:root being added to access control list
xauth: file /tmp/.docker.xauth does not exist

  • docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/paul/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 -v /run/jtop.sock:/run/jtop.sock dustynv/nanoowl:r36.2.0
    root@ubuntu:/opt/nanoowl# cd examples/tree_demo
    root@ubuntu:/opt/nanoowl/examples/tree_demo# python3 tree_demo.py ../../data/owl_image_encoder_patch32.engine
    /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
    warnings.warn(
    /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
    warnings.warn(
    /usr/local/lib/python3.10/dist-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/aten/src/ATen/native/TensorShape.cpp:3549.)
    return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
    [05/23/2024-08:47:24] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
    Traceback (most recent call last):
      File "/opt/nanoowl/examples/tree_demo/tree_demo.py", line 47, in <module>
        predictor = TreePredictor(
      File "/opt/nanoowl/nanoowl/tree_predictor.py", line 52, in __init__
        self.clip_predictor = ClipPredictor() if clip_predictor is None else clip_predictor
      File "/opt/nanoowl/nanoowl/clip_predictor.py", line 65, in __init__
        self.clip_model, _ = clip.load(model_name, device)
      File "/usr/local/lib/python3.10/dist-packages/clip/clip.py", line 120, in load
        model_path = _download(_MODELS[name], download_root or os.path.expanduser("~/.cache/clip"))
      File "/usr/local/lib/python3.10/dist-packages/clip/clip.py", line 44, in _download
        os.makedirs(root, exist_ok=True)
      File "/usr/lib/python3.10/os.py", line 225, in makedirs
        mkdir(name, mode)
    FileExistsError: [Errno 17] File exists: '/root/.cache/clip'

root@ubuntu:/opt/nanoowl/examples/tree_demo# python3 -m nanoowl.build_image_encoder_engine data/owl_image_encoder_patch32.engine
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/aten/src/ATen/native/TensorShape.cpp:3549.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.10/dist-packages/transformers/models/owlvit/modeling_owlvit.py:383: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/owlvit/modeling_owlvit.py:426: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=/tmp/tmpg92blwwu/image_encoder.onnx --saveEngine=data/owl_image_encoder_patch32.engine --fp16 --shapes=image:1x3x768x768
[05/23/2024-08:48:33] [I] === Model Options ===
[05/23/2024-08:48:33] [I] Format: ONNX
[05/23/2024-08:48:33] [I] Model: /tmp/tmpg92blwwu/image_encoder.onnx
[05/23/2024-08:48:33] [I] Output:
[05/23/2024-08:48:33] [I] === Build Options ===
[05/23/2024-08:48:33] [I] Max batch: explicit batch
[05/23/2024-08:48:33] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/23/2024-08:48:33] [I] minTiming: 1
[05/23/2024-08:48:33] [I] avgTiming: 8
[05/23/2024-08:48:33] [I] Precision: FP32+FP16
[05/23/2024-08:48:33] [I] LayerPrecisions:
[05/23/2024-08:48:33] [I] Layer Device Types:
[05/23/2024-08:48:33] [I] Calibration:
[05/23/2024-08:48:33] [I] Refit: Disabled
[05/23/2024-08:48:33] [I] Version Compatible: Disabled
[05/23/2024-08:48:33] [I] ONNX Native InstanceNorm: Disabled
[05/23/2024-08:48:33] [I] TensorRT runtime: full
[05/23/2024-08:48:33] [I] Lean DLL Path:
[05/23/2024-08:48:33] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[05/23/2024-08:48:33] [I] Exclude Lean Runtime: Disabled
[05/23/2024-08:48:33] [I] Sparsity: Disabled
[05/23/2024-08:48:33] [I] Safe mode: Disabled
[05/23/2024-08:48:33] [I] Build DLA standalone loadable: Disabled
[05/23/2024-08:48:33] [I] Allow GPU fallback for DLA: Disabled
[05/23/2024-08:48:33] [I] DirectIO mode: Disabled
[05/23/2024-08:48:33] [I] Restricted mode: Disabled
[05/23/2024-08:48:33] [I] Skip inference: Disabled
[05/23/2024-08:48:33] [I] Save engine: data/owl_image_encoder_patch32.engine
[05/23/2024-08:48:33] [I] Load engine:
[05/23/2024-08:48:33] [I] Profiling verbosity: 0
[05/23/2024-08:48:33] [I] Tactic sources: Using default tactic sources
[05/23/2024-08:48:33] [I] timingCacheMode: local
[05/23/2024-08:48:33] [I] timingCacheFile:
[05/23/2024-08:48:33] [I] Heuristic: Disabled
[05/23/2024-08:48:33] [I] Preview Features: Use default preview flags.
[05/23/2024-08:48:33] [I] MaxAuxStreams: -1
[05/23/2024-08:48:33] [I] BuilderOptimizationLevel: -1
[05/23/2024-08:48:33] [I] Input(s)s format: fp32:CHW
[05/23/2024-08:48:33] [I] Output(s)s format: fp32:CHW
[05/23/2024-08:48:33] [I] Input build shape: image=1x3x768x768+1x3x768x768+1x3x768x768
[05/23/2024-08:48:33] [I] Input calibration shapes: model
[05/23/2024-08:48:33] [I] === System Options ===
[05/23/2024-08:48:33] [I] Device: 0
[05/23/2024-08:48:33] [I] DLACore:
[05/23/2024-08:48:33] [I] Plugins:
[05/23/2024-08:48:33] [I] setPluginsToSerialize:
[05/23/2024-08:48:33] [I] dynamicPlugins:
[05/23/2024-08:48:33] [I] ignoreParsedPluginLibs: 0
[05/23/2024-08:48:33] [I]
[05/23/2024-08:48:33] [I] === Inference Options ===
[05/23/2024-08:48:33] [I] Batch: Explicit
[05/23/2024-08:48:33] [I] Input inference shape: image=1x3x768x768
[05/23/2024-08:48:33] [I] Iterations: 10
[05/23/2024-08:48:33] [I] Duration: 3s (+ 200ms warm up)
[05/23/2024-08:48:33] [I] Sleep time: 0ms
[05/23/2024-08:48:33] [I] Idle time: 0ms
[05/23/2024-08:48:33] [I] Inference Streams: 1
[05/23/2024-08:48:33] [I] ExposeDMA: Disabled
[05/23/2024-08:48:33] [I] Data transfers: Enabled
[05/23/2024-08:48:33] [I] Spin-wait: Disabled
[05/23/2024-08:48:33] [I] Multithreading: Disabled
[05/23/2024-08:48:33] [I] CUDA Graph: Disabled
[05/23/2024-08:48:33] [I] Separate profiling: Disabled
[05/23/2024-08:48:33] [I] Time Deserialize: Disabled
[05/23/2024-08:48:33] [I] Time Refit: Disabled
[05/23/2024-08:48:33] [I] NVTX verbosity: 0
[05/23/2024-08:48:33] [I] Persistent Cache Ratio: 0
[05/23/2024-08:48:33] [I] Inputs:
[05/23/2024-08:48:33] [I] === Reporting Options ===
[05/23/2024-08:48:33] [I] Verbose: Disabled
[05/23/2024-08:48:33] [I] Averages: 10 inferences
[05/23/2024-08:48:33] [I] Percentiles: 90,95,99
[05/23/2024-08:48:33] [I] Dump refittable layers:Disabled
[05/23/2024-08:48:33] [I] Dump output: Disabled
[05/23/2024-08:48:33] [I] Profile: Disabled
[05/23/2024-08:48:33] [I] Export timing to JSON file:
[05/23/2024-08:48:33] [I] Export output to JSON file:
[05/23/2024-08:48:33] [I] Export profile to JSON file:
[05/23/2024-08:48:33] [I]
[05/23/2024-08:48:33] [I] === Device Information ===
[05/23/2024-08:48:33] [I] Selected Device: Orin
[05/23/2024-08:48:33] [I] Compute Capability: 8.7
[05/23/2024-08:48:33] [I] SMs: 16
[05/23/2024-08:48:33] [I] Device Global Memory: 62841 MiB
[05/23/2024-08:48:33] [I] Shared Memory per SM: 164 KiB
[05/23/2024-08:48:33] [I] Memory Bus Width: 256 bits (ECC disabled)
[05/23/2024-08:48:33] [I] Application Compute Clock Rate: 1.3 GHz
[05/23/2024-08:48:33] [I] Application Memory Clock Rate: 0.816 GHz
[05/23/2024-08:48:33] [I]
[05/23/2024-08:48:33] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[05/23/2024-08:48:33] [I]
[05/23/2024-08:48:33] [I] TensorRT version: 8.6.2
[05/23/2024-08:48:33] [I] Loading standard plugins
[05/23/2024-08:48:33] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 32, GPU 7307 (MiB)
[05/23/2024-08:48:40] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1432, now: CPU 1222, GPU 8778 (MiB)
[05/23/2024-08:48:40] [I] Start parsing network model.
[05/23/2024-08:48:40] [I] [TRT] ----------------------------------------------------------------
[05/23/2024-08:48:40] [I] [TRT] Input filename: /tmp/tmpg92blwwu/image_encoder.onnx
[05/23/2024-08:48:40] [I] [TRT] ONNX IR version: 0.0.8
[05/23/2024-08:48:40] [I] [TRT] Opset version: 16
[05/23/2024-08:48:40] [I] [TRT] Producer name: pytorch
[05/23/2024-08:48:40] [I] [TRT] Producer version: 2.2.0
[05/23/2024-08:48:40] [I] [TRT] Domain:
[05/23/2024-08:48:40] [I] [TRT] Model version: 0
[05/23/2024-08:48:40] [I] [TRT] Doc string:
[05/23/2024-08:48:40] [I] [TRT] ----------------------------------------------------------------
[05/23/2024-08:48:40] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/23/2024-08:48:40] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[05/23/2024-08:48:40] [I] Finished parsing network model. Parse time: 0.564444
[05/23/2024-08:48:41] [W] [TRT] Detected layernorm nodes in FP16: /vision_model/encoder/layers.0/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.1/layer_norm1/ReduceMean_1, /vision_model/pre_layernorm/Sqrt, /vision_model/encoder/layers.0/layer_norm1/Sqrt, /vision_model/pre_layernorm/ReduceMean_1, /vision_model/encoder/layers.0/layer_norm2/ReduceMean_1, /vision_model/pre_layernorm/Sub, /vision_model/pre_layernorm/Pow, /vision_model/pre_layernorm/Add, /vision_model/pre_layernorm/Div, /vision_model/pre_layernorm/Mul, /vision_model/pre_layernorm/Add_1, /vision_model/encoder/layers.0/layer_norm1/Sub, /vision_model/encoder/layers.0/layer_norm1/Pow, /vision_model/encoder/layers.0/layer_norm1/Add, /vision_model/encoder/layers.0/layer_norm1/Div, /vision_model/encoder/layers.0/layer_norm1/Mul, /vision_model/encoder/layers.0/layer_norm1/Add_1, /vision_model/encoder/layers.0/layer_norm2/Sub, /vision_model/encoder/layers.0/layer_norm2/Pow, /vision_model/encoder/layers.0/layer_norm2/Add, /vision_model/encoder/layers.0/layer_norm2/Sqrt, /vision_model/encoder/layers.0/layer_norm2/Div, /vision_model/encoder/layers.0/layer_norm2/Mul, /vision_model/encoder/layers.0/layer_norm2/Add_1, /vision_model/encoder/layers.1/layer_norm1/Sub, /vision_model/encoder/layers.1/layer_norm1/Pow, /vision_model/encoder/layers.1/layer_norm1/Add, /vision_model/encoder/layers.1/layer_norm1/Sqrt, /vision_model/encoder/layers.1/layer_norm1/Div, /vision_model/encoder/layers.1/layer_norm1/Mul, /vision_model/encoder/layers.1/layer_norm1/Add_1, /vision_model/encoder/layers.1/layer_norm2/Sub, /vision_model/encoder/layers.1/layer_norm2/Pow, /vision_model/encoder/layers.1/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.1/layer_norm2/Add, /vision_model/encoder/layers.1/layer_norm2/Sqrt, /vision_model/encoder/layers.1/layer_norm2/Div, /vision_model/encoder/layers.1/layer_norm2/Mul, /vision_model/encoder/layers.1/layer_norm2/Add_1, /vision_model/encoder/layers.2/layer_norm1/Sub, /vision_model/encoder/layers.2/layer_norm1/Pow, /vision_model/encoder/layers.2/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.2/layer_norm1/Add, /vision_model/encoder/layers.2/layer_norm1/Sqrt, /vision_model/encoder/layers.2/layer_norm1/Div, /vision_model/encoder/layers.2/layer_norm1/Mul, /vision_model/encoder/layers.2/layer_norm1/Add_1, /vision_model/encoder/layers.2/layer_norm2/Sub, /vision_model/encoder/layers.2/layer_norm2/Pow, /vision_model/encoder/layers.2/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.2/layer_norm2/Add, /vision_model/encoder/layers.2/layer_norm2/Sqrt, /vision_model/encoder/layers.2/layer_norm2/Div, /vision_model/encoder/layers.2/layer_norm2/Mul, /vision_model/encoder/layers.2/layer_norm2/Add_1, /vision_model/encoder/layers.3/layer_norm1/Sub, /vision_model/encoder/layers.3/layer_norm1/Pow, /vision_model/encoder/layers.3/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.3/layer_norm1/Add, /vision_model/encoder/layers.3/layer_norm1/Sqrt, /vision_model/encoder/layers.3/layer_norm1/Div, /vision_model/encoder/layers.3/layer_norm1/Mul, /vision_model/encoder/layers.3/layer_norm1/Add_1, /vision_model/encoder/layers.3/layer_norm2/Sub, /vision_model/encoder/layers.3/layer_norm2/Pow, /vision_model/encoder/layers.3/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.3/layer_norm2/Add, /vision_model/encoder/layers.3/layer_norm2/Sqrt, /vision_model/encoder/layers.3/layer_norm2/Div, /vision_model/encoder/layers.3/layer_norm2/Mul, /vision_model/encoder/layers.3/layer_norm2/Add_1, 
/vision_model/encoder/layers.4/layer_norm1/Sub, /vision_model/encoder/layers.4/layer_norm1/Pow, /vision_model/encoder/layers.4/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.4/layer_norm1/Add, /vision_model/encoder/layers.4/layer_norm1/Sqrt, /vision_model/encoder/layers.4/layer_norm1/Div, /vision_model/encoder/layers.4/layer_norm1/Mul, /vision_model/encoder/layers.4/layer_norm1/Add_1, /vision_model/encoder/layers.4/layer_norm2/Sub, /vision_model/encoder/layers.4/layer_norm2/Pow, /vision_model/encoder/layers.4/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.4/layer_norm2/Add, /vision_model/encoder/layers.4/layer_norm2/Sqrt, /vision_model/encoder/layers.4/layer_norm2/Div, /vision_model/encoder/layers.4/layer_norm2/Mul, /vision_model/encoder/layers.4/layer_norm2/Add_1, /vision_model/encoder/layers.5/layer_norm1/Sub, /vision_model/encoder/layers.5/layer_norm1/Pow, /vision_model/encoder/layers.5/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.5/layer_norm1/Add, /vision_model/encoder/layers.5/layer_norm1/Sqrt, /vision_model/encoder/layers.5/layer_norm1/Div, /vision_model/encoder/layers.5/layer_norm1/Mul, /vision_model/encoder/layers.5/layer_norm1/Add_1, /vision_model/encoder/layers.5/layer_norm2/Sub, /vision_model/encoder/layers.5/layer_norm2/Pow, /vision_model/encoder/layers.5/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.5/layer_norm2/Add, /vision_model/encoder/layers.5/layer_norm2/Sqrt, /vision_model/encoder/layers.5/layer_norm2/Div, /vision_model/encoder/layers.5/layer_norm2/Mul, /vision_model/encoder/layers.5/layer_norm2/Add_1, /vision_model/encoder/layers.6/layer_norm1/Sub, /vision_model/encoder/layers.6/layer_norm1/Pow, /vision_model/encoder/layers.6/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.6/layer_norm1/Add, /vision_model/encoder/layers.6/layer_norm1/Sqrt, /vision_model/encoder/layers.6/layer_norm1/Div, /vision_model/encoder/layers.6/layer_norm1/Mul, /vision_model/encoder/layers.6/layer_norm1/Add_1, /vision_model/encoder/layers.6/layer_norm2/Sub, /vision_model/encoder/layers.6/layer_norm2/Pow, /vision_model/encoder/layers.6/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.6/layer_norm2/Add, /vision_model/encoder/layers.6/layer_norm2/Sqrt, /vision_model/encoder/layers.6/layer_norm2/Div, /vision_model/encoder/layers.6/layer_norm2/Mul, /vision_model/encoder/layers.6/layer_norm2/Add_1, /vision_model/encoder/layers.7/layer_norm1/Sub, /vision_model/encoder/layers.7/layer_norm1/Pow, /vision_model/encoder/layers.7/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.7/layer_norm1/Add, /vision_model/encoder/layers.7/layer_norm1/Sqrt, /vision_model/encoder/layers.7/layer_norm1/Div, /vision_model/encoder/layers.7/layer_norm1/Mul, /vision_model/encoder/layers.7/layer_norm1/Add_1, /vision_model/encoder/layers.7/layer_norm2/Sub, /vision_model/encoder/layers.7/layer_norm2/Pow, /vision_model/encoder/layers.7/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.7/layer_norm2/Add, /vision_model/encoder/layers.7/layer_norm2/Sqrt, /vision_model/encoder/layers.7/layer_norm2/Div, /vision_model/encoder/layers.7/layer_norm2/Mul, /vision_model/encoder/layers.7/layer_norm2/Add_1, /vision_model/encoder/layers.8/layer_norm1/Sub, /vision_model/encoder/layers.8/layer_norm1/Pow, /vision_model/encoder/layers.8/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.8/layer_norm1/Add, /vision_model/encoder/layers.8/layer_norm1/Sqrt, /vision_model/encoder/layers.8/layer_norm1/Div, /vision_model/encoder/layers.8/layer_norm1/Mul, 
/vision_model/encoder/layers.8/layer_norm1/Add_1, /vision_model/encoder/layers.8/layer_norm2/Sub, /vision_model/encoder/layers.8/layer_norm2/Pow, /vision_model/encoder/layers.8/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.8/layer_norm2/Add, /vision_model/encoder/layers.8/layer_norm2/Sqrt, /vision_model/encoder/layers.8/layer_norm2/Div, /vision_model/encoder/layers.8/layer_norm2/Mul, /vision_model/encoder/layers.8/layer_norm2/Add_1, /vision_model/encoder/layers.9/layer_norm1/Sub, /vision_model/encoder/layers.9/layer_norm1/Pow, /vision_model/encoder/layers.9/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.9/layer_norm1/Add, /vision_model/encoder/layers.9/layer_norm1/Sqrt, /vision_model/encoder/layers.9/layer_norm1/Div, /vision_model/encoder/layers.9/layer_norm1/Mul, /vision_model/encoder/layers.9/layer_norm1/Add_1, /vision_model/encoder/layers.9/layer_norm2/Sub, /vision_model/encoder/layers.9/layer_norm2/Pow, /vision_model/encoder/layers.9/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.9/layer_norm2/Add, /vision_model/encoder/layers.9/layer_norm2/Sqrt, /vision_model/encoder/layers.9/layer_norm2/Div, /vision_model/encoder/layers.9/layer_norm2/Mul, /vision_model/encoder/layers.9/layer_norm2/Add_1, /vision_model/encoder/layers.10/layer_norm1/Sub, /vision_model/encoder/layers.10/layer_norm1/Pow, /vision_model/encoder/layers.10/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.10/layer_norm1/Add, /vision_model/encoder/layers.10/layer_norm1/Sqrt, /vision_model/encoder/layers.10/layer_norm1/Div, /vision_model/encoder/layers.10/layer_norm1/Mul, /vision_model/encoder/layers.10/layer_norm1/Add_1, /vision_model/encoder/layers.10/layer_norm2/Sub, /vision_model/encoder/layers.10/layer_norm2/Pow, /vision_model/encoder/layers.10/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.10/layer_norm2/Add, /vision_model/encoder/layers.10/layer_norm2/Sqrt, /vision_model/encoder/layers.10/layer_norm2/Div, /vision_model/encoder/layers.10/layer_norm2/Mul, /vision_model/encoder/layers.10/layer_norm2/Add_1, /vision_model/encoder/layers.11/layer_norm1/Sub, /vision_model/encoder/layers.11/layer_norm1/Pow, /vision_model/encoder/layers.11/layer_norm1/ReduceMean_1, /vision_model/encoder/layers.11/layer_norm1/Add, /vision_model/encoder/layers.11/layer_norm1/Sqrt, /vision_model/encoder/layers.11/layer_norm1/Div, /vision_model/encoder/layers.11/layer_norm1/Mul, /vision_model/encoder/layers.11/layer_norm1/Add_1, /vision_model/encoder/layers.11/layer_norm2/Sub, /vision_model/encoder/layers.11/layer_norm2/Pow, /vision_model/encoder/layers.11/layer_norm2/ReduceMean_1, /vision_model/encoder/layers.11/layer_norm2/Add, /vision_model/encoder/layers.11/layer_norm2/Sqrt, /vision_model/encoder/layers.11/layer_norm2/Div, /vision_model/encoder/layers.11/layer_norm2/Mul, /vision_model/encoder/layers.11/layer_norm2/Add_1, /post_layernorm/Sub, /post_layernorm/Pow, /post_layernorm/ReduceMean_1, /post_layernorm/Add, /post_layernorm/Sqrt, /post_layernorm/Div, /post_layernorm/Mul, /post_layernorm/Add_1, /layer_norm/Sub, /layer_norm/Pow, /layer_norm/ReduceMean_1, /layer_norm/Add, /layer_norm/Sqrt, /layer_norm/Div, /layer_norm/Mul, /layer_norm/Add_1
[05/23/2024-08:48:41] [W] [TRT] Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
[05/23/2024-08:48:41] [I] [TRT] Graph optimization time: 0.0856708 seconds.
[05/23/2024-08:48:41] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/23/2024-08:51:27] [I] [TRT] Detected 1 inputs and 5 output network tensors.
[05/23/2024-08:51:28] [I] [TRT] Total Host Persistent Memory: 5328
[05/23/2024-08:51:28] [I] [TRT] Total Device Persistent Memory: 0
[05/23/2024-08:51:28] [I] [TRT] Total Scratch Memory: 7976448
[05/23/2024-08:51:28] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 170 MiB, GPU 390 MiB
[05/23/2024-08:51:28] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 3 steps to complete.
[05/23/2024-08:51:28] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.030657ms to assign 2 blocks to 3 nodes requiring 8861184 bytes.
[05/23/2024-08:51:28] [I] [TRT] Total Activation Memory: 8861184
[05/23/2024-08:51:28] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/23/2024-08:51:28] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/23/2024-08:51:28] [W] [TRT] Check verbose logs for the list of affected weights.
[05/23/2024-08:51:28] [W] [TRT] - 163 weights are affected by this issue: Detected subnormal FP16 values.
[05/23/2024-08:51:28] [W] [TRT] - 66 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[05/23/2024-08:51:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +4, GPU +256, now: CPU 4, GPU 256 (MiB)
[05/23/2024-08:51:29] [E] Saving engine to file failed.
[05/23/2024-08:51:29] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=/tmp/tmpg92blwwu/image_encoder.onnx --saveEngine=data/owl_image_encoder_patch32.engine --fp16 --shapes=image:1x3x768x768
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/nanoowl/nanoowl/build_image_encoder_engine.py", line 34, in <module>
    predictor.build_image_encoder_engine(
  File "/opt/nanoowl/nanoowl/owl_predictor.py", line 455, in build_image_encoder_engine
    return self.load_image_encoder_engine(engine_path, max_batch_size)
  File "/opt/nanoowl/nanoowl/owl_predictor.py", line 386, in load_image_encoder_engine
    with open(engine_path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/owl_image_encoder_patch32.engine'
root@ubuntu:/opt/nanoowl/examples/tree_demo#

Hi @paulrrh, there are some unusual errors in your logs related to disk I/O; can you check your free disk space?

I just tried running/building this container again here on JetPack 6 (twice) and did not encounter those issues. One thing you might want to try is deleting your jetson-containers/data/clip directory and running it again. You shouldn't need to rebuild the engine; the actual error is the first one (the FileExistsError for /root/.cache/clip).
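
Something like this on the host, a sketch assuming jetson-containers is checked out at /home/paul/jetson-containers (per the docker run line above):

    rm -rf /home/paul/jetson-containers/data/clip    # clear the cached clip directory on the host
    jetson-containers run $(autotag nanoowl)         # then start the container again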

Hi dusty_nv,
Welcome back, and thanks for your reply.
I have 830.8 GB / 981.8 GB available.
There is no jetson-containers/data/clip directory.
I did at one point remove clip from /root/.cache/ and retried the above before posting my problem.
PS: /root/.cache/clip exists.
Sorry, not much help.

Hmm, this might be something to do with my PR. Can you make sure that the jetson-containers/data/models/clip directory exists before starting the container? Maybe I need to make sure to create it if it doesn't exist.

OK, gotcha: in the nanoowl container, /root/.cache/clip is symbolically linked to /data/clip (which is in turn mounted to your jetson-containers/data/clip directory). When you restart the nanoowl container, that symlink should appear again.

Can you try creating the jetson-containers/data/clip directory again and see if that helps?
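
A quick way to check whether that symlink is dangling, from inside the container (a sketch; the exact /data target may differ from what is shown here):

    ls -l /root/.cache/clip     # expect something like: /root/.cache/clip -> /data/clip
    ls -ld /data/clip           # "No such file or directory" means the link is dangling;
                                # os.makedirs(..., exist_ok=True) then raises FileExistsError,
                                # because mkdir() hits the existing symlink (EEXIST) while
                                # isdir() on the broken link returns False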

Sorry, typo: not data/clip, but data/models/clip.
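
In other words, on the host before starting the container, a sketch (adjust the path to your jetson-containers checkout, /home/paul/jetson-containers in the docker run line above):

    mkdir -p /home/paul/jetson-containers/data/models/clip   # host-side directory the container symlink resolves to
    jetson-containers run $(autotag nanoowl)                  # restart the container
    ls -l /root/.cache/clip                                   # inside the container, the symlink should now resolve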

Thanks @tokada1 - I will add a jetson-containers/data/clip/.gitkeep now so that directory is automatically created.

Again, it should be 'data/models/clip' (sorry!)

@tokada1 just added this in commit dusty-nv/jetson-containers@0a017a3 ("added persistent /data/models/clip directory") on GitHub 👍
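
For context, the .gitkeep is just a placeholder file so git keeps the otherwise-empty directory; a sketch of the idea (not necessarily the exact commit contents):

    mkdir -p data/models/clip
    touch data/models/clip/.gitkeep      # git does not track empty directories, so the placeholder
    git add data/models/clip/.gitkeep    # keeps data/models/clip present after a fresh clone
                                         # (use git add -f if data/ is gitignored)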

Hi dusty_nv & tokada1,
I created the directory jetson-containers/data/models/clip and, as if by magic, it now works.
Thanks to both of you.
I will say the picture FPS is very slow, not like the tutorial example.
Cheers

@paulrrh I noticed that too, and I think it's related to the less-than-optimal transport of the image frames over WebSockets, rather than an H.264-encoded video stream over WebRTC. The actual model runs in real time, and I plan to further integrate NanoOWL into the AI agents and add better remote visualization for it.
