GPU limitations of FoundationPose on Jetson Orin Nano 8 GB

Hi everyone again,

I'm back with my Jetson Orin Nano (8 GB) on JetPack 6.2; my milestone is to run FoundationPose to estimate an object's pose using an Intel RealSense D435. I installed Isaac ROS successfully, but when I reached the FoundationPose step where the ONNX models are converted to TensorRT engines (the "Run launch file" section of the isaac_ros_foundationpose — isaac_ros_docs documentation), I got the following output:

admin@JetsonProto-desktop:/workspaces/isaac_ros-dev$ /usr/src/tensorrt/bin/trtexec --onnx=${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationpose/refine_model.onnx --saveEngine=${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan --minShapes=input1:1x160x160x6,input2:1x160x160x6 --optShapes=input1:1x160x160x6,input2:1x160x160x6 --maxShapes=input1:42x160x160x6,input2:42x160x160x6
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_model.onnx --saveEngine=/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan --minShapes=input1:1x160x160x6,input2:1x160x160x6 --optShapes=input1:1x160x160x6,input2:1x160x160x6 --maxShapes=input1:42x160x160x6,input2:42x160x160x6
[01/22/2026-18:12:46] [I] === Model Options ===
[01/22/2026-18:12:46] [I] Format: ONNX
[01/22/2026-18:12:46] [I] Model: /workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_model.onnx
[01/22/2026-18:12:46] [I] Output:
[01/22/2026-18:12:46] [I] === Build Options ===
[01/22/2026-18:12:46] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/22/2026-18:12:46] [I] avgTiming: 8
[01/22/2026-18:12:46] [I] Precision: FP32
[01/22/2026-18:12:46] [I] LayerPrecisions:
[01/22/2026-18:12:46] [I] Layer Device Types:
[01/22/2026-18:12:46] [I] Calibration:
[01/22/2026-18:12:46] [I] Refit: Disabled
[01/22/2026-18:12:46] [I] Strip weights: Disabled
[01/22/2026-18:12:46] [I] Version Compatible: Disabled
[01/22/2026-18:12:46] [I] ONNX Plugin InstanceNorm: Disabled
[01/22/2026-18:12:46] [I] TensorRT runtime: full
[01/22/2026-18:12:46] [I] Lean DLL Path:
[01/22/2026-18:12:46] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/22/2026-18:12:46] [I] Exclude Lean Runtime: Disabled
[01/22/2026-18:12:46] [I] Sparsity: Disabled
[01/22/2026-18:12:46] [I] Safe mode: Disabled
[01/22/2026-18:12:46] [I] Build DLA standalone loadable: Disabled
[01/22/2026-18:12:46] [I] Allow GPU fallback for DLA: Disabled
[01/22/2026-18:12:46] [I] DirectIO mode: Disabled
[01/22/2026-18:12:46] [I] Restricted mode: Disabled
[01/22/2026-18:12:46] [I] Skip inference: Disabled
[01/22/2026-18:12:46] [I] Save engine: /workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan
[01/22/2026-18:12:46] [I] Load engine:
[01/22/2026-18:12:46] [I] Profiling verbosity: 0
[01/22/2026-18:12:46] [I] Tactic sources: Using default tactic sources
[01/22/2026-18:12:46] [I] timingCacheMode: local
[01/22/2026-18:12:46] [I] timingCacheFile:
[01/22/2026-18:12:46] [I] Enable Compilation Cache: Enabled
[01/22/2026-18:12:46] [I] errorOnTimingCacheMiss: Disabled
[01/22/2026-18:12:46] [I] Preview Features: Use default preview flags.
[01/22/2026-18:12:46] [I] MaxAuxStreams: -1
[01/22/2026-18:12:46] [I] BuilderOptimizationLevel: -1
[01/22/2026-18:12:46] [I] Calibration Profile Index: 0
[01/22/2026-18:12:46] [I] Weight Streaming: Disabled
[01/22/2026-18:12:46] [I] Runtime Platform: Same As Build
[01/22/2026-18:12:46] [I] Debug Tensors:
[01/22/2026-18:12:46] [I] Input(s)s format: fp32:CHW
[01/22/2026-18:12:46] [I] Output(s)s format: fp32:CHW
[01/22/2026-18:12:46] [I] Input build shape (profile 0): input1=1x160x160x6+1x160x160x6+42x160x160x6
[01/22/2026-18:12:46] [I] Input build shape (profile 0): input2=1x160x160x6+1x160x160x6+42x160x160x6
[01/22/2026-18:12:46] [I] Input calibration shapes: model
[01/22/2026-18:12:46] [I] === System Options ===
[01/22/2026-18:12:46] [I] Device: 0
[01/22/2026-18:12:46] [I] DLACore:
[01/22/2026-18:12:46] [I] Plugins:
[01/22/2026-18:12:46] [I] setPluginsToSerialize:
[01/22/2026-18:12:46] [I] dynamicPlugins:
[01/22/2026-18:12:46] [I] ignoreParsedPluginLibs: 0
[01/22/2026-18:12:46] [I]
[01/22/2026-18:12:46] [I] === Inference Options ===
[01/22/2026-18:12:46] [I] Batch: Explicit
[01/22/2026-18:12:46] [I] Input inference shape : input2=1x160x160x6
[01/22/2026-18:12:46] [I] Input inference shape : input1=1x160x160x6
[01/22/2026-18:12:46] [I] Iterations: 10
[01/22/2026-18:12:46] [I] Duration: 3s (+ 200ms warm up)
[01/22/2026-18:12:46] [I] Sleep time: 0ms
[01/22/2026-18:12:46] [I] Idle time: 0ms
[01/22/2026-18:12:46] [I] Inference Streams: 1
[01/22/2026-18:12:46] [I] ExposeDMA: Disabled
[01/22/2026-18:12:46] [I] Data transfers: Enabled
[01/22/2026-18:12:46] [I] Spin-wait: Disabled
[01/22/2026-18:12:46] [I] Multithreading: Disabled
[01/22/2026-18:12:46] [I] CUDA Graph: Disabled
[01/22/2026-18:12:46] [I] Separate profiling: Disabled
[01/22/2026-18:12:46] [I] Time Deserialize: Disabled
[01/22/2026-18:12:46] [I] Time Refit: Disabled
[01/22/2026-18:12:46] [I] NVTX verbosity: 0
[01/22/2026-18:12:46] [I] Persistent Cache Ratio: 0
[01/22/2026-18:12:46] [I] Optimization Profile Index: 0
[01/22/2026-18:12:46] [I] Weight Streaming Budget: 100.000000%
[01/22/2026-18:12:46] [I] Inputs:
[01/22/2026-18:12:46] [I] Debug Tensor Save Destinations:
[01/22/2026-18:12:46] [I] === Reporting Options ===
[01/22/2026-18:12:46] [I] Verbose: Disabled
[01/22/2026-18:12:46] [I] Averages: 10 inferences
[01/22/2026-18:12:46] [I] Percentiles: 90,95,99
[01/22/2026-18:12:46] [I] Dump refittable layers:Disabled
[01/22/2026-18:12:46] [I] Dump output: Disabled
[01/22/2026-18:12:46] [I] Profile: Disabled
[01/22/2026-18:12:46] [I] Export timing to JSON file:
[01/22/2026-18:12:46] [I] Export output to JSON file:
[01/22/2026-18:12:46] [I] Export profile to JSON file:
[01/22/2026-18:12:46] [I]
[01/22/2026-18:12:46] [I] === Device Information ===
[01/22/2026-18:12:46] [I] Available Devices:
[01/22/2026-18:12:46] [I] Device 0: “Orin” UUID: GPU-dc129584-94ee-5433-bc26-cc14322e046f
[01/22/2026-18:12:46] [I] Selected Device: Orin
[01/22/2026-18:12:46] [I] Selected Device ID: 0
[01/22/2026-18:12:46] [I] Selected Device UUID: GPU-dc129584-94ee-5433-bc26-cc14322e046f
[01/22/2026-18:12:46] [I] Compute Capability: 8.7
[01/22/2026-18:12:46] [I] SMs: 8
[01/22/2026-18:12:46] [I] Device Global Memory: 7619 MiB
[01/22/2026-18:12:46] [I] Shared Memory per SM: 164 KiB
[01/22/2026-18:12:46] [I] Memory Bus Width: 128 bits (ECC disabled)
[01/22/2026-18:12:46] [I] Application Compute Clock Rate: 1.02 GHz
[01/22/2026-18:12:46] [I] Application Memory Clock Rate: 1.02 GHz
[01/22/2026-18:12:46] [I]
[01/22/2026-18:12:46] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/22/2026-18:12:46] [I]
[01/22/2026-18:12:46] [I] TensorRT version: 10.3.0
[01/22/2026-18:12:46] [I] Loading standard plugins
[01/22/2026-18:12:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 31, GPU 2752 (MiB)
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
[01/22/2026-18:12:55] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +928, GPU +1056, now: CPU 1002, GPU 3843 (MiB)
[01/22/2026-18:12:55] [I] Start parsing network model.
[01/22/2026-18:12:56] [I] [TRT] ----------------------------------------------------------------
[01/22/2026-18:12:56] [I] [TRT] Input filename: /workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_model.onnx
[01/22/2026-18:12:56] [I] [TRT] ONNX IR version: 0.0.8
[01/22/2026-18:12:56] [I] [TRT] Opset version: 17
[01/22/2026-18:12:56] [I] [TRT] Producer name: pytorch
[01/22/2026-18:12:56] [I] [TRT] Producer version: 2.2.0
[01/22/2026-18:12:56] [I] [TRT] Domain:
[01/22/2026-18:12:56] [I] [TRT] Model version: 0
[01/22/2026-18:12:56] [I] [TRT] Doc string:
[01/22/2026-18:12:56] [I] [TRT] ----------------------------------------------------------------
[01/22/2026-18:12:56] [I] Finished parsing network model. Parse time: 1.02339
[01/22/2026-18:12:56] [I] Set shape of input tensor input1 for optimization profile 0 to: MIN=1x160x160x6 OPT=1x160x160x6 MAX=42x160x160x6
[01/22/2026-18:12:56] [I] Set shape of input tensor input2 for optimization profile 0 to: MIN=1x160x160x6 OPT=1x160x160x6 MAX=42x160x160x6
[01/22/2026-18:12:56] [W] [TRT] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[01/22/2026-18:12:56] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
[01/22/2026-18:12:56] [E] Error[1]: [resizingAllocator.cpp::allocate::75] Error Code 1: Cuda Runtime (out of memory)
[01/22/2026-18:12:56] [W] [TRT] Requested amount of GPU memory (77414400 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[01/22/2026-18:12:56] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 77414400 detected for tactic 0x0000000000000000.
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
[01/22/2026-18:12:57] [E] Error[1]: [resizingAllocator.cpp::allocate::75] Error Code 1: Cuda Runtime (out of memory)
[01/22/2026-18:12:57] [W] [TRT] Requested amount of GPU memory (301056000 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[01/22/2026-18:12:57] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 301056000 detected for tactic 0x0000000000000000.
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
[01/22/2026-18:12:57] [E] Error[1]: [resizingAllocator.cpp::allocate::75] Error Code 1: Cuda Runtime (out of memory)
[01/22/2026-18:12:57] [W] [TRT] Requested amount of GPU memory (301056000 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[01/22/2026-18:12:57] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 1 due to insufficient memory on requested size of 301056000 detected for tactic 0x0000000000000001.
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
[01/22/2026-18:14:01] [W] [TRT] Tactic Device request: 1000MB Available: 470MB. Device memory is insufficient to use tactic.
[01/22/2026-18:14:01] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 1049396224 detected for tactic 0x0000000000000000.
[01/22/2026-18:14:01] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/rot_head/rot_head.0/self_attn/Slice_2_output_0[Constant]…/ReduceMean]}.)
[01/22/2026-18:14:01] [E] Engine could not be created from network
[01/22/2026-18:14:01] [E] Building engine failed
[01/22/2026-18:14:01] [E] Failed to create engine from model or file.
[01/22/2026-18:14:01] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_model.onnx --saveEngine=/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan --minShapes=input1:1x160x160x6,input2:1x160x160x6 --optShapes=input1:1x160x160x6,input2:1x160x160x6 --maxShapes=input1:42x160x160x6,input2:42x160x160x6
admin@JetsonProto-desktop:/workspaces/isaac_ros-dev$ /usr/src/tensorrt/bin/trtexec --onnx=${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationpose/refine_model.onnx --saveEngine=${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan --minShapes=input1:1x160x160x6,input2:1x160x160x6 --optShapes=input1:1x160x160x6,input2:1x160x160x6 --maxShapes=input1:42x160x160x6,input2:42x160x160x6
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_model.onnx --saveEngine=/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan --minShapes=input1:1x160x160x6,input2:1x160x160x6 --optShapes=input1:1x160x160x6,input2:1x160x160x6 --maxShapes=input1:42x160x160x6,input2:42x160x160x6
[01/22/2026-18:17:21] [I] === Model Options ===
[01/22/2026-18:17:21] [I] Format: ONNX
[01/22/2026-18:17:21] [I] Model: /workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_model.onnx
[01/22/2026-18:17:21] [I] Output:
[01/22/2026-18:17:21] [I] === Build Options ===
[01/22/2026-18:17:21] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/22/2026-18:17:21] [I] avgTiming: 8
[01/22/2026-18:17:21] [I] Precision: FP32
[01/22/2026-18:17:21] [I] LayerPrecisions:
[01/22/2026-18:17:21] [I] Layer Device Types:
[01/22/2026-18:17:21] [I] Calibration:
[01/22/2026-18:17:21] [I] Refit: Disabled
[01/22/2026-18:17:21] [I] Strip weights: Disabled
[01/22/2026-18:17:21] [I] Version Compatible: Disabled
[01/22/2026-18:17:21] [I] ONNX Plugin InstanceNorm: Disabled
[01/22/2026-18:17:21] [I] TensorRT runtime: full
[01/22/2026-18:17:21] [I] Lean DLL Path:
[01/22/2026-18:17:21] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/22/2026-18:17:21] [I] Exclude Lean Runtime: Disabled
[01/22/2026-18:17:21] [I] Sparsity: Disabled
[01/22/2026-18:17:21] [I] Safe mode: Disabled
[01/22/2026-18:17:21] [I] Build DLA standalone loadable: Disabled
[01/22/2026-18:17:21] [I] Allow GPU fallback for DLA: Disabled
[01/22/2026-18:17:21] [I] DirectIO mode: Disabled
[01/22/2026-18:17:21] [I] Restricted mode: Disabled
[01/22/2026-18:17:21] [I] Skip inference: Disabled
[01/22/2026-18:17:21] [I] Save engine: /workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan
[01/22/2026-18:17:21] [I] Load engine:
[01/22/2026-18:17:21] [I] Profiling verbosity: 0
[01/22/2026-18:17:21] [I] Tactic sources: Using default tactic sources
[01/22/2026-18:17:21] [I] timingCacheMode: local
[01/22/2026-18:17:21] [I] timingCacheFile:
[01/22/2026-18:17:21] [I] Enable Compilation Cache: Enabled
[01/22/2026-18:17:21] [I] errorOnTimingCacheMiss: Disabled
[01/22/2026-18:17:21] [I] Preview Features: Use default preview flags.
[01/22/2026-18:17:21] [I] MaxAuxStreams: -1
[01/22/2026-18:17:21] [I] BuilderOptimizationLevel: -1
[01/22/2026-18:17:21] [I] Calibration Profile Index: 0
[01/22/2026-18:17:21] [I] Weight Streaming: Disabled
[01/22/2026-18:17:21] [I] Runtime Platform: Same As Build
[01/22/2026-18:17:21] [I] Debug Tensors:
[01/22/2026-18:17:21] [I] Input(s)s format: fp32:CHW
[01/22/2026-18:17:21] [I] Output(s)s format: fp32:CHW
[01/22/2026-18:17:21] [I] Input build shape (profile 0): input1=1x160x160x6+1x160x160x6+42x160x160x6
[01/22/2026-18:17:21] [I] Input build shape (profile 0): input2=1x160x160x6+1x160x160x6+42x160x160x6
[01/22/2026-18:17:21] [I] Input calibration shapes: model
[01/22/2026-18:17:21] [I] === System Options ===
[01/22/2026-18:17:21] [I] Device: 0
[01/22/2026-18:17:21] [I] DLACore:
[01/22/2026-18:17:21] [I] Plugins:
[01/22/2026-18:17:21] [I] setPluginsToSerialize:
[01/22/2026-18:17:21] [I] dynamicPlugins:
[01/22/2026-18:17:21] [I] ignoreParsedPluginLibs: 0
[01/22/2026-18:17:21] [I]
[01/22/2026-18:17:21] [I] === Inference Options ===
[01/22/2026-18:17:21] [I] Batch: Explicit
[01/22/2026-18:17:21] [I] Input inference shape : input2=1x160x160x6
[01/22/2026-18:17:21] [I] Input inference shape : input1=1x160x160x6
[01/22/2026-18:17:21] [I] Iterations: 10
[01/22/2026-18:17:21] [I] Duration: 3s (+ 200ms warm up)
[01/22/2026-18:17:21] [I] Sleep time: 0ms
[01/22/2026-18:17:21] [I] Idle time: 0ms
[01/22/2026-18:17:21] [I] Inference Streams: 1
[01/22/2026-18:17:21] [I] ExposeDMA: Disabled
[01/22/2026-18:17:21] [I] Data transfers: Enabled
[01/22/2026-18:17:21] [I] Spin-wait: Disabled
[01/22/2026-18:17:21] [I] Multithreading: Disabled
[01/22/2026-18:17:21] [I] CUDA Graph: Disabled
[01/22/2026-18:17:21] [I] Separate profiling: Disabled
[01/22/2026-18:17:21] [I] Time Deserialize: Disabled
[01/22/2026-18:17:21] [I] Time Refit: Disabled
[01/22/2026-18:17:21] [I] NVTX verbosity: 0
[01/22/2026-18:17:21] [I] Persistent Cache Ratio: 0
[01/22/2026-18:17:21] [I] Optimization Profile Index: 0
[01/22/2026-18:17:21] [I] Weight Streaming Budget: 100.000000%
[01/22/2026-18:17:21] [I] Inputs:
[01/22/2026-18:17:21] [I] Debug Tensor Save Destinations:
[01/22/2026-18:17:21] [I] === Reporting Options ===
[01/22/2026-18:17:21] [I] Verbose: Disabled
[01/22/2026-18:17:21] [I] Averages: 10 inferences
[01/22/2026-18:17:21] [I] Percentiles: 90,95,99
[01/22/2026-18:17:21] [I] Dump refittable layers:Disabled
[01/22/2026-18:17:21] [I] Dump output: Disabled
[01/22/2026-18:17:21] [I] Profile: Disabled
[01/22/2026-18:17:21] [I] Export timing to JSON file:
[01/22/2026-18:17:21] [I] Export output to JSON file:
[01/22/2026-18:17:21] [I] Export profile to JSON file:
[01/22/2026-18:17:21] [I]
[01/22/2026-18:17:21] [I] === Device Information ===
[01/22/2026-18:17:21] [I] Available Devices:
[01/22/2026-18:17:21] [I] Device 0: “Orin” UUID: GPU-dc129584-94ee-5433-bc26-cc14322e046f
[01/22/2026-18:17:22] [I] Selected Device: Orin
[01/22/2026-18:17:22] [I] Selected Device ID: 0
[01/22/2026-18:17:22] [I] Selected Device UUID: GPU-dc129584-94ee-5433-bc26-cc14322e046f
[01/22/2026-18:17:22] [I] Compute Capability: 8.7
[01/22/2026-18:17:22] [I] SMs: 8
[01/22/2026-18:17:22] [I] Device Global Memory: 7619 MiB
[01/22/2026-18:17:22] [I] Shared Memory per SM: 164 KiB
[01/22/2026-18:17:22] [I] Memory Bus Width: 128 bits (ECC disabled)
[01/22/2026-18:17:22] [I] Application Compute Clock Rate: 1.02 GHz
[01/22/2026-18:17:22] [I] Application Memory Clock Rate: 1.02 GHz
[01/22/2026-18:17:22] [I]
[01/22/2026-18:17:22] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/22/2026-18:17:22] [I]
[01/22/2026-18:17:22] [I] TensorRT version: 10.3.0
[01/22/2026-18:17:22] [I] Loading standard plugins
[01/22/2026-18:17:22] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 31, GPU 2823 (MiB)
[01/22/2026-18:17:24] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +928, GPU +750, now: CPU 1002, GPU 3618 (MiB)
[01/22/2026-18:17:24] [I] Start parsing network model.
[01/22/2026-18:17:24] [I] [TRT] ----------------------------------------------------------------
[01/22/2026-18:17:24] [I] [TRT] Input filename: /workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_model.onnx
[01/22/2026-18:17:24] [I] [TRT] ONNX IR version: 0.0.8
[01/22/2026-18:17:24] [I] [TRT] Opset version: 17
[01/22/2026-18:17:24] [I] [TRT] Producer name: pytorch
[01/22/2026-18:17:24] [I] [TRT] Producer version: 2.2.0
[01/22/2026-18:17:24] [I] [TRT] Domain:
[01/22/2026-18:17:24] [I] [TRT] Model version: 0
[01/22/2026-18:17:24] [I] [TRT] Doc string:
[01/22/2026-18:17:24] [I] [TRT] ----------------------------------------------------------------
[01/22/2026-18:17:24] [I] Finished parsing network model. Parse time: 0.115656
[01/22/2026-18:17:24] [I] Set shape of input tensor input1 for optimization profile 0 to: MIN=1x160x160x6 OPT=1x160x160x6 MAX=42x160x160x6
[01/22/2026-18:17:24] [I] Set shape of input tensor input2 for optimization profile 0 to: MIN=1x160x160x6 OPT=1x160x160x6 MAX=42x160x160x6
[01/22/2026-18:17:24] [W] [TRT] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[01/22/2026-18:17:24] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/22/2026-18:18:25] [W] [TRT] Tactic Device request: 1000MB Available: 837MB. Device memory is insufficient to use tactic.
[01/22/2026-18:18:25] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 1049396224 detected for tactic 0x0000000000000000.
[01/22/2026-18:18:25] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/rot_head/rot_head.0/self_attn/Slice_2_output_0[Constant]…/ReduceMean]}.)
[01/22/2026-18:18:25] [E] Engine could not be created from network
[01/22/2026-18:18:25] [E] Building engine failed
[01/22/2026-18:18:25] [E] Failed to create engine from model or file.
[01/22/2026-18:18:25] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_model.onnx --saveEngine=/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan --minShapes=input1:1x160x160x6,input2:1x160x160x6 --optShapes=input1:1x160x160x6,input2:1x160x160x6 --maxShapes=input1:42x160x160x6,input2:42x160x160x6

From a non-expert point of view like mine, it seems 8 GB of GPU memory is simply not enough. So my questions are:

  1. Based on Isaac ROS release 3.2 (pre-Thor era), what is a suitable GPU for these models?
  2. Nevertheless, if I still want to run FoundationPose on my Orin Nano 8 GB, which parameters can I tune to make it work, even at reduced quality?

Thanks!

Hello @cm.napole,

Thanks for posting in the Isaac ROS forum!

For a Jetson Orin Nano 8 GB, consider using CenterPose (isaac_ros_centerpose) or DOPE (isaac_ros_dope) for object pose estimation if runtime performance is critical and you have offline training resources. Both are supported in Isaac ROS 3.2 and are considerably lighter than FoundationPose, which needs roughly 7.5 GB of GPU memory on its own. As noted in the documentation, FoundationPose is best reserved for more capable Jetsons (e.g. AGX Orin) or discrete GPUs, due to its higher compute requirements for initial detection.
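If you do want to retry the conversion on the 8 GB board, it often helps to free as much of the shared CPU/GPU memory as possible first. A commonly suggested sequence on Jetson is sketched below; these are standard Linux commands, but the swap size is an arbitrary example, not an official recommendation, and swap mainly relieves host-side pressure rather than the ~1 GB device-memory tactic requests seen in your log:

```shell
# Stop the desktop session so its GPU allocations are released
sudo systemctl isolate multi-user.target

# Add temporary swap so the engine builder's host allocations can spill
# (8G is an arbitrary example size)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Confirm available memory before launching trtexec again
free -h
```

Even with these steps, some builder tactics may still be skipped for lack of device memory; the build only fails outright when no tactic fits, as happened in your log.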

I've been using a Jetson Orin Nano Super and ran into the same issue with FoundationPose when converting the score model. Is there a way to run FoundationPose with a smaller batch size when converting the score model? More generally, how can FoundationPose be made to work on the Jetson Orin Nano Super, since that is my goal? Also, I've had success converting the model with the device running headless, outside the Docker container; will that engine work? I know the tutorial says to convert the model inside the container.

Hello @jackdugan02,

Welcome to the Isaac ROS forum and thanks for the post!

Our official examples use maxShapes up to 252 for the score network. Reducing this may help with memory on Orin Nano, but you might still hit runtime shape errors if FoundationPose requests a larger batch.
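As a sketch, the documented score-model conversion can be rerun with a reduced maximum batch. Note the batch value 84 below is an arbitrary reduction from the documented 252, the score model filename follows the refine model's naming convention, and whether the FoundationPose node tolerates the smaller engine at runtime needs testing:

```shell
# Build the score engine with a smaller max batch to reduce builder memory pressure.
# maxShapes batch (84 here) is an arbitrary example; the docs use 252.
/usr/src/tensorrt/bin/trtexec \
  --onnx=${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationpose/score_model.onnx \
  --saveEngine=${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationpose/score_trt_engine.plan \
  --minShapes=input1:1x160x160x6,input2:1x160x160x6 \
  --optShapes=input1:1x160x160x6,input2:1x160x160x6 \
  --maxShapes=input1:84x160x160x6,input2:84x160x160x6
```

If the node later requests a batch larger than the engine's max shape, inference will fail with a shape error, so the FoundationPose parameters that control how many pose hypotheses are scored would also need to be reduced to match.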

Conceptually, conversion is just an offline TensorRT engine build step. As long as you build the engine on the same Jetson (same GPU architecture) and the TensorRT/CUDA versions match what the Isaac ROS container is using, converting outside the Docker container should be fine.
You can convert it outside the container and then pass that .plan file into the container via score_engine_file_path / refine_engine_file_path.
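For example, a hedged sketch of pointing the launch file at externally built engines; the parameter names are the ones mentioned above, while the launch file name follows the Isaac ROS docs and may differ between releases:

```shell
# Inside the container, pass the pre-built .plan files to FoundationPose
ros2 launch isaac_ros_foundationpose isaac_ros_foundationpose.launch.py \
  refine_engine_file_path:=${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationpose/refine_trt_engine.plan \
  score_engine_file_path:=${ISAAC_ROS_WS}/isaac_ros_assets/models/foundationpose/score_trt_engine.plan
```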