AGX Orin 64GB: torch.compile fails with "cannot pickle '_thread.RLock'" on GR00T N1.6

Hi all,

I'm getting started with GR00T N1.6 on a Jetson AGX Orin 64GB. I tried running benchmark_inference in PyTorch (eager) mode with the command below:

python scripts/deployment/benchmark_inference.py \
  --model_path weights/GR00T-N1.6-3B \
  --dataset_path demo_data/gr1.PickNPlace \
  --embodiment_tag gr1 \
  --num_iterations 100 \
  --warmup 10 \
  --use_trajectory \
  --skip_compile

Everything works fine. But when I try torch.compile mode by removing the --skip_compile option from the command above, I get an error from this block of code:

# PyTorch mode with torch.compile
policy.model.action_head.model.forward = torch.compile(
    policy.model.action_head.model.forward, mode="max-autotune"
)
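For context, mode="max-autotune" asks Inductor to benchmark Triton kernel variants, and that autotuning is farmed out to a pool of compile workers. As an isolation step it might be worth trying a stock mode that does less autotuning; this is just a sketch I haven't run yet:

# Isolation step (untried): "reduce-overhead" is a stock torch.compile
# mode that skips max-autotune's exhaustive Triton autotuning.
policy.model.action_head.model.forward = torch.compile(
    policy.model.action_head.model.forward, mode="reduce-overhead"
)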

The error log from the max-autotune run looks like this:

Traceback (most recent call last):
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/scripts/deployment/benchmark_inference.py", line 578, in <module>
    main()
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/scripts/deployment/benchmark_inference.py", line 484, in main
    times_components = benchmark_components(
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/scripts/deployment/benchmark_inference.py", line 214, in benchmark_components
    _ = policy.model.action_head.get_action(backbone_outputs, action_inputs)
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/modelopt/project/gr00ttn1.6/gr00t/model/gr00t_n1d6/gr00t_n1d6.py", line 384, in get_action
    return self.get_action_with_features(
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/modelopt/project/gr00ttn1.6/gr00t/model/gr00t_n1d6/gr00t_n1d6.py", line 339, in get_action_with_features
    model_output = self.model(
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 816, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 952, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 936, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1616, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1479, in codegen_and_compile
    compiled_module = graph.compile_to_module()
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2310, in compile_to_module
    return self._compile_to_module()
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2320, in _compile_to_module
    mod = self._compile_to_module_lines(wrapper_code)
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2388, in _compile_to_module_lines
    mod = PyCodeCache.load_by_key_path(
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3360, in load_by_key_path
    mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 31, in _reload_python_module
    exec(code, mod.__dict__, mod.__dict__)
  File "/workspace/tmp/torchinductor_modelopt/oe/coevhwlsmqou4psjptsf34qsaaw7nake56l6iu6qjvtt2bxflqgo.py", line 1254, in <module>
    async_compile.wait(globals())
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 573, in wait
    self._wait_futures(scope)
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 593, in _wait_futures
    kernel = result.result()
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4095, in result
    return self.result_fn()
  File "/mnt/modelopt/project/gr00ttn1.6_tcp/.venv/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 452, in get_result
    kernel, elapsed_us = task.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
torch._inductor.exc.InductorError: TypeError: cannot pickle '_thread.RLock' object

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
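As noted above, max-autotune routes kernel builds through Inductor's async_compile worker pool, and the traceback shows the crash happens while pickling one of those compile tasks out to a worker. I haven't verified that it avoids the error, but a sketch of forcing in-process compilation (compile_threads is a stock torch._inductor.config option; TORCHINDUCTOR_COMPILE_THREADS=1 is the env-var equivalent) would be:

import torch
import torch._inductor.config as inductor_config

# Compile kernels in the main process instead of pickling compile
# tasks out to a subprocess pool; the pickle step is what raises
# "cannot pickle '_thread.RLock' object".
inductor_config.compile_threads = 1

policy.model.action_head.model.forward = torch.compile(
    policy.model.action_head.model.forward, mode="max-autotune"
)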

I tried updating the block above to:

# PyTorch mode with torch.compile
policy.model = torch.compile(
    policy.model, mode="max-autotune"
)

This version runs, but the results are identical to PyTorch eager mode, with no performance improvement at all. My target is the NVIDIA-reported numbers below, and my eager numbers already match the report. (Reading the rows, the first three timings appear to be per-stage latencies, the fourth the end-to-end latency, and the last column the resulting control frequency, i.e. 1000 ms / 300 ms ≈ 3.3 Hz.)

Orin   PyTorch Eager   6 ms   93 ms   202 ms   300 ms   3.3 Hz
Orin   torch.compile   6 ms   93 ms   101 ms   199 ms   5.0 Hz

In practice I get exactly the eager numbers, so I think this change is the wrong approach. Has anyone run into the same problem?
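My guess is that compiling the whole policy.model hits graph breaks (dict-shaped inputs, data-dependent control flow), so Dynamo silently falls back to eager for most of the model. That would explain both the unchanged timings and why this variant doesn't crash: the autotuned kernels that failed to pickle may simply never get built. A quick way to check, using stock PyTorch logging (a sketch, not something from the GR00T scripts):

import torch
import torch._logging

# Surface graph breaks and recompiles so it's visible whether Dynamo
# actually captured the model or quietly fell back to eager execution.
torch._logging.set_logs(graph_breaks=True, recompiles=True)

policy.model = torch.compile(policy.model, mode="max-autotune")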

These are my dependencies:

absl-py==2.4.0
accelerate==1.2.1
aiosignal==1.4.0
albucore==0.0.17
albumentations==1.4.18
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
asttokens==3.0.1
astunparse==1.6.3
attrs==25.4.0
av==12.3.0
blessings==1.7
certifi==2026.2.25
charset-normalizer==3.4.4
click==8.3.1
cloudpickle==3.1.2
comm==0.2.3
contourpy==1.3.2
cramjam==2.11.0
cycler==0.12.1
debugpy==1.8.20
decorator==5.2.1
decord==0.6.0
diffusers==0.35.0
distro==1.9.0
dm-tree==0.1.8
docker-pycreds==0.4.0
docstring-parser==0.17.0
einops==0.8.2
eval-type-backport==0.3.1
exceptiongroup==1.3.1
executing==2.2.1
farama-notifications==0.0.4
fastparquet==2024.11.0
filelock==3.25.0
flash-attn==2.8.2
flatbuffers==25.12.19
fonttools==4.61.1
frozenlist==1.8.0
fsspec==2026.2.0
gast==0.7.0
gitdb==4.0.12
gitpython==3.1.46
google-pasta==0.2.0
-e file:///mnt/modelopt/project/gr00ttn1.6
grpcio==1.78.0
gymnasium==1.0.0
h5py==3.12.1
hf-xet==1.3.2
huggingface-hub==0.36.2
hydra-core==1.3.2
idna==3.11
imageio==2.34.2
importlib-metadata==8.7.1
iniconfig==2.3.0
iopath==0.1.9
ipykernel==7.2.0
ipython==8.38.0
jedi==0.19.2
jetson-stats==4.3.2
jinja2==3.1.6
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
jupyter-client==8.8.0
jupyter-core==5.9.1
keras==3.12.1
kiwisolver==1.4.9
kornia==0.7.4
kornia-rs==0.1.10
lazy-loader==0.4
libclang==18.1.1
llvmlite==0.46.0
lmdb==1.8.1
mako==1.3.10
markdown==3.10.2
markdown-it-py==4.0.0
markupsafe==3.0.3
matplotlib==3.10.0
matplotlib-inline==0.2.1
mdurl==0.1.2
ml-dtypes==0.4.1
mpmath==1.3.0
msgpack==1.1.2
namex==0.1.0
nest-asyncio==1.6.0
networkx==3.4.2
ninja==1.13.0
numba==0.64.0
numpy==1.26.4
numpydantic==1.6.7
-e file:///home/modelopt/workspace/project/Model-Optimizer
nvtx==0.2.14
omegaconf==2.3.0
onnx==1.18.0
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
opt-einsum==3.4.0
optree==0.19.0
packaging==26.0
pandas==2.2.3
parso==0.8.6
peft==0.17.0
pettingzoo==1.25.0
pexpect==4.9.0
pillow==12.1.1
pip==26.0.1
platformdirs==4.9.2
pluggy==1.6.0
portalocker==3.2.0
prompt-toolkit==3.0.52
protobuf==4.25.1
psutil==7.2.2
ptyprocess==0.7.0
pulp==3.3.0
pure-eval==0.2.3
pyarrow==14.0.1
pydantic==2.10.6
pydantic-core==2.27.2
pygments==2.19.2
pyparsing==3.3.2
pytest==9.0.2
python-dateutil==2.9.0.post0
pytools==2025.2.5
pytorch3d @ git+https://github.com/facebookresearch/pytorch3d.git@33824be3cbc87a7dd1db0f6a9a9de9ac81b2d0ba
pytz==2026.1.post1
pyyaml==6.0.2
pyzmq==27.1.0
ray==2.40.0
referencing==0.37.0
regex==2026.2.28
requests==2.32.3
rich==14.3.3
rpds-py==0.30.0
safetensors==0.7.0
scikit-image==0.25.2
scipy==1.15.3
sentry-sdk==2.54.0
setproctitle==1.3.7
setuptools==82.0.0
shtab==1.8.0
siphash24==1.8
six==1.17.0
smbus2==0.6.0
smmap==5.0.2
stack-data==0.6.3
sympy==1.14.0
tensorboard==2.18.0
tensorboard-data-server==0.7.2
tensorflow==2.18.0
tensorflow-io-gcs-filesystem==0.37.1
termcolor==3.3.0
tianshou==0.5.1
tifffile==2025.5.10
timm==1.0.14
tokenizers==0.21.4
tomli==2.4.0
torch==2.8.0
torchvision==0.23.0
tornado==6.5.4
tqdm==4.67.1
traitlets==5.14.3
transformers==4.51.0
triton==3.6.0
typeguard==4.4.2
typing-extensions==4.15.0
tyro==0.9.17
tzdata==2025.3
urllib3==2.6.3
wandb==0.18.0
wcwidth==0.6.0
werkzeug==3.1.6
wheel==0.46.3
wrapt==2.1.1
zipp==3.23.0

Thank you!

Are you running scripts/activate_orin.sh after each Orin login or SSH session, and before running standalone_inference_script.py?

https://github.com/NVIDIA/Isaac-GR00T/blob/main/scripts/activate_orin.sh
# among other things, this script sets these variables:
    export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
    export CUDA_HOME=/usr/local/cuda
    export CUDA_PATH=/usr/local/cuda
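
A quick sanity check that those variables are actually visible to the Python process that runs torch.compile (just a sketch):

import os
import shutil

# Verify the Triton/CUDA variables exported by activate_orin.sh made it
# into this process, and that ptxas is resolvable; Triton needs ptxas
# to assemble kernels for Orin (sm_87).
for var in ("TRITON_PTXAS_PATH", "CUDA_HOME", "CUDA_PATH"):
    print(f"{var} = {os.environ.get(var)}")
print("ptxas on PATH:", shutil.which("ptxas"))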

This is a native Jetson Orin setup, not Docker:

bash scripts/deployment/orin/install_deps.sh
source .venv/bin/activate
source scripts/activate_orin.sh


I just built the Docker image with

./build.sh --profile=orin

started the container with

docker run -it --rm --net=host --gpus all --runtime nvidia \
    --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /home/scott/.cache:/root/.cache \
    -e HF_TOKEN \
    gr00t-thor:latest

and the following ran to completion without error:

python scripts/deployment/standalone_inference_script.py \
    --model-path nvidia/GR00T-N1.6-3B \
    --dataset-path demo_data/gr1.PickNPlace \
    --embodiment-tag GR1 \
    --traj-ids 0 \
    --inference-mode pytorch \
    --denoising-steps 4

Hi,

Did you set up the environment with the link below?

Thanks.

Thank you so much. I tried using Docker and it worked with no errors!