VLA model weights are occasionally overwritten when calling services on Thor

yiming.li2 · September 17, 2025, 8:40am

We have run into an intermittent problem. When we use a service on Thor to run VLA model inference on the same data, the outputs sometimes differ between runs. After tracing it, we found that the first time a request enters the service node, there is some probability that the loaded model parameters change, as if the memory holding the parameters had been overwritten. We start the service node as follows.

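    # Requires: from fastapi import FastAPI; import uvicorn
    def run(self):
        self.app = FastAPI()
        # Register predict_action as the handler for POST /act
        self.app.post("/act")(self.predict_action)
        # Blocks here and serves requests until the process exits
        uvicorn.run(self.app, host=self.args.host, port=self.args.port)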

We also found that the problem does not occur when FastAPI is not used, and it does not occur when FastAPI is used on Orin. The base environment uses the officially recommended versions, with the Docker image nvcr.io/nvidia/pytorch:25.08-py3.
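To narrow down when the parameters change, one option is to take a per-parameter checksum right after loading and compare it with one taken inside the first request. The following is only a minimal sketch: `fingerprint_params` and `changed_params` are hypothetical helper names, and the input is assumed to be a mapping from parameter name to raw bytes.

```python
import hashlib
from typing import Dict, List, Mapping


def fingerprint_params(params: Mapping[str, bytes]) -> Dict[str, str]:
    """Return a SHA-256 hex digest for each named parameter buffer."""
    return {
        name: hashlib.sha256(buf).hexdigest()
        for name, buf in sorted(params.items())
    }


def changed_params(before: Mapping[str, str], after: Mapping[str, str]) -> List[str]:
    """List parameter names whose digest differs (or is missing) in `after`."""
    return [name for name in before if before[name] != after.get(name)]


if __name__ == "__main__":
    # Stand-in byte buffers; real usage would pass the model's parameter bytes.
    at_load = fingerprint_params({"w": b"\x00\x01", "b": b"\x02"})
    at_first_request = fingerprint_params({"w": b"\x00\xff", "b": b"\x02"})
    print(changed_params(at_load, at_first_request))
```

For a PyTorch model, one possible way to build the input (an assumption, not verified on Thor) is `{k: v.detach().cpu().float().numpy().tobytes() for k, v in model.state_dict().items()}`. Logging which parameter names change, and whether the same ones change each time, may help whoever investigates the root cause.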

Reloading the model weights on the first FastAPI call works around the problem for now. However, we would like to know what could cause the parameters to be overwritten when the service is called through FastAPI. We can provide any further information needed.
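The reload-on-first-call workaround described above could look roughly like the sketch below. It is not the actual service code: `ReloadOnFirstCall`, `reload_weights`, and `predict` are hypothetical names. A lock guards the one-time reload because FastAPI runs plain `def` handlers in a thread pool, so two first requests could otherwise overlap.

```python
import threading
from typing import Any, Callable


class ReloadOnFirstCall:
    """Wrap a predict function so weights are reloaded once, before the first call."""

    def __init__(self, reload_weights: Callable[[], None], predict: Callable[[Any], Any]):
        self._reload_weights = reload_weights  # hypothetical: re-reads the checkpoint
        self._predict = predict                # hypothetical: runs model inference
        self._lock = threading.Lock()
        self._reloaded = False

    def __call__(self, request: Any) -> Any:
        # Only the first caller triggers the reload; later calls skip it.
        with self._lock:
            if not self._reloaded:
                self._reload_weights()
                self._reloaded = True
        return self._predict(request)


if __name__ == "__main__":
    events = []
    handler = ReloadOnFirstCall(lambda: events.append("reload"), lambda x: x * 2)
    handler(1)
    handler(2)
    print(events)  # the reload callback ran once
```

This only hides the symptom; it does not explain why the parameters change in the first place.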

AastaLLL · September 18, 2025, 2:37am

Hi,

Have you tried the same steps on other devices, like an x86 machine?
This sounds like a problem from fastapi.

Without using fastapi, does the same issue occur on the Orin?

Thanks.

yiming.li2 · September 18, 2025, 2:58am

We also suspected FastAPI, but everything works well on Orin, with or without FastAPI.

AastaLLL · September 19, 2025, 5:55am

Hi,

Are you using the same fastapi version on Thor as on Orin?
If not, could you give it a try?

Thanks.

yiming.li2 · September 19, 2025, 6:14am

We switched to the same version and got the same result. We have run many experiments, and the problem only occurs when using FastAPI on Thor.
