We have encountered an intermittent problem. When running VLA model inference through a service on Thor, the same input data sometimes produces different outputs. After tracking it down, we found that on the first request into the service node there is some probability that the loaded model parameters change, as if the memory holding the parameters had been overwritten. We start the service node as follows:
def run(self):
    self.app = FastAPI()
    # Register the prediction handler on the POST /act endpoint
    self.app.post("/act")(self.predict_action)
    # Serve the app in-process with uvicorn
    uvicorn.run(self.app, host=self.args.host, port=self.args.port)
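To confirm that the weights themselves change between startup and the first request, the parameter bytes can be hashed and compared (a minimal sketch; params_fingerprint is an illustrative helper, and the model argument stands for whatever model object the service holds):

import hashlib
import torch

def params_fingerprint(model: torch.nn.Module) -> str:
    # Hash every parameter and buffer byte-for-byte, so any silent
    # overwrite of the loaded weights changes the digest.
    h = hashlib.sha256()
    for name, tensor in sorted(model.state_dict().items()):
        flat = tensor.detach().cpu().contiguous().reshape(-1)
        h.update(name.encode())
        h.update(flat.view(torch.uint8).numpy().tobytes())
    return h.hexdigest()

Recording the digest once right after loading (e.g. self.initial_fp = params_fingerprint(self.model)) and comparing it at the top of predict_action shows exactly on which request the overwrite happens.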
At the same time, we found that the problem does not occur if FastAPI is not used, and it also does not occur when using FastAPI on Orin. The base environment is the officially recommended version; we used the Docker image nvcr.io/nvidia/pytorch:25.08-py3.
Reloading the model weights on the first FastAPI call temporarily works around the problem, but we would like to understand what could cause the parameters to be overwritten when FastAPI is called. We can provide any additional information on request.
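For completeness, the temporary workaround has roughly this shape (a sketch only; VLAService, checkpoint_path, and the request type are illustrative names, not our actual code):

import threading
import torch

class VLAService:
    # Only the workaround-relevant parts of the service are shown.
    def __init__(self, model: torch.nn.Module, checkpoint_path: str):
        self.model = model
        self.checkpoint_path = checkpoint_path
        self._reload_lock = threading.Lock()
        self._reloaded_once = False

    def predict_action(self, request: dict):
        # Workaround: re-load the checkpoint exactly once, on the first
        # request, restoring any weights corrupted between startup and
        # the first call. The lock prevents two concurrent first
        # requests from both reloading.
        with self._reload_lock:
            if not self._reloaded_once:
                state = torch.load(self.checkpoint_path, map_location="cpu")
                self.model.load_state_dict(state)
                self._reloaded_once = True
        ...  # run inference as before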