We are developing an application where boot time is a significant issue: our users need the system up and running quickly in a very time-constrained environment. For us, boot time is measured from plug-in to models loaded and running.
What we have found is that some trivial operations take a lot of time the first time after every boot.
For example (actual values differ):
mean = torch.Tensor([0.5, 0.5, 0.5]).cuda()
takes ~11 seconds (on a Xavier NX), which is about the same as the time it takes to load our model with torch2trt (another ~11 seconds).
I suppose this is due to CUDA doing some one-time initialization at startup. Is there a way to reduce this time? We need our boot time to be below 30 seconds from power-on, so 11 seconds is far too long.
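In case it helps frame the question: the workaround we are considering is to kick off the CUDA warm-up in a background thread the moment the process starts, so its cost overlaps with other startup work (e.g. reading the engine file from disk). A minimal sketch of that overlap pattern, with `time.sleep` standing in for the real warm-up and load calls (all function names here are hypothetical stand-ins, not our actual code):

```python
import threading
import time

def warm_up_gpu():
    # In the real app this would be something like:
    #   import torch
    #   torch.zeros(1).cuda()   # forces CUDA context creation
    # Here a short sleep simulates the one-time init cost.
    time.sleep(0.2)

def load_model_from_disk():
    # Stand-in for reading the serialized model from disk,
    # which does not need the CUDA context yet.
    time.sleep(0.2)

start = time.perf_counter()

# Start GPU warm-up immediately, in parallel with disk I/O.
warmup = threading.Thread(target=warm_up_gpu)
warmup.start()
load_model_from_disk()
warmup.join()

elapsed = time.perf_counter() - start
# With overlap, the total is ~max(init, I/O) rather than their sum.
print(f"overlapped startup took {elapsed:.2f}s")
```

This only hides the init time behind other work; it does not shrink it, which is why we are asking whether the 11 s itself can be reduced.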
Also, is there a way to reduce the model load time? Specifically:
model_trt = torch2trt.TRTModule()
model_trt.load_state_dict(torch.load(optimized_model_path))
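To see which of the two stages dominates (deserializing the file vs. building the runtime module), we time them separately. A sketch of that measurement, using a pickled blob and a plain dict as hypothetical stand-ins for `torch.load(...)` and `TRTModule().load_state_dict(...)`:

```python
import pickle
import time

# Hypothetical stand-in: a pickled blob plays the role of the
# optimized engine file so each stage can be timed in isolation.
blob = pickle.dumps({"weights": list(range(100_000))})

t0 = time.perf_counter()
state = pickle.loads(blob)   # stage 1: deserialize the file contents
t1 = time.perf_counter()
model = dict(state)          # stage 2: construct the runtime module
t2 = time.perf_counter()

print(f"deserialize: {t1 - t0:.3f}s, build: {t2 - t1:.3f}s")
```

Knowing which stage eats most of the ~11 s would tell us whether to look at faster storage/serialization or at the TensorRT engine deserialization itself.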