Hi,
I trained a model using Modulus 22.09 to predict the flow field for a 2D airfoil with varying angle of attack and inlet velocity. I am trying to calculate the error between the model’s predictions and the validation data, specifically the error in u, v, and p. To do this, I created a PointwiseMonitor to calculate the desired error values.
However, I get the following CUDA out-of-memory error:
RuntimeError: CUDA out of memory. Tried to allocate 218.00 MiB (GPU 0; 31.74 GiB total capacity; 30.33 GiB already allocated; 183.38 MiB free; 30.61 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
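As a side note on the max_split_size_mb hint in that message, here is a minimal sketch of how the allocator option can be set from Python. The value 128 is only an illustrative assumption, and the variable has to be set before PyTorch initializes CUDA. Since reserved memory (30.61 GiB) is barely above allocated memory (30.33 GiB) here, fragmentation is probably not the main problem, so this setting may well not help.

```python
import os

# Must be set before the first CUDA allocation (safest: before importing torch).
# 128 is an arbitrary illustrative value, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```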
The PointwiseMonitor is implemented as follows:
# openfoam_var is the validation data
temp1 = {
    key: value
    for key, value in openfoam_var.items()
    if key in ["x", "y", "aoa", "vel_in", "u", "v", "p"]
}
error_uvp = PointwiseMonitor(
    invar=temp1,
    output_names=["u__x", "u__y", "v__x", "v__y", "p"],
    metrics={
        "error_u" + str(num_aoa) + "_" + str(vel_in):
            lambda var: torch.mean(torch.sqrt(var["u__x"] ** 2 + var["u__y"] ** 2) - var["u_op"]),
        "error_v" + str(num_aoa) + "_" + str(vel_in):
            lambda var: torch.mean(torch.sqrt(var["v__x"] ** 2 + var["v__y"] ** 2) - var["v_op"]),
        "error_p" + str(num_aoa) + "_" + str(vel_in):
            lambda var: torch.mean(var["p"] - var["p_op"]),
    },
    nodes=flow_nodes,
)
domain.add_monitor(error_uvp)
I suspect this monitor is the cause, but I have not yet worked out why. Is there a better way to calculate the error between the model output and the target values? Any help would be appreciated.
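One possible direction is to evaluate the validation points in chunks with autograd disabled, so that no computation graph is kept for the whole dataset. The sketch below is plain PyTorch, not the Modulus API. The function name, the `predict` callable, and the chunk size are hypothetical, and it assumes every tensor has the point index as its first dimension and lives on the same device as the predictions. It would not work for derivative outputs such as u__x, which need autograd.

```python
import torch


def chunked_mean_errors(predict, invar, targets, chunk_size=4096):
    """Mean signed error per target key, evaluated chunk by chunk.

    predict: hypothetical callable, dict of input tensors -> dict of output tensors
    invar:   dict of input tensors, first dimension is the point index
    targets: dict of reference tensors with the same first dimension
    """
    n_points = next(iter(invar.values())).shape[0]
    sums = {key: 0.0 for key in targets}
    counts = {key: 0 for key in targets}

    # no_grad stops PyTorch from storing activations for backpropagation
    with torch.no_grad():
        for start in range(0, n_points, chunk_size):
            stop = start + chunk_size
            batch = {key: value[start:stop] for key, value in invar.items()}
            pred = predict(batch)
            for key, ref in targets.items():
                diff = pred[key] - ref[start:stop]
                # accumulate as Python floats so no tensors pile up on the GPU
                sums[key] += diff.sum().item()
                counts[key] += diff.numel()

    return {key: sums[key] / counts[key] for key in targets}
```

This returns the mean of the signed difference, the same quantity as the error_p metric in the monitor; wrapping `diff` in `torch.abs` would give a mean absolute error instead, which avoids positive and negative errors cancelling out.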
Thank you.