I am trying to use a set of roughly a dozen NACA airfoil meshes with slightly different parameters in the parameterized tessellation example, with the goal of optimising those parameters. (Example here: https://gitlab.com/nvidia/modulus/examples/-/blob/release_22.09/geometry/parameterized_tesselated_example.py).
I am using an A100 80GB GPU, but as soon as I add the constraints and run the simulation, the GPU usage shoots up and the kernel crashes.
Is there a way to somehow limit the memory requirements and enable training of even larger sets on an 80GB machine?
Thanks!
Hi @benedikt_dietz
Does Modulus crash when it's sampling the points from the STL files, or while training? I would start with lowering the number of points you're using via the batch_per_epoch and batch_size parameters in your constraints.
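Something like this in a constraint definition (just a minimal sketch, not your script: nodes, foil_geo and domain are placeholders from your own setup, and the constraint import path can differ between Modulus releases):

from modulus.domain.constraint import PointwiseBoundaryConstraint  # path may vary by Modulus version

# hypothetical boundary constraint on the foil surface; lowering batch_size and
# batch_per_epoch reduces how many points are sampled and held in memory at once
foil_surface = PointwiseBoundaryConstraint(
    nodes=nodes,                            # graph nodes from your existing setup
    geometry=foil_geo,                      # the Tessellation geometry
    outvar={"u": 0.0, "v": 0.0, "w": 0.0},  # example no-slip outvar
    batch_size=512,                         # points fed to the network per step
    batch_per_epoch=500,                    # number of pre-sampled batches cached in memory
)
domain.add_constraint(foil_surface, "foil_surface")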
For some really complicated geometry we have also resorted to pre-sampling the STL files in a separate script with the geometry module and saving the points to disk, e.g. as numpy arrays. Then, in the actual training loop, load them from the numpy file instead of re-sampling. This is useful for speeding up testing as well.
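A rough sketch of that workflow (the file name, point count and variable keys here are assumptions, and nodes would come from your existing training script):

import numpy as np

from modulus.domain.constraint import PointwiseConstraint
from modulus.geometry.tessellation import Tessellation  # import paths may differ by release

# --- separate sampling script: sample the STL once and save the points to disk ---
geo = Tessellation.from_stl("./naca_foils/foils/naca_example.stl", airtight=True)  # placeholder file
samples = geo.sample_boundary(nr_points=100000)
np.savez("foil_boundary.npz", **samples)

# --- training script: load the pre-sampled points instead of re-sampling the STL ---
data = np.load("foil_boundary.npz")
invar = {k: data[k] for k in ("x", "y", "z", "normal_x", "normal_y", "normal_z", "area")}
outvar = {"u": np.zeros_like(invar["x"]), "v": np.zeros_like(invar["x"])}
constraint = PointwiseConstraint.from_numpy(nodes, invar, outvar, batch_size=512)

Since the expensive boundary sampling then happens once in its own process, the training run only has to load arrays from disk.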
Hi, thanks for getting back!
I tried to implement this as follows:
import glob

# collect the NACA STL files and parse the foil parameters from the file names
bracket_files = glob.glob("./naca_foils/foils/naca*.stl")
bracket_files.sort()

stl_dict = []
for f in bracket_files:
    # file names are assumed to look like naca<4 digits>_<angle>.stl
    _temp = f.split('/naca')[-1].split('.')[0]
    stl_dict.append({
        'path': f,
        'angle': int(_temp.split('_')[-1]),
        'digit_1': int(_temp.split('_')[0][0]),
        'digit_2': int(_temp.split('_')[0][1]),
        'digit_3': int(_temp.split('_')[0][2:4]),
    })
and then tried to sample the tessellated geometries outside of the actual training loop, like this:
import os

import h5py
import numpy as np

# Tessellation import for Modulus 22.09; the path may differ in other releases
from modulus.geometry.tessellation import Tessellation

os.makedirs('tessellation_memory/', exist_ok=True)
for i, f in enumerate(stl_dict[:2]):
    # load each foil and sample a handful of boundary points
    t = Tessellation.from_stl(f['path'], airtight=True)
    points = t.sample_boundary(
        nr_points=10,
        quasirandom=False,
    )
    del t

    # attach the foil parameters as constant columns alongside the sampled points
    ones = np.reshape(np.ones(len(points['x'])), (-1, 1))
    points['angle'] = f['angle'] * ones
    points['digit_1'] = f['digit_1'] * ones
    points['digit_2'] = f['digit_2'] * ones
    points['digit_3'] = f['digit_3'] * ones

    # write one HDF5 file per foil
    h5file = h5py.File('tessellation_memory/' + str(i) + '.hdf5', 'w')
    for grp_name in points:
        print('grp_name'.ljust(40, '.'), grp_name)
        h5file.create_dataset(grp_name, data=points[grp_name])
    h5file.close()
    del points
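For context, the plan was then to read each file back into a dict of numpy arrays in the training script, roughly like this:

# read one pre-sampled foil back into a dict of numpy arrays
with h5py.File('tessellation_memory/0.hdf5', 'r') as h5file:
    points = {key: np.array(h5file[key]) for key in h5file.keys()}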
Unfortunately, the sampling script crashes the kernel (almost) every time I run it, even though I'm working on a fairly large GPU and only sampling a very limited number of points per geometry.
Do you have a suggestion on how to manage this? I believe it's due to excessive memory utilization; does that sound like the right diagnosis?
Thanks a lot in advance!
Hi @benedikt_dietz
Please try with the updated Modulus container if possible; migration should be very straightforward (guide here). I believe we had a few fixes to pySDF in this recent release.