I am trying to use a set of roughly a dozen NACA airfoil meshes, each with slightly different parameters, in the parameterized tessellation example in order to optimise those parameters (example here: https://gitlab.com/nvidia/modulus/examples/-/blob/release_22.09/geometry/parameterized_tesselated_example.py).
I am using an A100 80GB GPU, but as soon as I add the constraints and run the simulation, GPU usage goes above 100% and the kernel crashes.
Is there a way to limit the memory requirements so that even larger sets can be trained on an 80GB machine?
Does Modulus crash while it's sampling the points from the STL files, or during training? I would start by lowering the number of points you're using via the batch_size parameters in your constraints.
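Conceptually, `batch_size` caps how many points go through the network in each training step, so lowering it reduces the point-dependent part of the per-step memory. The framework-agnostic sketch below illustrates that idea with a hypothetical helper, `iterate_batches`, over presampled arrays; it is an assumption-level illustration, not how Modulus implements its constraints internally.

```python
import numpy as np


def iterate_batches(samples, batch_size, shuffle=True, seed=0):
    """Yield dicts holding at most `batch_size` rows of every array in `samples`.

    `samples` maps a variable name (e.g. "x", "angle") to an (N, 1) array;
    all arrays must share the same N so that rows stay aligned.
    """
    n = len(next(iter(samples.values())))
    rng = np.random.default_rng(seed)
    order = rng.permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        # the same row indices are applied to every variable
        yield {name: arr[idx] for name, arr in samples.items()}


# Usage: 10,000 presampled points fed 1,024 at a time
samples = {"x": np.random.rand(10_000, 1), "y": np.random.rand(10_000, 1)}
batches = list(iterate_batches(samples, batch_size=1024))
print(len(batches), batches[0]["x"].shape, batches[-1]["x"].shape)
```

Only one batch of rows is materialized per step, so halving `batch_size` roughly halves the number of points the model has to process at once.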
For some really complicated geometries we have also resorted to pre-sampling the STL files in a separate script with the geometry module and saving the samples to disk, say as NumPy arrays. The actual training script then just loads them from the NumPy files. This is also useful for speeding up testing.
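A minimal sketch of that pre-sampling workflow, assuming the sampler returns a dict of NumPy arrays keyed by variable name. The helpers `save_samples` and `load_samples` are hypothetical, and the random arrays only stand in for real geometry samples. In the training script, such arrays could then be handed to a constraint built from NumPy data (Modulus provides `PointwiseConstraint.from_numpy` for this; check its signature in your version).

```python
import os
import tempfile

import numpy as np


def save_samples(path, points):
    """Store a dict of arrays in one compressed .npz file (one entry per key)."""
    np.savez_compressed(path, **points)


def load_samples(path):
    """Load every array from an .npz file back into a plain dict."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


# Usage: a stand-in for one geometry's sampled boundary points
rng = np.random.default_rng(0)
points = {name: rng.random((500, 1)) for name in ["x", "y", "z", "angle"]}

path = os.path.join(tempfile.mkdtemp(), "foil_0.npz")
save_samples(path, points)      # run once, in the pre-sampling script
restored = load_samples(path)   # run in the training script
```

Keeping one file per geometry means the expensive STL sampling runs once, and a crash during sampling can be isolated from a crash during training.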
Hi, thanks for getting back!
I tried to implement this as follows:
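import glob

bracket_files = glob.glob("./naca_foils/foils/naca*.stl")
stl_dict = []  # list of dicts with 'path', 'angle', 'digit_1', 'digit_2', 'digit_3'
for f in bracket_files:
    _temp = f.split('/naca')[-1].split('.')
    # (rest of the loop body not shown in the original post: parse the
    #  parameters from _temp and append a dict with the keys above)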
I then tried to sample the tessellated geometries like this, so that the sampling happens outside the actual training loop:
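import os

import h5py
import numpy as np
from modulus.geometry.tessellation import Tessellation

os.makedirs('tessllation_memory', exist_ok=True)
for i, f in enumerate(stl_dict[:2]):
    t = Tessellation.from_stl(f['path'], airtight=True)
    points = t.sample_boundary(...)  # sampling arguments not shown in the original post
    # attach the airfoil parameters as constant (N, 1) columns
    ones = np.reshape(np.ones(len(points['x'])), (-1, 1))
    points['angle'] = f['angle'] * ones
    points['digit_1'] = f['digit_1'] * ones
    points['digit_2'] = f['digit_2'] * ones
    points['digit_3'] = f['digit_3'] * ones
    # one dataset per variable; "with" closes the file even if a write fails
    with h5py.File('tessllation_memory/' + str(i) + '.hdf5', 'w') as h5file:
        for grp_name in points:
            h5file.create_dataset(grp_name, data=points[grp_name])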
Unfortunately, this crashes the kernel (almost) every time I run it, even though I'm working on a fairly large GPU and only sampling a very limited number of points on each geometry.
Do you have any suggestions for how to manage this? I believe it's caused by excessive memory usage; do you think that's the correct diagnosis?
Thanks a lot in advance!
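One quick way to test the memory diagnosis is to measure how much space the sampled arrays actually occupy. The sketch below uses a hypothetical helper, `total_nbytes`, on made-up stand-in data; the column names are assumptions. It only counts host-side NumPy arrays, not GPU allocations made while loading the STL or evaluating the signed distance, so a tiny total suggests the arrays themselves are an unlikely cause of the crash.

```python
import numpy as np


def total_nbytes(points):
    """Total in-memory size, in bytes, of all arrays in a samples dict."""
    return sum(np.asarray(arr).nbytes for arr in points.values())


# Usage: a hypothetical stand-in with 10,000 points and 11 float64 columns
n_points = 10_000
names = ["x", "y", "z", "normal_x", "normal_y", "normal_z", "area",
         "angle", "digit_1", "digit_2", "digit_3"]
points = {name: np.zeros((n_points, 1)) for name in names}
print(f"{total_nbytes(points) / 2**20:.2f} MiB")
```

At 8 bytes per float64 value, 10,000 points with 11 columns come to 880,000 bytes, far below 80 GB.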
Please try the updated Modulus container if possible; migration should be very straightforward (see the migration guide). I believe this recent release included a few fixes to pySDF.