When running the GPUMD software (specifically, the examples described here), I get a segmentation fault in the second of the five examples. What is perplexing is that the error occurs at a point where no calculations are being performed; the log file indicates that it happens while a simple text file (xyz.in) is being read:
(base) xenohunter@DESKTOP-0823CKN:~/GPUMD$ tail -n20 log.gpumd_examples
---------------------------------------------------------------
Time used for 'examples/gpumd/thermal_expansion' = 94.680956 s.
---------------------------------------------------------------
---------------------------------------------------------------
Run simulation for 'examples/gpumd/density_of_states'.
---------------------------------------------------------------
---------------------------------------------------------------
Started initializing positions and related parameters.
---------------------------------------------------------------
Number of atoms is 8640.
Maximum number of neighbors is 3.
Initial cutoff for neighbor list is 2.1 A.
Use orthogonal box.
Do not specify initial velocities her
According to the GPUMD source files, this means the segmentation fault occurs in the read_xyz_in_line_1 function in read_xyz.cu. Specifically, it occurs in the middle of a simple print statement (note how the last log line above is cut off mid-word). It therefore seems likely that the problem is memory related and possibly due to a bug in CUDA for WSL 2. Since I have no experience developing CUDA-based software, any tips on how to solve, or circumvent, this issue would be greatly appreciated.
I have discovered something very interesting. In the case described above I ran the five examples in series. If I instead run the second example, during which the segmentation fault occurs, on its own, then several additional steps complete before the segmentation fault appears; in fact, it now occurs during the post-processing:
(base) xenohunter@DESKTOP-0823CKN:~/GPUMD$ tail -n70 log.input_density_of_states_gpumd
---------------------------------------------------------------
Run simulation for 'examples/gpumd/density_of_states'.
---------------------------------------------------------------
---------------------------------------------------------------
Started initializing positions and related parameters.
---------------------------------------------------------------
Number of atoms is 8640.
Maximum number of neighbors is 3.
Initial cutoff for neighbor list is 2.1 A.
Use orthogonal box.
Do not specify initial velocities here.
Have no grouping method.
Box lengths are
Lx = 1.4964900000e+02 A
Ly = 1.5552000000e+02 A
Lz = 3.3500000000e+00 A
Use periodic boundary conditions along x.
Use periodic boundary conditions along y.
Use free boundary conditions along z.
There is only one atom type.
8640 atoms of type 0.
---------------------------------------------------------------
Finished initializing positions and related parameters.
---------------------------------------------------------------
---------------------------------------------------------------
Started executing the commands in run.in.
---------------------------------------------------------------
Use Tersoff-1989 (single-element) potential.
applies to atoms [0, 8640) from type 0 to type 0.
Initialized velocities with T = 300 K.
Use NPT ensemble for this run.
choose the Berendsen method.
initial temperature is 300 K.
final temperature is 300 K.
T_coupling is 0.01.
pressure_x is 0 GPa.
pressure_y is 0 GPa.
pressure_z is 0 GPa.
p_coupling is 0.0005.
Time step for this run is 1 fs.
Dump thermo every 100 steps.
Run 200000 steps.
20000 steps completed.
40000 steps completed.
60000 steps completed.
80000 steps completed.
100000 steps completed.
120000 steps completed.
140000 steps completed.
160000 steps completed.
180000 steps completed.
200000 steps completed.
---------------------------------------------------------------
Number of neighbor list updates = 0.
Time used for this run = 73.278 s.
Speed of this run = 2.35814e+07 atom*step/second.
---------------------------------------------------------------
Use NVE ensemble for this run.
Compute phonon DOS.
sa
Consequently, it seems like the segmentation fault occurs after the code has been running for a certain amount of time, regardless of which calculations (actions) are actually being performed, which is an even stronger indication that this is some kind of bug.
Since cuda-memcheck indicates that the problem originates on the host, I have debugged the software using valgrind and obtained the output found in this log file: valgrind.log (1.3 MB). Since I have no previous experience with valgrind, I am not quite sure how to interpret the results. From what I can gather, the segmentation fault is due to "bad permissions":
==799== Process terminating with default action of signal 11 (SIGSEGV)
==799== Bad permissions for mapped region at address 0x302DF0000
==799== at 0x16A4AD: DOS::postprocess(char const*) (in /home/xenohunter/GPUMD/src/gpumd)
==799== by 0x16CEB9: Measure::finalize(char*, int, double, double, double) (in /home/xenohunter/GPUMD/src/gpumd)
==799== by 0x11DB52: Run::perform_a_run(char*) (in /home/xenohunter/GPUMD/src/gpumd)
==799== by 0x11EBED: Run::execute_run_in(char*) (in /home/xenohunter/GPUMD/src/gpumd)
==799== by 0x1201C7: Run::Run(char*) (in /home/xenohunter/GPUMD/src/gpumd)
==799== by 0x11384C: main (in /home/xenohunter/GPUMD/src/gpumd)
Also, there is evidence of memory leaks:
LEAK SUMMARY:
==799== definitely lost: 3,456 bytes in 5 blocks
==799== indirectly lost: 0 bytes in 0 blocks
==799== possibly lost: 34,980 bytes in 512 blocks
==799== still reachable: 10,580,088 bytes in 42,560 blocks
==799== suppressed: 0 bytes in 0 blocks