I am trying to integrate cuDSS into a Fortran codebase. However, running the code produces errors that differ from run to run:
[2025-07-15 19:51:27][CUDSS][2587975][Api][cudssCreate] start
[2025-07-15 19:51:27][CUSPARSE][2587975][Trace][cusparseCreate] cudaFree(0)
[2025-07-15 19:51:27][CUSPARSE][2587975][Trace][cusparseCreate] cudaGetDevice
[2025-07-15 19:51:27][CUSPARSE][2587975][Trace][cusparseCreate] cudaGetDeviceProperties(0)
[2025-07-15 19:51:27][CUSPARSE][2587975][Trace][cusparseCreate] cudaDriverGetVersion
[2025-07-15 19:51:27][CUSPARSE][2587975][Trace][cusparseCreate] cudaDeviceGetAttribute(115, 0)
[2025-07-15 19:51:27][CUSPARSE][2587975][Trace][cusparseCreate] cudaFuncGetAttributes
[2025-07-15 19:51:27][CUSPARSE][2587975][Api][cusparseCreate] handle[out]=0x563d37f0, version=12.5.9.5
[2025-07-15 19:51:27][CUDSS][2587975][Api][cudssConfigCreate] start
[2025-07-15 19:51:27][CUDSS][2587975][Api][cudssDataCreate] start
[2025-07-15 19:51:27][CUSPARSE][2587975][Api][cusparseSetStream] handle[in]=0x563d37f0, stream[in]=0x3c2d9450
[2025-07-15 19:51:27][CUDSS][2587975][Api][cudssSetStream] start
[2025-07-15 19:51:27][CUDSS][2587975][Api][cudssSetCommLayer] start
Using comm library: /home/eduard/Github/hawen_worktree/cudss/.cache/CPM/cudss/570e/lib/libcudss_commlayer_openmpi.so
[2025-07-15 19:51:27][CUDSS][2587975][Api][cudssDataSet] start
[2025-07-15 19:51:27][CUDSS][2587975][Api][cudssSetThreadingLayer] start
[2025-07-15 19:51:27][CUDSS][2587975][Info][cudssSetThreadingLayer] Default number of threads for the set threading layer = 12
************************************************************
********** solving for frequency 1 / 2
********** complex frequency : 0.4000 Hz + 0.0000
************************************************************
--------------------------------------------------------------------------------
==========> SYMMETRIC Matrix processing
- wavelength min/max (SI) : 2.50000E-01 2.50000E-01
- min dof per wavelength :
- order 5 cells : 3.62317E+01
- matrix global size : 74898
- matrix approx. global nnz : 1402542
- global memory for matrix : 21.400 MiB
- mem/proc for ref matrices : 20.672 KiB
- matrix creation time : 1.359 sec
- matrix exact global nnz : 1402542
[2025-07-15 19:51:28][CUDSS][2587975][Api][cudssMatrixCreateCsr] start
[2025-07-15 19:51:28][CUDSS][2587975][Api][cudssExecute] start
[2025-07-15 19:51:28][CUDSS][2587975][Info][cudssExecute] CUDSS_CONFIG_REORDERING_ALG 0 requires = 80437596 bytes (0.080437596 GB) in host memory
[2025-07-15 19:51:28][CUDSS][2587975][Info][cudssExecute] Using 12 threads on host for the reordering
[eduard-Pro-I5-11F-3060Ti:2587975:0:2587985] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7eac300e937c)
[eduard-Pro-I5-11F-3060Ti:2587975] *** Process received signal ***
[eduard-Pro-I5-11F-3060Ti:2587975] Signal: Segmentation fault (11)
[eduard-Pro-I5-11F-3060Ti:2587975] Signal code: Invalid permissions (2)
[eduard-Pro-I5-11F-3060Ti:2587975] Failing at address: 0x7eac3278a118
[eduard-Pro-I5-11F-3060Ti:2587975] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45810) [0x7eacc4c45810]
[eduard-Pro-I5-11F-3060Ti:2587975] [ 1] ../../build/cudss-dev/code/app/forward_waveform_acoustic_isotropic_hdg.out() [0xdfcc04]
[eduard-Pro-I5-11F-3060Ti:2587975] [ 2] --------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node eduard-Pro-I5-11F-3060Ti exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Sometimes it fails instead with something like this:
[2025-07-15 19:55:42][CUDSS][2623400][Api][cudssMatrixCreateCsr] start
[2025-07-15 19:55:42][CUDSS][2623400][Api][cudssExecute] start
[2025-07-15 19:55:42][CUDSS][2623400][Info][cudssExecute] CUDSS_CONFIG_REORDERING_ALG 0 requires = 80241076 bytes (0.080241076 GB) in host memory
[2025-07-15 19:55:42][CUDSS][2623400][Info][cudssExecute] Using 12 threads on host for the reordering
[1752602142.440458] [eduard-Pro-I5-11F-3060Ti:2623400:0] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[1752602142.440475] [eduard-Pro-I5-11F-3060Ti:2623400:2] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[eduard-Pro-I5-11F-3060Ti:2623400:2:2623408] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a940)
[eduard-Pro-I5-11F-3060Ti:2623400:4:2623410] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a94c)
[1752602142.440492] [eduard-Pro-I5-11F-3060Ti:2623400:4] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[1752602142.440501] [eduard-Pro-I5-11F-3060Ti:2623400:5] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[1752602142.440519] [eduard-Pro-I5-11F-3060Ti:2623400:5] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440531] [eduard-Pro-I5-11F-3060Ti:2623400:5] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[eduard-Pro-I5-11F-3060Ti:2623400:5:2623407] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a948)
[1752602142.440457] [eduard-Pro-I5-11F-3060Ti:2623400:1] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[eduard-Pro-I5-11F-3060Ti:2623400:1:2623404] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a938)
[1752602142.440508] [eduard-Pro-I5-11F-3060Ti:2623400:2] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440520] [eduard-Pro-I5-11F-3060Ti:2623400:6] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[1752602142.440556] [eduard-Pro-I5-11F-3060Ti:2623400:6] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[eduard-Pro-I5-11F-3060Ti:2623400:6:2623403] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a93c)
[1752602142.440479] [eduard-Pro-I5-11F-3060Ti:2623400:3] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440560] [eduard-Pro-I5-11F-3060Ti:2623400:7] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[1752602142.440508] [eduard-Pro-I5-11F-3060Ti:2623400:4] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[eduard-Pro-I5-11F-3060Ti:2623400:7:2623400] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a938)
[1752602142.440545] [eduard-Pro-I5-11F-3060Ti:2623400:1] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440560] [eduard-Pro-I5-11F-3060Ti:2623400:6] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[eduard-Pro-I5-11F-3060Ti:2623400:3:2623405] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7cdd57dc)
[eduard-Pro-I5-11F-3060Ti:2623400:0:2623409] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a950)
[1752602142.440661] [eduard-Pro-I5-11F-3060Ti:2623400:3] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440573] [eduard-Pro-I5-11F-3060Ti:2623400:7] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440570] [eduard-Pro-I5-11F-3060Ti:2623400:8] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[1752602142.440682] [eduard-Pro-I5-11F-3060Ti:2623400:9] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[eduard-Pro-I5-11F-3060Ti:2623400:8:2623401] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a944)
[1752602142.440692] [eduard-Pro-I5-11F-3060Ti:2623400:9] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440685] [eduard-Pro-I5-11F-3060Ti:2623400:8] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[eduard-Pro-I5-11F-3060Ti:2623400:9:2623406] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a93c)
[1752602142.440665] [eduard-Pro-I5-11F-3060Ti:2623400:0] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440692] [eduard-Pro-I5-11F-3060Ti:2623400:10] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[eduard-Pro-I5-11F-3060Ti:2623400:10:2623411] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a938)
[eduard-Pro-I5-11F-3060Ti:2623400:11:2623402] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x77bd7ca8a93c)
[1752602142.440722] [eduard-Pro-I5-11F-3060Ti:2623400:11] debug.c:1301 UCX WARN ucs_debug_disable_signal: signal 8 was not set in ucs
[1752602142.440723] [eduard-Pro-I5-11F-3060Ti:2623400:10] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[1752602142.440700] [eduard-Pro-I5-11F-3060Ti:2623400:9] spinlock.c:29 UCX WARN ucs_recursive_spinlock_destroy() failed: busy
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: /lib/x86_64-linux-gnu/libc.so.6: unknown type [0x13] section `.relr.dyn'
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x4e8e58 vs 0x44a6a0)
==== backtrace (tid:2623409) ====
0 0x0000000000045810 __sigaction() ???:0
1 0x0000000000dfcc04 cuBucketSortKeysInc() ???:0
2 0x0000000000e04f54 cuMatch_SHEM() ???:0
3 0x0000000000e055dd cuCoarsenGraphNlevels() ???:0
4 0x0000000000e05637 cuMlevelNodeBisectionL2() ???:0
5 0x0000000000e05f66 cuMlevelNestedDissectionP_new() ???:0
6 0x000000000000808b cudssParallelFor._omp_fn.0() tmpxft_0000012b_00000000-6_cudss_mtlayer_omp.cudafe1.cpp:0
7 0x000000000010122b GOMP_ordered_end() ???:0
8 0x0000000000120fc9 __kmp_invoke_microtask() ???:0
9 0x0000000000085315 __kmp_fork_call() ???:0
10 0x00000000000837f6 __kmp_fork_call() ???:0
11 0x00000000000f8eee __kmpc_for_collapsed_init() ???:0
12 0x00000000000a2ef1 pthread_condattr_setpshared() ???:0
13 0x000000000013445c __clone() ???:0
=================================
[eduard-Pro-I5-11F-3060Ti:2623400] *** Process received signal ***
[eduard-Pro-I5-11F-3060Ti:2623400] Signal: Segmentation fault (11)
[eduard-Pro-I5-11F-3060Ti:2623400] Signal code: (-6)
[eduard-Pro-I5-11F-3060Ti:2623400] Failing at address: 0x3e8002807a8
[eduard-Pro-I5-11F-3060Ti:2623400] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45810) [0x77be11645810]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 1] ../../build/cudss-dev/code/app/forward_waveform_acoustic_isotropic_hdg.out() [0xdfcc04]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 2] ../../build/cudss-dev/code/app/forward_waveform_acoustic_isotropic_hdg.out() [0xe04f54]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 3] ../../build/cudss-dev/code/app/forward_waveform_acoustic_isotropic_hdg.out() [0xe055dd]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 4] ../../build/cudss-dev/code/app/forward_waveform_acoustic_isotropic_hdg.out() [0xe05637]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 5] ../../build/cudss-dev/code/app/forward_waveform_acoustic_isotropic_hdg.out() [0xe05f66]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 6] /home/eduard/Github/hawen_worktree/cudss/.cache/CPM/cudss/570e/lib/libcudss_mtlayer_gomp.so(+0x808b) [0x77bda240808b]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 7] /lib/x86_64-linux-gnu/libomp.so.5(+0x10122b) [0x77be777cb22b]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 8] /lib/x86_64-linux-gnu/libomp.so.5(__kmp_invoke_microtask+0x99) [0x77be777eafc9]
[eduard-Pro-I5-11F-3060Ti:2623400] [ 9] /lib/x86_64-linux-gnu/libomp.so.5(+0x85315) [0x77be7774f315]
[eduard-Pro-I5-11F-3060Ti:2623400] [10] /lib/x86_64-linux-gnu/libomp.so.5(+0x837f6) [0x77be7774d7f6]
[eduard-Pro-I5-11F-3060Ti:2623400] [11] /lib/x86_64-linux-gnu/libomp.so.5(+0xf8eee) [0x77be777c2eee]
[eduard-Pro-I5-11F-3060Ti:2623400] [12] /lib/x86_64-linux-gnu/libc.so.6(+0xa2ef1) [0x77be116a2ef1]
[eduard-Pro-I5-11F-3060Ti:2623400] [13] /lib/x86_64-linux-gnu/libc.so.6(+0x13445c) [0x77be1173445c]
[eduard-Pro-I5-11F-3060Ti:2623400] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node eduard-Pro-I5-11F-3060Ti exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
And sometimes it fails with this:
0: DEALLOCATE: memory at (nil) not allocated
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[28935,1],0]
Exit code: 127
--------------------------------------------------------------------------
The last one in particular happens every time I run the program through GDB, which makes it difficult to find the problem. I am launching the executable with:
mpirun --bind-to none -np 1 ../../build/cudss-dev/code/app/forward_waveform_acoustic_isotropic_hdg.out parameter=par.modeling_acoustic
The .cpp file that wraps cuDSS is compiled with the following command:
/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/compilers/bin/nvc++ -DCUDSS_STATIC_LIBRARY -DHAWEN_CUDSS_COMM_LIB_PATH=\\\"/home/eduard/Github/hawen_worktree/cudss/.cache/CPM/cudss/570e/lib/libcudss_commlayer_openmpi.so\\\" -DHAWEN_CUDSS_GOMP_LIB_PATH=\\\"/home/eduard/Github/hawen_worktree/cudss/.cache/CPM/cudss/570e/lib/libcudss_mtlayer_gomp.so\\\" -DHAWEN_ENABLE_ASSERTIONS -DHAWEN_FORTRAN_IKIND_MAT=i4 -DHAWEN_FORTRAN_IKIND_MESH=i4 -DHAWEN_FORTRAN_IKIND_METIS=i4 -DHAWEN_FORTRAN_RKIND_MAT=sp -DHAWEN_FORTRAN_RKIND_MESH=dp -DHAWEN_FORTRAN_RKIND_METIS=sp -DHAWEN_FORTRAN_RKIND_POL=dp -DHAWEN_USE_CUDSS -I/home/eduard/Github/hawen_worktree/cudss/code/src/macros -I/home/eduard/Github/hawen_worktree/cudss/.cache/CPM/metis_fc/8987/libmetis -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/lib -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include/openmpi -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /home/eduard/Github/hawen_worktree/cudss/.cache/CPM/metis_fc/8987/include -isystem /home/eduard/Github/hawen_worktree/cudss/.cache/CPM/cudss/570e/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/targets/x86_64-linux/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/math_libs/12.9/include -g -O0 -std=gnu++20 -Wall -Wextra -pthread -o code/src/CMakeFiles/hawen_lib.dir/linear-algebra/solvers/cuDSS/solver.cpp.o -c /home/eduard/Github/hawen_worktree/cudss/code/src/linear-algebra/solvers/cuDSS/solver.cpp"
while the Fortran files, for example the one that wraps the C++ calls, are compiled with these flags:
/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/compilers/bin/nvfortran -DCUDSS_STATIC_LIBRARY -DHAWEN_CUDSS_COMM_LIB_PATH=\\\"/home/eduard/Github/hawen_worktree/cudss/.cache/CPM/cudss/570e/lib/libcudss_commlayer_openmpi.so\\\" -DHAWEN_CUDSS_GOMP_LIB_PATH=\\\"/home/eduard/Github/hawen_worktree/cudss/.cache/CPM/cudss/570e/lib/libcudss_mtlayer_gomp.so\\\" -DHAWEN_ENABLE_ASSERTIONS -DHAWEN_FORTRAN_IKIND_MAT=i4 -DHAWEN_FORTRAN_IKIND_MESH=i4 -DHAWEN_FORTRAN_IKIND_METIS=i4 -DHAWEN_FORTRAN_RKIND_MAT=sp -DHAWEN_FORTRAN_RKIND_MESH=dp -DHAWEN_FORTRAN_RKIND_METIS=sp -DHAWEN_FORTRAN_RKIND_POL=dp -DHAWEN_USE_CUDSS -I/home/eduard/Github/hawen_worktree/cudss/code/src/macros -I/home/eduard/Github/hawen_worktree/cudss/.cache/CPM/metis_fc/8987/libmetis -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/lib -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include/openmpi -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/comm_libs/12.9/hpcx/hpcx-2.22.1/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /home/eduard/Github/hawen_worktree/cudss/.cache/CPM/metis_fc/8987/include -isystem /home/eduard/Github/hawen_worktree/cudss/.cache/CPM/cudss/570e/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/targets/x86_64-linux/include -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/25.5/math_libs/12.9/include -g -O0 -Mbounds -module include -Wall -Wextra -mp -pthread -c /home/eduard/Github/hawen_worktree/cudss/code/src/linear-algebra/solvers/cuDSS/m_cudss_solver.f90 -o code/src/CMakeFiles/hawen_lib.dir/linear-algebra/solvers/cuDSS/m_cudss_solver.f90.o
If needed, I can provide the source of the C++ file; it’s not particularly big. Thank you in advance.