I put the data files in a directory, ./s_dump/temp_bin/data, relative to the directory where I have the source code. I modified the tsd_const_v4.cuh file so that each of the 6 filenames has an additional leading ".", like so:
#define in_arr_dump_filenameissAn "./s_dump/temp_bin/data/issn.txt"
I put all source and header files in the same directory. I built the code like so:
nvcc -o test -lineinfo -g -G algo_main_aux.cpp algo_main_funct.cpp krnlA.cu krnlB.cu krnlC.cu mainA.cu mainB.cu tsd_main_com.cu
When I run the code, it seg faults.
If I set that breakpoint in mainA.cu, it is not hit. Line 102 of mainA.cu appears to be this:
if (lint == 1000) // this is line 102
lint = 0;
Here is my cuda-gdb session:
$ cuda-gdb ./test
NVIDIA (R) CUDA Debugger
Portions Copyright (C) 2007-2015 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /home/bob/misc/junk1/V1/test...done.
(cuda-gdb) break mainA.cu:102
Breakpoint 1 at 0x406315: file mainA.cu, line 102.
Starting program: /home/bob/misc/junk1/V1/./test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff701c700 (LWP 4402)]
[New Thread 0x7ffff5beb700 (LWP 4403)]
Program received signal SIGSEGV, Segmentation fault.
0x000000319b28040c in free () from /lib64/libc.so.6
I tried absolute paths instead of relative paths for the filenames, and it still seg faults. The seg fault appears to occur in the data loading area, on the read of sol.txt, which is the last of the files to be read. It reads the file for some time but eventually seg faults within tsd_dbl_arr_read_from_file.
The seg fault occurs upon executing the ifile.close() statement at the end of tsd_dbl_arr_read_from_file when reading the last file (sol.txt).
Running your code with valgrind provides, in part, this useful output:
==4825== Invalid write of size 8
==4825== at 0x403804: tsd_dbl_arr_read_from_file(int&, double*, char const*) (algo_main_aux.cpp:207)
==4825== by 0x4083D4: tsd_read_input_arr_from_file(TSD_data*, CUstream_st*) (mainA.cu:828)
==4825== by 0x4060EA: main (mainA.cu:35)
After reviewing your tsd_dbl_arr_read_from_file function, it seems that it has no range checking. It will continue to read input elements for as long as the file supplies them, possibly writing beyond the end of the destination buffer. This seems to be what is happening. Instrumenting the code to print out the allocated size for the buffer:
tsd_data->h_base_comb = new double[tsd_data->coeff_cnt];
printf("coeff_cnt: %d\n", tsd_data->coeff_cnt);
yields an output of 49 (according to my observation), whereas the actual cnt value after the function (tsd_dbl_arr_read_from_file) completes is 210 for the given file (sol.txt). The 210 number appears to be in agreement with the number of doubles in sol.txt.
Due to this overrun of the buffer (161 doubles written past the end of a 49-element allocation), I believe heap corruption is occurring, and I believe this corruption is the proximate reason for the seg fault on ifile.close().
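One way to guard against this kind of overrun is to pass the buffer capacity into the read routine and stop before writing past it. The following is only a sketch, not your actual function: the name read_doubles_bounded and the capacity parameter are my own assumptions, and I am assuming the file holds whitespace-separated doubles, as sol.txt appears to.

```cpp
#include <fstream>

// Hypothetical bounds-checked reader (illustrative only, not the original
// tsd_dbl_arr_read_from_file). Reads whitespace-separated doubles from
// `filename` into `arr`, storing at most `capacity` values.
// On return, `cnt` holds the number of values actually stored.
// Returns false if the file cannot be opened or if it contains more
// values than the buffer can hold; returns true otherwise.
bool read_doubles_bounded(int &cnt, double *arr, int capacity, const char *filename)
{
    cnt = 0;
    std::ifstream ifile(filename);
    if (!ifile.is_open())
        return false;

    double value;
    // Extraction stops at end of file or at the first token that is not a number.
    while (ifile >> value) {
        if (cnt >= capacity)
            return false;   // more data than buffer space: stop before overrunning
        arr[cnt++] = value;
    }
    return true;            // ifile is closed by its destructor
}
```

With something like this, the call for sol.txt would pass tsd_data->coeff_cnt as the capacity, and a false return would flag the 49 vs. 210 mismatch immediately instead of corrupting the heap. The underlying question of why coeff_cnt does not match the file contents would still need to be sorted out.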