It seems that I've run into a memory-initialization problem (my guess). Here is what happens with my program:
The first run of my program gives completely wrong output, then everything is fine when I rerun it repeatedly. Consistent with a previous post ("first run" of the CUDA program isn't correct), the output of my program is also right when I rerun it repeatedly after running several other correct CUDA programs.
I put this aside and continued extending my program, then another problem came up when I allocated a dynamic array and wrote to every element of it in a kernel function:

- the output is totally wrong if I initialize the array with 0 using cudaMemset() but write nothing to the array in the kernel;
- the output is correct for the kernels that run before this kernel if I don't initialize the array with cudaMemset() and write nothing to the array in the kernel;
- the output of the first run is correct for all kernels if I don't initialize the array with cudaMemset() and do write to the array in the kernel.
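For reference, here is a minimal sketch (hypothetical names, not my actual code) of the allocate/zero/launch pattern I'm describing, with a launch-error check added, assuming a simple float array:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void fill(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = 1.0f;  // each thread writes exactly one element
}

int main(void) {
    const int n = 1024;
    float *d_a;
    cudaMalloc((void **)&d_a, n * sizeof(float));
    cudaMemset(d_a, 0, n * sizeof(float));  // zero the freshly allocated device memory
    fill<<<(n + 255) / 256, 256>>>(d_a, n);
    cudaError_t err = cudaGetLastError();   // check that the launch itself succeeded
    if (err != cudaSuccess)
        printf("launch error: %s\n", cudaGetErrorString(err));
    cudaFree(d_a);
    return 0;
}
```

If uninitialized memory is the culprit, output that changes between the first and later runs is exactly the symptom you'd expect, since device memory left over from a previous program may happen to hold usable values.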
I looked up some old posts in the forum and tried to figure it out as best I can, but the problem is still there, so I hope someone who knows why this happens, or has encountered and solved such a problem, can give me some tips.
There was an order dependency between blocks of the first kernel function, so it was just luck that it produced right output after the first run. I thought this was the problem, so I modified the kernel. Now, in release mode, the first kernel always gets the right output, but the output of the second kernel is only partly correct; and in emurelease mode the output of the first kernel is always wrong, not correct even once. This is abnormal, and it seems that there is still a problem with the first kernel function…
I didn't do those checks, since this error message only came out once, at the very first run; it should result from the order dependency between blocks of the first kernel function in my program. I modified this kernel function to remove the dependency, and no error has emerged with the first kernel function for now, but the output of the second kernel function is only partly correct.
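Since blocks can be scheduled in any order and there is no global barrier across blocks inside a single kernel, the usual way to remove this kind of inter-block dependency is to split the work at the dependency point into two kernel launches (a hypothetical sketch, not my actual kernels):

```cuda
#include <cuda_runtime.h>

// Instead of one kernel where block B reads what block A wrote
// (the execution order of A and B is undefined), split at the
// dependency: a kernel launch boundary acts as an implicit
// global barrier between the two phases.
__global__ void phase1(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = i * 2.0f;       // all blocks write independently
}

__global__ void phase2(const float *data, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = data[i] + 1.0f;  // safely reads everything phase1 wrote
}

// host side:
//   phase1<<<grid, block>>>(d_data, n);
//   phase2<<<grid, block>>>(d_data, d_out, n);
```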
In my CPU programming experience, many memory errors cause odd behavior that is hard to explain and shows up far from where the error actually lies.
But a kernel whose blocks depend on each other's execution order should produce wrong output most of the time, so why is everything fine after just one wrong execution? It seems unreasonable…
Aha, it seems I've figured out what's wrong with my program: both the release-mode version and the emurelease-mode version now run without any abnormal info or errors and give consistent output. In addition to the order dependency between block executions, the other problem I found is that the first kernel lacked a __syncthreads() before threads read data from shared memory that had been written by other threads, which caused a potential read-after-write, write-after-read, or write-after-write hazard. Interestingly, the release version worked well and produced correct output, while the emu version discarded half of the output.
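For anyone hitting the same issue, here is a minimal sketch of the shared-memory pattern with the barrier in place (a hypothetical block-reversal kernel, not my actual code):

```cuda
// Each thread stages one element into shared memory, then reads an
// element written by a DIFFERENT thread. Without the barrier, the
// read may happen before the other thread's write has completed.
__global__ void reverseBlock(float *out, const float *in) {
    __shared__ float tile[256];            // one element per thread in the block
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;

    tile[t] = in[base + t];                // every thread writes shared memory
    __syncthreads();                       // barrier: all writes visible before any read
    out[base + t] = tile[blockDim.x - 1 - t];  // read another thread's element
}
```

That the release build happened to get correct output anyway is just timing luck; the hazard is still there until the barrier is added.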
Anyway, I got what I wanted, and I hope this is helpful for others.