Bumping this again, hopefully with some better info.
Attached are the latest files (plus a few extras), which should let you run this on pretty much any system. As before, it works on a 9500M and a GTS 250, but not on a C2070.
The general idea is to build a hash table in global memory whose entries hold aggregates of the input tuples. The design is based on the hash table in CUDA by Example.
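For reference, here is a rough host-side C++ model of that layout. It is a sketch under assumptions, not the attached code: the names (Entry, HashTable, aggregate, find), the sum/count aggregate and the key-modulo hash are all hypothetical, and the per-bucket locking that CUDA by Example uses on the device is left out. Chain links are stored as indices into a preallocated entry pool rather than as pointers, and the one place where a write could leave the pool is checked explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an index-based, chained hash table in the style of
// CUDA by Example. On the device, buckets/entries would live in global
// memory and each bucket would be guarded by a lock (omitted here).
struct Entry {
    uint32_t key;    // grouping key of the tuple
    int64_t  sum;    // running aggregate for this key (assumed: a sum)
    uint32_t count;  // number of tuples folded into this entry
    int32_t  next;   // index of the next entry in the chain, -1 = end
};

struct HashTable {
    std::vector<int32_t> buckets;  // head entry index per bucket, -1 = empty
    std::vector<Entry>   entries;  // preallocated entry pool
    int32_t next_free = 0;         // first unused slot in the pool

    HashTable(std::size_t num_buckets, std::size_t capacity)
        : buckets(num_buckets, -1), entries(capacity) {}

    std::size_t bucket_of(uint32_t key) const { return key % buckets.size(); }

    // Fold (key, value) into the table. Returns false instead of writing
    // past the end of the pool when no free entry is left.
    bool aggregate(uint32_t key, int64_t value) {
        std::size_t b = bucket_of(key);
        for (int32_t i = buckets[b]; i != -1; i = entries[i].next) {
            if (entries[i].key == key) {  // key already present: update it
                entries[i].sum += value;
                entries[i].count += 1;
                return true;
            }
        }
        if (next_free >= static_cast<int32_t>(entries.size())) {
            return false;  // pool exhausted; an unchecked write here is out of bounds
        }
        int32_t slot = next_free++;
        entries[slot] = Entry{key, value, 1, buckets[b]};
        buckets[b] = slot;  // new entry becomes the head of its chain
        return true;
    }

    // Look up the aggregate for a key; returns nullptr if it is absent.
    const Entry* find(uint32_t key) const {
        for (int32_t i = buckets[bucket_of(key)]; i != -1; i = entries[i].next)
            if (entries[i].key == key) return &entries[i];
        return nullptr;
    }
};
```

With 4 buckets and room for 3 entries, aggregating keys 1, 5, 1 and 2 fills the pool; a further new key is then rejected instead of being written past the end, while keys already in the table can still be updated.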
I compiled in debug mode and ran it in the debugger with “set cuda memcheck on”, which gave the following errors over three separate runs:
[Launch of CUDA Kernel 0 (computeAggregation) on Device 0]
Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching to CUDA Kernel 0 (<<<(19,0),(64,0,0)>>>)]
0x0000000000eb4d80 in computeAggregation ()

[Launch of CUDA Kernel 0 (computeAggregation) on Device 0]
Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching to CUDA Kernel 0 (<<<(0,0),(66,0,0)>>>)]
0x00000000007dbd80 in computeAggregation ()

[Launch of CUDA Kernel 0 (computeAggregation) on Device 0]
Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching to CUDA Kernel 0 (<<<(4,0),(162,0,0)>>>)]
0x0000000001cb5d80 in computeAggregation ()
According to the debugger manual, Lane Illegal Address is caused by an “illegal (out of bounds) global address”. Stepping through the logic, I can’t figure out where it might go out of bounds. Nor do I understand why it produces correct results on the older (less strict) architectures but fails on Fermi. I also replaced the pointers with plain indices to compute locations within the arrays. I know Fermi has a more unified address space; could that be the cause?
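As a sanity check on the straight-index version, here is a sketch of the usual 1D indexing arithmetic, again as plain C++ with hypothetical helper names (global_index, out_of_range_threads, blocks_for). It assumes a 1D grid over an input array of n elements and is not taken from the attached code. Whenever the grid is rounded up to whole blocks, the last block contains threads whose index is at or past n, and each of them needs an if (idx < n) guard before it touches the arrays.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the usual CUDA 1D indexing:
// idx = blockIdx.x * blockDim.x + threadIdx.x
inline std::size_t global_index(std::size_t block_idx, std::size_t block_dim,
                                std::size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Number of threads in a launch of grid_dim blocks x block_dim threads whose
// global index falls outside an array of n elements. Any of these threads
// that skips an `if (idx < n)` guard reads or writes out of bounds.
inline std::size_t out_of_range_threads(std::size_t grid_dim,
                                        std::size_t block_dim, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t b = 0; b < grid_dim; ++b)
        for (std::size_t t = 0; t < block_dim; ++t)
            if (global_index(b, block_dim, t) >= n) ++count;
    return count;
}

// Smallest number of blocks that covers n elements (rounded up).
inline std::size_t blocks_for(std::size_t n, std::size_t block_dim) {
    return (n + block_dim - 1) / block_dim;
}
```

For example, 1000 tuples with 256 threads per block need 4 blocks, which launches 1024 threads, so 24 of them fall outside the array and must be filtered out by the guard.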
I’ve tried to debug further with breakpoints, but since a different, seemingly random thread causes the error on every run, that hasn’t helped much. Do you have any suggestions for a better way to debug this? Specific commands would be welcome; I was never very good with gdb.
Thanks in advance for any help, or simply for reading this far!
CUDA Toolkit: 3.2
Device: C2070
Compile options: sm_20
src.zip (4.9 KB)