Hello,
I’ve been banging my head against a bug in a CUDA kernel for a couple of days now. It only occurs under very specific compile-time circumstances. The kernel performs a convolution. It works in most cases, but a few configurations produce wrong results.
For my test case, I set both input arrays to all 1’s, padded with leading and/or trailing 0’s so that the kernel does not need any if statements. In the result array, index 0 should therefore contain 1, index 1 should contain 2, and so on up to the halfway point, where the values reverse, so that index n-2 contains 2 and index n-1 contains 1.
This works for the majority of test cases, but with both input arrays sized at 44100, 512 threads per block, and 256 blocks per grid (launched as a single grid), some data in the middle is scrambled. I can’t tell exactly what is happening, but blocks 54-56 contain smaller numbers than they should, although the values still increase. The output reaches the halfway index (though NOT with the correct value) and reverses, with the numbers decreasing. Then at some point (always the same point, and not aligned on a block boundary), 3 blocks’ worth of indexes (512 x 3 = 1536) repeat the same number (41284), after which the values continue decreasing and end at 1.
I have tested this in several ways. First, I added a statement to the kernel as follows:
if (index == 27649) // 27649 is one of the affected indexes
{
    result[0] = sum;
}
And sure enough, the result array contains the correct result at index 0, with 0’s in all the other positions. I have also tested writing the running sum to consecutive indexes of the result array as it is being totaled:
for (j = 0; j < size; j++)
{
    sum += arr1[j] * arr2[index - j];
    if (index == 27649) // 27649 is one of the affected indexes
    {
        result[tempIndex++] = sum;
    }
}
After this runs, the result array contains the values 1, 2, 3, 4, 5, 6, …, 27650, as it should. This means that the correct value is being computed. The last thing I tested is where I really can’t figure out what’s going on. In the kernel, after the sum has been computed, I have an if statement:
// With 44100 inputs of all 1's, 44100 is the largest possible sum,
// so this condition includes every number that should be calculated.
if (sum < 44101)
{
    result[index] = sum;
}
When this runs, the result array contains the pattern I first described above, with the repeating numbers and duplicates. Now, if I comment that out and in its place put:
// 44000 should appear in the output (it is greater than 0 and less than 44100).
if (sum == 44000)
{
    result[index] = sum;
}
The result contains all 0’s except for the two indexes that should contain 44000, as expected. However, with the first if statement, the number 44000 is nowhere to be found, since each index that should produce it is one of the affected indexes and contains a different number.
I have verified all these findings by writing the result array to a file and checking it both by hand and with the Find feature of Visual Studio 2010 (the file contains 88199 float values, each on a separate line).
Keep in mind that if I change the block size to 512, it works properly. The same is true if I change it to 128, change the number of threads to 256, change both, or change the array sizes so that the first holds 44100*3 values and the second holds 16382 values.
I don’t understand how one run of the program can show a result of 44000 while the other doesn’t. Everything is determined at compile time (even the threads per block and the number of blocks), so there is no chance of run-time “contamination” changing the results. Does this sound like it could be an nvcc/CUDA bug? If so, I can provide more of the kernel if needed.
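For reference, the expected output of the all-ones test case described above can be checked on the host. The following is only a minimal sketch under stated assumptions: the helper names `convolveReference`, `expectedAllOnes`, and `firstMismatch` are hypothetical and not part of the kernel in this post, and it assumes both inputs are all 1’s of the same length n with the padding 0’s left out, so the full result has 2n-1 values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: full linear convolution of a and b.
// The output has a.size() + b.size() - 1 elements.
std::vector<float> convolveReference(const std::vector<float>& a,
                                     const std::vector<float>& b)
{
    if (a.empty() || b.empty()) return {};
    std::vector<float> out(a.size() + b.size() - 1, 0.0f);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            out[i + j] += a[i] * b[j];
    return out;
}

// Closed form for the all-ones case (assumption: two inputs of n ones each).
// The values ramp up 1, 2, ..., n and then back down to 1.
float expectedAllOnes(std::size_t k, std::size_t n)
{
    assert(n > 0 && k < 2 * n - 1);
    return static_cast<float>(k < n ? k + 1 : 2 * n - 1 - k);
}

// Returns the first index where result differs from the expected ramp,
// or -1 if every value matches. Sums up to 44100 are exact in float.
long firstMismatch(const std::vector<float>& result, std::size_t n)
{
    assert(n > 0 && result.size() == 2 * n - 1);
    for (std::size_t k = 0; k < result.size(); ++k)
        if (result[k] != expectedAllOnes(k, n))
            return static_cast<long>(k);
    return -1;
}
```

With n = 44100, calling `firstMismatch(hostResult, 44100)` on the 88199 values copied back from the device would report the first affected index, where `hostResult` is a hypothetical name for that host-side copy. The closed form puts 44000 at indexes 43999 and 44199, which matches the two indexes mentioned above.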
My system information:
CUDA and the SDK are the newest versions (downloaded today, after this first happened on 3.2 RC). I am using Visual Studio 2010 with the Visual Studio 2008 compiler. The machine is a self-built PC (not Dell, HP, etc.) with an Intel Core i7 920 at 2.66 GHz, 6 GB of DDR3 RAM, Windows 7 64-bit, and a single GTX 260 (factory overclocked by EVGA to 626 MHz). If this does sound like a bug in nvcc/CUDA and you need more information, please let me know.
Thank you for your time,
Andrew