memcheck errors on array initialization

I have a kernel in which each thread does some processing and sets values in an array. This kernel occasionally dies with an “unspecified launch failure”. Running it under cuda-memcheck reports a “Lane Illegal Address” (and sometimes other errors, such as “Device Illegal Address” or “Warp Out-of-range Address”). I have tried running in cuda-gdb with memcheck enabled and autostepping through the kernel to find the context of the failure, but so far the autostep stops on a line of code that involves only thread-local variables.

I have reduced the code to a very simple example that exhibits the problem; it is attached. This code does not produce consistent results; on each run, it does one of the following:

  • runs to completion with all array values correctly set
  • runs to completion, reporting that some array values have not been set (a different number of unset values each run)
  • dies with an unspecified launch failure

I believe the array should be initialized properly, without memory errors or crashes. Even in the cases where the program runs to completion, cuda-memcheck still reports memory errors.

Can anyone see why this code might not consistently initialize the array elements? Or why it would encounter a “Lane Illegal Address”?

Running with cuda-memcheck produces some variant of the following. The addresses it reports as being out of range are, in fact, within the range of the allocated memory.

cuda-memcheck test
========= CUDA-MEMCHECK
num threads : 10240 (512 per block, 20 blocks per grid)
error in cudaMemcpy: unspecified launch failure
========= Invalid global write of size 4
========= at 0x00000170 in test.cu:15:doit
========= by thread (20,0,0) in block (5,0,0)
========= Address 0x2000fbf4c is out of bounds

========= Invalid global write of size 4
========= at 0x00000170 in test.cu:15:doit
========= by thread (65,0,0) in block (18,0,0)
========= Address 0x0058a594 is out of bounds

========= Invalid global write of size 4
========= at 0x00000170 in test.cu:15:doit
========= by thread (78,0,0) in block (18,0,0)
========= Address 0x0058b9e4 is out of bounds

========= Invalid global write of size 4
========= at 0x00000170 in test.cu:15:doit
========= by thread (83,0,0) in block (18,0,0)
========= Address 0x0058c1b4 is out of bounds

========= Invalid global write of size 4
========= at 0x00000170 in test.cu:15:doit
========= by thread (87,0,0) in block (18,0,0)
========= Address 0x0058c7f4 is out of bounds

========= Invalid global write of size 4
========= at 0x00000170 in test.cu:15:doit
========= by thread (90,0,0) in block (18,0,0)
========= Address 0x0058cca4 is out of bounds

========= ERROR SUMMARY: 6 errors

This is on a GTX 580 with CUDA 4.2, compiled for compute capability 2.0:

nvcc -o test test.cu -arch=compute_20 -code=sm_20
test.cu (1.33 KB)

numThreads is not a multiple of the block size, so you need to explicitly disable the threads in the last block that have i >= numThreads:

__global__ void doit(int *a, int num, int numPer, int numThreads)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numThreads) {
        int start  = i * numPer;
        int finish = start + numPer;

        for (int j = start; j < finish; j++)
            if (j < num)
                a[j] = FLAG;
    }
}