Kernels fail above a certain number of blocks (gridDim.x)

Hi all,

I ported a 1-dimensional grid-based PDE solver from CPU to GPU with CUDA, and everything works fine except when I launch more than about 40 blocks. I’m using 128 threads per block, and the 1D grid is chopped into

int nblocks = (int)((N + (THREADSPERBLOCK-1) - 1) / (THREADSPERBLOCK-1));

blocks (where N is the total grid size). Kernel launches look like

Kernel<<<nblocks, THREADSPERBLOCK>>>( /* arguments */ );
cudaThreadSynchronize();

if ( cudaSuccess != cudaGetLastError() )
  printf( "Error: Kernel Failure - Loader_Kernel\n" );

For N = 5000 and below, the code runs fine. For N = 6000 and above, I get the error output from the error-handling routine shown above for a kernel devoted to reading memory (I use one kernel to load memory arrays and another to handle the data, because of problems with block overlap). Previous versions of my code, which did not perform entirely satisfactorily, did not suffer from this problem, so I do not believe it is a lack of memory on my GPU (GTX 460). The program segfaults in the memory-reading kernel, with the following cuda-memcheck message:

========= Invalid __global__ read of size 8

=========     at 0x00003720 in ../code_folder/code.cu:3087:Memory_Reading_Kernel

=========     by thread (127,0,0) in block (41,0,0)

=========     Address 0xffbfffff is misaligned

=========

========= ERROR SUMMARY: 1 error

The memory-loading kernel does not produce an “Error: Kernel Failure” message or a segfault.

Does anyone know what could cause this? I am pretty well in the dark here.

Thanks!

S

EDIT: clarity

We are pretty much in the dark as well - can you post some code (particularly line 3087 of code.cu, where the misaligned access happens)?

[s]Sorry for the lack of information; I assumed the problem was more general. The offending line is

vec[n] = pV[i+1][n] - pV[i-1][n];

specifically, reading the [i+1] element of pV. Here i is the grid-cell index and n loops over the individual members of a structure, in this case a struct of a few doubles. cuda-memcheck always reports the error for thread (127,0,0) in block (41,0,0); N = 6000 gives 48 blocks, but the same thread in the same block is reported for larger grids as well.
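To be concrete about the access pattern, it is essentially the following (names here are placeholders, not my actual code):

// Placeholder names only - just the shape of the access, not my actual code.
#define NVARS 3   // number of doubles in the per-cell struct

// pV[i] points at cell i viewed as a flat array of doubles,
// so pV[i][n] is the n-th double of the i-th struct.
__device__ void central_difference(double **pV, double *vec, int i)
{
    for (int n = 0; n < NVARS; n++)
        vec[n] = pV[i+1][n] - pV[i-1][n];
}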

Note that pV is an array of pointers defined as

__constant__ double **pV;

and allocated via

double **pV_ptr;

cudaErrorCheck(cudaMalloc((void**)&pV_ptr, gridsize*sizeof(double*)),
               "cudaMalloc - pV_ptr");

cudaErrorCheck(cudaMemcpyToSymbol(pV, &pV_ptr, sizeof(double*)),
               "pV - cudaMemcpyToSymbol");
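(In case it matters: cudaErrorCheck is just a thin helper, roughly along these lines - paraphrased rather than copied from my code.)

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Paraphrased sketch of the helper: if the wrapped runtime call failed,
// print the label plus the CUDA error string and bail out.
void cudaErrorCheck(cudaError_t err, const char *msg)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", msg, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}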

pV is initialized at the start of the memory-reading kernel like this:

if (i >= istart && i <= iend)

    pV[i] = (double*)&(V[i]);

where each thread has a unique i. For N = 7000 or so, the i corresponding to the faulting thread is comfortably between istart and iend.

Hope this reveals the problem, although it has not for me…

S[/s]

Oh dear, how could I have been so myopic… Of course this leads to a race condition: pV[i] is written at the start of the same kernel that later reads pV[i-1] and pV[i+1], and there is no synchronization between blocks, so a thread at the edge of a block can read a neighbouring pointer before the thread responsible for writing it has run. Sorry to bother everyone.
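For anyone who finds this later: one way to avoid the race is to fill pV in its own kernel launch and only read the neighbouring pointers in a second kernel, since consecutive launches on the same stream cannot overlap. A minimal sketch (names, sizes and the flat-double layout are made up for illustration, not my actual solver):

// Sketch only: N, NVARS, and the flat double layout are illustrative.
#include <cstdio>
#include <cuda_runtime.h>

#define N               6000
#define NVARS           3      // stand-in for "a few doubles" per cell
#define THREADSPERBLOCK 128

__constant__ double **pV;      // device-wide table of per-cell pointers

// Launch 1: every cell's pointer is written before any later kernel runs.
__global__ void Setup_Pointers(double *V)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        pV[i] = &V[i * NVARS];
}

// Launch 2: safe to read pV[i-1] and pV[i+1]; launch 1 has fully completed.
__global__ void Central_Difference(double *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < N - 1)
        for (int n = 0; n < NVARS; n++)
            out[i * NVARS + n] = pV[i + 1][n] - pV[i - 1][n];
}

int main()
{
    double *V, *out, **pV_ptr;
    cudaMalloc((void**)&V,      N * NVARS * sizeof(double));
    cudaMalloc((void**)&out,    N * NVARS * sizeof(double));
    cudaMalloc((void**)&pV_ptr, N * sizeof(double*));
    cudaMemset(V, 0, N * NVARS * sizeof(double));
    cudaMemcpyToSymbol(pV, &pV_ptr, sizeof(double**));

    int nblocks = (N + THREADSPERBLOCK - 1) / THREADSPERBLOCK;
    Setup_Pointers<<<nblocks, THREADSPERBLOCK>>>(V);
    Central_Difference<<<nblocks, THREADSPERBLOCK>>>(out);
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}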