Weird CUDA behavior

I am doing Gaussian smoothing in the y direction (along the columns) with a 7-element Gaussian kernel, on an image that is stored as a 1D array in CUDA. If the 7 elements in a column, going from top to bottom, are c1, c2, …, c7 and the Gaussian kernel elements are x1, x2, …, x7, then the new value of the central pixel of that column, c4, is:

c4 (centre of the column) = c1*x1 + c2*x2 + … + c7*x7.

However, this assignment gives a CUDA memory exception. The weird thing is that the very same smoothing operation works fine along a row and does not give an exception. Any ideas? In my case each thread processes one column at a time. The error message I get is:
First-chance exception at 0x76b9fbae in example1.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fd20…

Sounds like a bug in your code. We have no hope unless you post it, or at least a simplified version that reproduces the error.
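
For comparison, here is a minimal CPU sketch in plain C++ of what a column pass might look like. It is not your code, and the names `smoothColumnPixel` and `smoothColumns` are made up for illustration. It assumes a row-major layout, so pixel (row, col) sits at index row * width + col, and it assumes the edges are handled by clamping. One guess at why only the column version fails: stepping 3 pixels past the end of a row usually still lands inside the buffer, while stepping 3 rows past the top or bottom of a column moves the index by up to 3 * width elements, which is far more likely to fall outside the allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: compute the smoothed value of one pixel along its column.
// Assumes row-major storage: pixel (row, col) is image[row * width + col].
float smoothColumnPixel(const std::vector<float>& image, int width, int height,
                        int row, int col, const std::vector<float>& kernel) {
    const int radius = static_cast<int>(kernel.size()) / 2;  // 3 for a 7-tap kernel
    float sum = 0.0f;
    for (int k = -radius; k <= radius; ++k) {
        int r = row + k;
        // Clamp to the image. Without this, rows near the top or bottom
        // edge would index outside the buffer.
        if (r < 0) r = 0;
        if (r >= height) r = height - 1;
        // Neighbouring pixels in a column are `width` elements apart.
        sum += image[static_cast<std::size_t>(r) * width + col] * kernel[k + radius];
    }
    return sum;
}

// Hypothetical driver: apply the column pass to every pixel.
// In a kernel, the body of the inner loop is what one thread would do.
std::vector<float> smoothColumns(const std::vector<float>& image, int width, int height,
                                 const std::vector<float>& kernel) {
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(kernel.size() % 2 == 1);  // need a well-defined centre tap
    std::vector<float> out(image.size());
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            out[static_cast<std::size_t>(row) * width + col] =
                smoothColumnPixel(image, width, height, row, col, kernel);
        }
    }
    return out;
}
```

If your kernel computes the index differently (for example, stepping by 1 instead of by width, or reading rows 0..2 and height-3..height-1 without any edge check), that difference is the first place I would look.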