Buffer Access Fails on NVIDIA Device

Hi guys…
I'm pretty new to OpenCL, but not to normal programming. Unfortunately, it's very hard to detect errors in OpenCL kernels, so I'd appreciate some advice. In theory this should be easy, but it doesn't seem to be. The problem only appears on an NVIDIA device, not on a CPU device. Here is my problem:

Kernel:
typedef struct s_Point
{
    float m_X;
    float m_Y;
}
Point;

typedef struct s_PointTriple
{
    Point m_PointOne;
    Point m_PointTwo;
    Point m_PointThree;
}
PointTriple;

///////////////////////////////////////////////////////////////////////////////
// GridPoint is defined elsewhere (not shown in this post); it needs a float member m_X.
__kernel void test(__global GridPoint* in_BufferOne, __global PointTriple* in_BufferTwo)
{
    // Initialize both members so m_Y is never written out uninitialized.
    __private Point l_Point = { 0.0f, 0.0f };
    unsigned int l_X, l_Y, l_Index;

    l_X = get_global_id(0);
    l_Y = get_global_id(1);
    // Row-major linear index: one PointTriple per work-item.
    l_Index = get_global_size(0) * l_Y + l_X;

    // calc something for l_Point using buffer one
    // (0.1f keeps the math in single precision; an unsuffixed 0.1 is a double literal)
    l_Point.m_X = in_BufferOne[l_Index].m_X * 0.1f;

    // save result (member names must match the struct: m_PointOne, not PointOne)
    in_BufferTwo[l_Index].m_PointOne.m_X = l_Point.m_X;
    in_BufferTwo[l_Index].m_PointOne.m_Y = l_Point.m_Y;
    in_BufferTwo[l_Index].m_PointTwo.m_X = l_Point.m_X;
    in_BufferTwo[l_Index].m_PointTwo.m_Y = l_Point.m_Y;
    in_BufferTwo[l_Index].m_PointThree.m_X = l_Point.m_X;
    in_BufferTwo[l_Index].m_PointThree.m_Y = l_Point.m_Y;
}

The local size is set automatically, and my global dimensions are 512x512. The buffer sizes match my calculation (6291456 bytes) and the buffers are valid. Any suggestions as to what could be wrong?

I ran some experiments, each writing to one fixed buffer position:
in_BufferTwo[0] = 1.0; → works (on both kinds of devices)
in_BufferTwo[162144] = 1.0; → works (on both kinds of devices)
in_BufferTwo[262144] = 1.0; → fails on the NVIDIA device, works on the Intel CPU device

OK, solved: I was calculating the access index incorrectly.