Filling data into linear memory

Hi, I am new to CUDA.

I am trying to write triangles into an array one after another, packed contiguously (no gaps between the data).

struct Vertex
{
    float3 pos;
    float3 normal;
};

__device__ int globalIndex = 0;

__global__ void simpleKernel(Vertex *output)
{
    // struct Triangle { Vertex vertex1, vertex2, vertex3; };
    // __shared__ Triangle tmp[100];   // using 100 threads per block

    // int writeIndex = threadIdx.y * blockDim.x + threadIdx.x;   // unique slot per thread in the block

    // --- Produce triangle ---
    // if the triangle is not created: exit the thread
    // tmp[writeIndex].vertex1 = ...   // save the triangle vertices
    // ---------

    // __syncthreads();

    // int globalWriteIndex = atomicAdd(&globalIndex, 3);   // atomicAdd returns the old value

    // write to output:
    // output[globalWriteIndex + 0] = tmp[writeIndex].vertex1;
    // output[globalWriteIndex + 1] = tmp[writeIndex].vertex2;
    // output[globalWriteIndex + 2] = tmp[writeIndex].vertex3;
}

Is this possible? The output array should hold the triangles in correct form (vertices in CW / CCW order). I cannot use a globalWriteIndex calculated from the thread and block indices, because that leaves empty gaps between triangles whenever a triangle is not created.
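To make the idea concrete, here is a minimal sketch of the atomic-counter approach, not a definitive implementation. The triangle generator `makeTriangle` (only even-numbered threads produce a triangle), the `numThreads` parameter, and the host helper `fillTriangles` are hypothetical names invented for illustration; error checking on the CUDA calls is omitted for brevity. The sketch keeps each triangle in registers instead of shared memory, which also sidesteps calling __syncthreads() after some threads have already exited.

```cuda
#include <vector>
#include <cuda_runtime.h>

struct Vertex
{
    float3 pos;
    float3 normal;
};

// Number of vertices written so far; reset from the host before each launch.
__device__ int globalIndex = 0;

// Hypothetical stand-in for the real triangle generator (an assumption):
// only even-numbered threads "create" a triangle.
__device__ bool makeTriangle(int tid, Vertex tri[3])
{
    if (tid % 2 != 0) return false;
    for (int k = 0; k < 3; ++k) {
        tri[k].pos    = make_float3((float)tid, (float)k, 0.0f);
        tri[k].normal = make_float3(0.0f, 0.0f, 1.0f);
    }
    return true;
}

__global__ void simpleKernel(Vertex *output, int numThreads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= numThreads) return;

    Vertex tri[3];
    if (!makeTriangle(tid, tri)) return;   // no triangle: reserve no space

    // atomicAdd returns the old value, so the three slots
    // [base, base + 3) belong to this thread alone.
    int base = atomicAdd(&globalIndex, 3);
    output[base + 0] = tri[0];
    output[base + 1] = tri[1];
    output[base + 2] = tri[2];
}

// Hypothetical host helper: runs the kernel and copies the packed vertices
// back. Returns the number of vertices written.
int fillTriangles(int numThreads, std::vector<Vertex> &result)
{
    Vertex *dOut = nullptr;
    // Worst case: every thread produces one triangle.
    cudaMalloc(&dOut, sizeof(Vertex) * 3 * numThreads);

    int zero = 0;
    cudaMemcpyToSymbol(globalIndex, &zero, sizeof(int));

    int threads = 100;   // 100 threads per block, as in the question
    int blocks  = (numThreads + threads - 1) / threads;
    simpleKernel<<<blocks, threads>>>(dOut, numThreads);

    // This blocking copy waits for the kernel on the default stream.
    int count = 0;
    cudaMemcpyFromSymbol(&count, globalIndex, sizeof(int));

    result.resize(count);
    cudaMemcpy(result.data(), dOut, sizeof(Vertex) * count,
               cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return count;
}
```

With this layout the three vertices of each triangle stay adjacent and in the order they were written, so the CW / CCW winding is preserved. The order of the triangles relative to each other depends on which thread reaches atomicAdd first and can change from run to run. If the triangle order matters, a prefix-sum (scan) based compaction would be the usual alternative to the atomic counter.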

Thanks

What determines that the triangle is not created?

Regards,

MK