Is it possible to save thread indices directly into an array?

I am writing a grid-stride loop for HPV with a large N, for example `long long N = 1LL << 36` or more. Of the whole index range I need only some indices, namely those that satisfy a given condition.

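    __global__ void Indexes(long long *array, long long N) {
        // 64-bit arithmetic: with N around 2^36, an int index would overflow
        long long index  = blockIdx.x * (long long)blockDim.x + threadIdx.x;
        long long stride = (long long)blockDim.x * gridDim.x;
        while (index < N) {
            if (condition(index)) {
                // ... save index in array (this is the part I am asking about)
            }
            index += stride;   // must be inside the loop body, or index never advances
        }
    }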
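One common way to fill in the placeholder is a single global counter incremented with `atomicAdd`: each thread that finds a qualifying index reserves the next free slot and writes its index there, so no unneeded elements are ever stored. The sketch below is an illustration under assumptions, not a definitive implementation: `keep` is a hypothetical predicate standing in for `condition`, `capacity` is a caller-chosen size for the output buffer, the launch is a fixed 256×256 grid, and `collectIndices` / `collectOnHost` are illustrative names.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical predicate standing in for the unspecified "condition":
// here it keeps indices divisible by 7.
__device__ bool keep(long long i) { return i % 7 == 0; }

// Grid-stride loop that appends every qualifying index to `out`.
// `count` is one global counter; atomicAdd gives each match a unique slot.
__global__ void collectIndices(long long *out, unsigned long long *count,
                               unsigned long long capacity, long long N) {
    long long stride = (long long)blockDim.x * gridDim.x;
    for (long long i = blockIdx.x * (long long)blockDim.x + threadIdx.x; i < N; i += stride) {
        if (keep(i)) {
            unsigned long long slot = atomicAdd(count, 1ULL);
            if (slot < capacity) out[slot] = i;  // never write past the buffer end
        }
    }
}

// Host wrapper: runs the kernel and copies the found indices back to the CPU.
std::vector<long long> collectOnHost(long long N, unsigned long long capacity) {
    long long *d_out = nullptr;
    unsigned long long *d_count = nullptr;
    cudaMalloc(&d_out, capacity * sizeof(long long));
    cudaMalloc(&d_count, sizeof(unsigned long long));
    cudaMemset(d_count, 0, sizeof(unsigned long long));

    collectIndices<<<256, 256>>>(d_out, d_count, capacity, N);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    unsigned long long found = 0;
    cudaMemcpy(&found, d_count, sizeof(found), cudaMemcpyDeviceToHost);
    if (found > capacity) found = capacity;  // buffer was too small: result is truncated

    std::vector<long long> h(found);
    cudaMemcpy(h.data(), d_out, found * sizeof(long long), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_count);

    // Slots are handed out in whatever order threads arrive, so sort
    // on the host if a deterministic order is needed.
    std::sort(h.begin(), h.end());
    return h;
}
```

If the counter ends up larger than `capacity`, the kernel has to be rerun with a bigger buffer. When a large fraction of indices qualifies, contention on the single counter can become the bottleneck, and aggregating the increments per warp is a usual refinement.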

Of course, I could use Thrust, which provides both host and device vectors. But then the computation would obviously be very inefficient, because I would first have to create a large number of unneeded elements and then delete them.
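For comparison, a Thrust version does not necessarily need the full index range in memory: `thrust::counting_iterator` generates indices on the fly, and `thrust::copy_if` keeps only those satisfying a predicate. The output buffer, however, still has to be allocated up front and must be at least as large as the number of matches. A minimal sketch, assuming a hypothetical predicate `Keep` and a caller-supplied upper bound `capacity`; `selectIndices` is an illustrative name.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>
#include <cstddef>
#include <vector>

// Hypothetical predicate standing in for the unspecified "condition".
struct Keep {
    __host__ __device__ bool operator()(long long i) const { return i % 7 == 0; }
};

// Compacts the index range [0, N) without storing it: counting_iterator
// produces each index on demand, so only the output buffer is allocated.
// `capacity` must be >= the number of matches, or copy_if writes out of bounds.
std::vector<long long> selectIndices(long long N, std::size_t capacity) {
    thrust::device_vector<long long> d_out(capacity);
    thrust::counting_iterator<long long> first(0);
    auto end = thrust::copy_if(thrust::device, first, first + N, d_out.begin(), Keep());

    std::size_t found = end - d_out.begin();
    std::vector<long long> h(found);
    thrust::copy(d_out.begin(), d_out.begin() + found, h.begin());  // device -> host
    return h;
}
```

Unlike the atomic-counter approach, `copy_if` preserves the order of the selected indices. Its behavior and speed for N around 2^36 depend on the Thrust version in use and should be measured.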

What is the most efficient way to save the indices directly into an array that can then be copied back to the CPU?