I need to initialize an array with the maximum integer value. I tried doing this with cudaMemset, but it was too slow. I read somewhere that for large arrays a CUDA kernel that initializes each element is faster. However, it is still too slow.
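Worth noting for the `cudaMemset` route: it writes one byte value into every byte of the range, so it cannot store `INT_MAX` (0x7FFFFFFF) in an `int`. The closest byte pattern, 0x7F, yields 0x7F7F7F7F. The minimal sketch below demonstrates this. It assumes a CUDA-capable device is present. `memset_int_with_byte` is a hypothetical helper name, and error checking is omitted for brevity.

```cuda
#include <climits>
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: memsets a single device int with the given byte
// value and returns what actually ends up in memory.
// cudaMemset sets every *byte* of the range to byteValue, so the int
// receives byteValue repeated four times.
int memset_int_with_byte(unsigned char byteValue) {
    int* d_value = nullptr;
    int h_value = 0;
    cudaMalloc(&d_value, sizeof(int));
    cudaMemset(d_value, byteValue, sizeof(int));  // 4 bytes, each = byteValue
    cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_value);
    return h_value;  // for 0x7F this is 0x7F7F7F7F, not INT_MAX
}
```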

I’m trying to initialize an int array of around 400,000 elements, and the kernel takes about 2 ms (on a notebook with a GeForce 330M).
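To see how much of that time is spent in the kernel itself, you can measure it on the GPU with CUDA events. This keeps host-side work such as allocation out of the measurement. The sketch below assumes a hypothetical one-int-per-thread `fillKernel` and a 256-thread block size. Both are illustrative choices, not taken from the post.

```cuda
#include <climits>
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel used only to have something to time:
// each thread writes one element.
__global__ void fillKernel(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Times a single launch of fillKernel on n ints and returns the
// elapsed GPU time in milliseconds.
float time_fill_ms(int n) {
    int* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));  // allocation happens before timing

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);
    fillKernel<<<blocks, threads>>>(d_data, n, INT_MAX);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel and 'stop' have completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}
```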

```
__global__
void kernel(bounding_box* b_box, int* depthArray, int value){
    // b_box must point to device memory, and getWidth()/getHeight()
    // must be __device__ member functions to be callable here
    unsigned int width = b_box->getWidth();
    unsigned int height = b_box->getHeight();
    unsigned int bx = blockIdx.x;
    unsigned int by = blockIdx.y;
    int v = value;
    // matrix indexes
    unsigned int line = by * blockDim.y + threadIdx.y;
    unsigned int column = bx * blockDim.x + threadIdx.x;
    if(column >= width || line >= height){
        return;
    }
    int index = line * width + column;
    // each pixel owns MAX_DEPTH consecutive ints in depthArray
    int* ptr = depthArray + index * MAX_DEPTH;
    for(int i = 0; i < MAX_DEPTH; ++i){
        ptr[i] = v;
    }
}
```

Each thread sets MAX_DEPTH consecutive positions (MAX_DEPTH = 10).
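With MAX_DEPTH consecutive ints per thread, neighbouring threads write addresses 40 bytes apart (for 4-byte ints). As a result, the stores of a warp are likely not coalesced. Every element gets the same value, so one alternative is to ignore the (line, column, depth) layout and fill the array as one flat range in which consecutive threads write consecutive ints.

The sketch below follows that approach. `fillFlat`, `fill_depth_array`, the 256-thread blocks and the 1024-block cap are illustrative choices. MAX_DEPTH is redefined here with the value quoted above.

```cuda
#include <climits>
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define MAX_DEPTH 10  // value quoted in the post

// Grid-stride loop over a flat range: thread t writes t, t + stride, ...
// In each iteration consecutive threads write consecutive ints, so the
// stores of a warp fall into contiguous memory (coalesced).
__global__ void fillFlat(int* data, size_t count, int value) {
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < count; i += stride) {
        data[i] = value;
    }
}

// Hypothetical host helper: fills a width x height x MAX_DEPTH depth array
// with INT_MAX and returns true if every element was set.
bool fill_depth_array(unsigned int width, unsigned int height) {
    size_t count = (size_t)width * height * MAX_DEPTH;
    int* d_depth = nullptr;
    if (cudaMalloc(&d_depth, count * sizeof(int)) != cudaSuccess) return false;

    const int threads = 256;
    int blocks = (int)((count + threads - 1) / threads);
    if (blocks > 1024) blocks = 1024;  // the grid-stride loop covers the rest
    fillFlat<<<blocks, threads>>>(d_depth, count, INT_MAX);

    // cudaMemcpy on the default stream waits for the kernel to finish
    std::vector<int> host(count);
    cudaMemcpy(host.data(), d_depth, count * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_depth);

    for (size_t i = 0; i < count; ++i)
        if (host[i] != INT_MAX) return false;
    return true;
}
```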

Am I missing something? Is there an approach I’m not considering?
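Another option is to let a library do the fill, for example `thrust::fill` with the `thrust::device` execution policy on a raw device pointer. The sketch below assumes a CUDA toolkit whose bundled Thrust supports execution policies. `thrust_fill_count` is a hypothetical helper that also counts the filled elements on the device to confirm the result.

```cuda
#include <climits>
#include <cassert>
#include <cuda_runtime.h>
#include <thrust/fill.h>
#include <thrust/count.h>
#include <thrust/execution_policy.h>

// Fills n device ints with INT_MAX using Thrust, then counts on the
// device how many elements hold INT_MAX and returns that count.
size_t thrust_fill_count(size_t n) {
    int* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    thrust::fill(thrust::device, d_data, d_data + n, INT_MAX);
    size_t matches = (size_t)thrust::count(thrust::device, d_data, d_data + n, INT_MAX);
    cudaFree(d_data);
    return matches;
}
```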

Thanks,

André