So if I have a 1000x1000 array on the GPU, how do I use threadIdx to access each element of it? Currently, to access an element I use "arrayName[threadIdx.x + blockIdx.x]". However, with an array this large, I can't have one thread per element all in a single block. What is the workaround for this? Do I have to divide it into sub-matrices?
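To make the question concrete, here is a minimal sketch of the pattern I understand is commonly used for this: compute a global index from blockIdx.x, blockDim.x and threadIdx.x, then let each thread step through the array by the total number of threads in the grid (a grid-stride loop). It assumes the 1000x1000 array is stored as one flat, row-major buffer, so element (row, col) sits at row * 1000 + col. The names scaleKernel and scaleOnGpu, the multiply-by-a-factor operation, and the block size of 256 are all placeholders I made up for illustration, not anything from my actual code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: every thread starts at its global index and then
// jumps ahead by the total number of threads launched, so the array is
// fully covered even if there are fewer threads than elements.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int idx    = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    int stride = gridDim.x * blockDim.x;                 // total threads in the grid

    for (int i = idx; i < n; i += stride) {
        data[i] *= factor;
    }
}

// Hypothetical host helper: copies the flat array to the device, launches
// the kernel, and copies the result back. Returns false on a CUDA error.
bool scaleOnGpu(std::vector<float>& host, float factor)
{
    int n = static_cast<int>(host.size());
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return false;
    }

    int threads = 256;                           // assumed block size
    int blocks  = (n + threads - 1) / threads;   // round up so every element is covered
    scaleKernel<<<blocks, threads>>>(dev, n, factor);

    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);
    if (ok) {
        ok = (cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess);
    }
    cudaFree(dev);
    return ok;
}
```

With a std::vector<float> of 1000 * 1000 elements, scaleOnGpu(m, 2.0f) would launch 3907 blocks of 256 threads. If something like this is the right approach, then it seems no manual split into sub-matrices would be needed, but I would like confirmation.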