Thanks for rechecking on that :) I am sorry if my earlier statement conveyed otherwise; let me clarify my query further:
I create two 1D arrays of type float on the GPU. Each thread writes to an element in each of the two arrays, and the element is indexed by the thread's position ID (= blockDim.x * blockIdx.x + threadIdx.x). I need 906572 threads to run, with 3553 bytes of global memory per thread (both arrays combined), and I am running them in 2024 blocks of 448 threads each, which gives a total of ~3GB for all the threads. A memalloc for each of these 1D arrays therefore comes to ~1.5GB on the GPU. What I want to ask is:
Why is this memalloc failing (Err:> Out of Memory)? If the GPU uses a 32-bit address space, then I should be able to address more than 3GB, and so there should be no 'out of memory' error.