If global memory is a 32-bit addressable space, how can one use anything beyond 2^32 addresses?
When I try, my kernel fails with the error 'Memory value too large'. The 4GB of the Tesla is then of no use!
I am trying to create two 1D arrays of 1.5GB each and address them. I have a Tesla C1060 and am
using CUDA 2.3.
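For reference, here is a minimal sketch of the kind of allocation I mean (the array names and exact element count are placeholders, not my actual code); the byte count is kept in size_t so it never passes through a 32-bit int:

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // ~1.5 GiB of floats per array; element count and names are illustrative.
    // The byte count is computed in size_t so it never overflows a 32-bit int.
    const size_t nElems = 384u * 1024u * 1024u;   // 402,653,184 floats
    const size_t nBytes = nElems * sizeof(float); // 1,610,612,736 bytes

    float *dA = NULL, *dB = NULL;
    cudaError_t err = cudaMalloc((void **)&dA, nBytes);
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&dB, nBytes);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(dB);
    cudaFree(dA);
    return 0;
}
```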
Thanks for rechecking that :) I am sorry if my statement conveyed otherwise, but let me clarify my query further:
I create two 1D float arrays on the GPU. Each thread writes to an element in each of the two arrays, indexed by the position ID (= blockDim.x * blockIdx.x + threadIdx.x). I need 906572 threads, with 3553 bytes of global memory per thread (both arrays combined), and I am running them in 2024 blocks of 448 threads, which comes to ~3GB over all the threads. The memalloc of each of these 1D arrays therefore comes to ~1.5GB on the GPU. What I want to ask is:
Why is this memalloc failing? (Error: out of memory.) If the GPU uses a 32-bit addressable space, then I should be able to access more than 3GB, and thus there should be no 'out of memory' error.
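Roughly, the pattern is like the sketch below (kernel name, array names, and sizes are illustrative, and it assumes the runtime provides cudaMemGetInfo). It checks the free device memory first, since the CUDA context itself reserves part of the 4GB, and the kernel strides over the arrays so the grid stays within the 65535-block limit of compute 1.3:

```
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: each thread writes elements of both arrays,
// striding by the total thread count so any array size is covered
// even though the grid is limited to 65535 blocks on compute 1.3.
__global__ void fillBoth(float *a, float *b, size_t n)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockDim.x * blockIdx.x + threadIdx.x; i < n; i += stride) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }
}

int main()
{
    // How much device memory is really free before allocating?
    // (The CUDA context itself reserves part of the 4GB.)
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);
    printf("free %lu MB of %lu MB\n",
           (unsigned long)(freeB >> 20), (unsigned long)(totalB >> 20));

    const size_t n = 384u * 1024u * 1024u;        // ~1.5 GiB of floats per array
    float *dA = NULL, *dB = NULL;
    if (cudaMalloc((void **)&dA, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc((void **)&dB, n * sizeof(float)) != cudaSuccess) {
        printf("allocation failed: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    fillBoth<<<2024, 448>>>(dA, dB, n);           // same launch shape as described above
    cudaThreadSynchronize();                      // CUDA 2.3-era synchronization call
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(dB);
    cudaFree(dA);
    return 0;
}
```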
Host and device pointers are in separate memory spaces. Problem solved.
I.e. both the host and the device have the full 4GB of RAM available. (Well, the host may only get 3.4 or 3.75GB when you use Windows, but that's a different issue.)
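To make that concrete, here is a minimal sketch of the two spaces (names and sizes are illustrative): the host pointer comes from malloc and addresses system RAM, the device pointer comes from cudaMalloc and addresses the GPU's own 4GB, and data only moves between them through explicit cudaMemcpy calls.

```
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    const size_t n = 1u << 20;               // 1M floats (illustrative size)
    const size_t nBytes = n * sizeof(float);

    // Host pointer: addresses system RAM.
    float *hData = (float *)malloc(nBytes);
    for (size_t i = 0; i < n; ++i)
        hData[i] = (float)i;

    // Device pointer: addresses the GPU's own memory, a separate space.
    float *dData = NULL;
    if (cudaMalloc((void **)&dData, nBytes) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        free(hData);
        return 1;
    }

    // Data moves between the two spaces only through explicit copies;
    // dereferencing dData on the host would be invalid.
    cudaMemcpy(dData, hData, nBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(hData, dData, nBytes, cudaMemcpyDeviceToHost);

    cudaFree(dData);
    free(hData);
    return 0;
}
```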