Access to Large Matrix

I have a problem similar to the one posted in

I can successfully mmap a large array (52000x52000 floats, about 10.8 GB).

I then spawn 4 threads, each one associated with a different GPU (I’m using an S1070).

When I try to read from the mapped array through cudaMemcpy2D, I get a “bus error”.

If I read directly from the same address I pass to the CUDA routine, I get the expected value, so I am not pointing at the wrong area of the buffer.

Moreover, if I run the code with smaller datasets, it works fine.

Here is what I think is the relevant portion of the code.


if((data = (float *)mmap(0, (size_t)m*n*sizeof(float), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, 0)) == MAP_FAILED)
{
		cerr << "Error: problems while mapping input file." << endl;
}
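A bus error (SIGBUS) on a file-backed mapping is typically raised when a page that lies beyond the end of the file is touched, so one thing worth ruling out is a file that is shorter than the mapped length. Below is a minimal sketch of such a check. The helper name `map_matrix` and its parameters are hypothetical (they are not part of the code above), and MAP_POPULATE is left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not from the original code): maps rows * cols floats
// from `path`, but only if the file is at least that large. Touching pages of
// a mapping that lie beyond the end of the file raises SIGBUS.
float *map_matrix(const char *path, std::size_t rows, std::size_t cols,
                  std::size_t *bytes_out)
{
    assert(bytes_out != nullptr);

    // All factors are size_t, so the product is not truncated to 32 bits.
    const std::size_t bytes = rows * cols * sizeof(float);

    int fd = open(path, O_RDWR);
    if (fd < 0) { std::perror("open"); return nullptr; }

    struct stat st;
    if (fstat(fd, &st) != 0) { std::perror("fstat"); close(fd); return nullptr; }

    // Refuse to map more bytes than the file actually holds.
    if (static_cast<unsigned long long>(st.st_size) < bytes) {
        std::fprintf(stderr, "file holds %lld bytes, mapping needs %zu\n",
                     static_cast<long long>(st.st_size), bytes);
        close(fd);
        return nullptr;
    }

    void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) { std::perror("mmap"); return nullptr; }

    *bytes_out = bytes;
    return static_cast<float *>(p);
}
```

If the helper returns a null pointer because of the size check, the dataset file itself is the first thing to inspect.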




Each thread then accesses the data for the first time in this way:

cudaMemcpy2D(dgrid, pitch, p->grid-offset, p->n*sizeof(float), p->n*sizeof(float), height, cudaMemcpyHostToDevice);

p->grid points to the area assigned to the specific thread.
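Since the source pointer is p->grid-offset rather than p->grid, it may be worth checking that the whole region read by the copy stays inside the mapping. A 2D copy reads `height` rows of `width` bytes, each row starting `spitch` bytes after the previous one, so the last byte read is at `src + (height - 1) * spitch + width - 1`. The following is only a sketch of that arithmetic. The function `copy_source_in_bounds` is a hypothetical helper, and it assumes the products fit in `size_t` (true for these sizes on a 64-bit system).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if a 2D copy of `height` rows, each
// `width` bytes wide and `spitch` bytes apart, starting at `src`, reads only
// bytes inside the mapped region [base, base + mapped_bytes).
bool copy_source_in_bounds(const void *base, std::size_t mapped_bytes,
                           const void *src, std::size_t spitch,
                           std::size_t width, std::size_t height)
{
    assert(width <= spitch); // a row cannot be wider than the row stride

    if (width == 0 || height == 0) return true; // nothing is read

    const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(base);
    const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
    if (s < b) return false; // the copy starts before the mapping

    const std::size_t first = static_cast<std::size_t>(s - b);
    const std::size_t span = (height - 1) * spitch + width; // bytes from src to end of last row

    return first <= mapped_bytes && span <= mapped_bytes - first;
}
```

In terms of the call above, `base` would be `data`, `mapped_bytes` would be `(size_t)m*n*sizeof(float)`, `src` would be `p->grid-offset`, and both `spitch` and `width` would be `p->n*sizeof(float)`.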

I run the code on Ubuntu 9.04 64-bit with an Intel i7 and 12 GB of RAM. “ulimit -a” reports an unlimited file size (I create the datasets myself).