cudaMalloc_ReadOnly

Hi,

I allocate around 107MB on my 9800GT (512MB) using:

[codebox]CUDA_SAFE_CALL(cudaMalloc((void **)&device_mem, (size_env + size_swap + size_ret)));[/codebox].

Now if I try to write to this memory from the host side, I get a segmentation fault.

I'm not sure whether I can write to device-allocated memory from the host side at all, so I accepted this.

My problem is that I also can't write to it from device code. Reading the whole memory is absolutely no problem, but if I try to write into areas that are located at:

[codebox]*(unsigned int *)(device_mem + size_env + size_swap) = 1;[/codebox]

I get graphics errors (very wild ones…), while:

[codebox]*(unsigned int *)(device_mem) = 1;[/codebox]

works fine!

[b]

size_env = 5 MB

size_swap = 51 MB

size_ret = 51 MB

[/b]

So I don't understand why I can't write into the allocated memory area…

Any idea what mistake I made?

:rolleyes:

THX!

hi,

for writing from the host to device memory you have to use the cudaMemcpy() function.
the problem with writing to device memory on the device is strange.
can you post some more code? for example, how you pass the device pointer and size_env… to the kernel.

Kernel head: "__global__ void MY_Kernel(char * device_mem)"
Pass: "MY_Kernel<<<dimGrid, dimBlock>>>(device_mem);"

is there a way to check how much memory I allocated and which sections I can write to?

hm… I will test something…
OK, I tested the following:
writing random data from the host side all over the allocated memory… and it works…

FILE* ranSrc = fopen("/dev/urandom", "r");
CUDA_SAFE_CALL(cudaMalloc((void **)&device_mem, (size_env + size_swap + size_ret)));
host_mem = (char *) malloc((size_env + size_swap + size_ret));
fread(host_mem, sizeof(char), (size_env + size_swap + size_ret), ranSrc);
cudaMemcpy(device_mem, host_mem, (size_env + size_swap + size_ret), cudaMemcpyHostToDevice);

so there has to be another reason why I can't write to the memory area from the device side…
I will try the new CUDA 2.2 beta driver and compiler with 64-bit debugging support :clap:
hope I'll get one…

EDIT: the size_env variable is not passed, as you can see… the reason is that I use a constant: "size_env = DEVICE_MEM_ENV * sizeof(char);"
since sizeof(char) = 1, it should be the same size…
or is sizeof(char) different on the host and device side?
thx!

afaik the size of a char is the same.

do you still have problems writing to device memory like this:

*(unsigned int *)(device_mem) = 1;

are there "just" graphics errors?

have you checked all of the return values of the API-calls?

one idea: try using more brackets:

*((unsigned int *)(device_mem)) = 1;

sounds stupid, but I actually can’t figure out what’s wrong.

I have done a lot of testing recently…

I figured out that there is no problem if I write the data sequentially from the device into the allocated device memory, like this:

[codebox]
unsigned int j;
for (j = 0; j < NUM_THREADS; j++) {
    if (blockDim.x * blockIdx.x + threadIdx.x == j) {      //one dimensional grid and one dimensional thread matrix this time...
        *(unsigned int *)(device_mem + DEVICE_MEM_ENV + DEVICE_MEM_SWAP +
            (blockIdx.x * RETURN_MEM_PER_BLOCK) + (threadIdx.x * SHMEM_BANK_SIZE)) = 1;
    }
    __syncthreads();
}
[/codebox]

that works… but if I remove the condition like this:

[codebox]
//unsigned int j;
//for (j = 0; j < NUM_THREADS; j++) {
//    if (blockDim.x * blockIdx.x + threadIdx.x == j) { //one dimensional grid and one dimensional thread matrix this time...
*(unsigned int *)(device_mem + DEVICE_MEM_ENV + DEVICE_MEM_SWAP +
    (blockIdx.x * RETURN_MEM_PER_BLOCK) + (threadIdx.x * SHMEM_BANK_SIZE)) = 1;
//    }
//    __syncthreads();
//}
[/codebox]

I get problems like graphics errors (randomly colored dots in random areas of the screen…) …

are there special restrictions on parallel memory access (other than those for performance reasons)?