A very basic problem, asking for help

I just wanted to test how device memory is accessed, but my test failed.

It seems that I have not understood the basic memory access rules.

Can anyone explain them to me? Thank you in advance.

[codebox]#include <stdio.h>

int Ah;
int Bh;

__device__ int Ad;
__device__ int Bd;

__global__ void test_kernel(void);

int main(void)
{
    Ah=100;
    cudaMemcpy(&Ad,&Ah,sizeof(int),cudaMemcpyHostToDevice);
    printf("HOST A=%d\n",Ah);
    test_kernel<<<1,1>>>();
    cudaMemcpy(&Bh,&Bd,sizeof(int),cudaMemcpyDeviceToHost);
    printf("HOST B=%d\n",Bh);
    return 0;
}

__global__ void test_kernel(void)
{
    Bd=Ad;
    return;
}[/codebox]

The variables Ad and Bd are declared on the device. They do not exist in host memory, so you cannot meaningfully take their address on the host. You will need to use cudaMemcpyToSymbol to initialize those variables.

I don’t think that’s the problem. He is already copying the data from the host to the GPU and back; he doesn’t try to access the device memory from the host directly. But as far as I know, you cannot declare global device memory the way you did. You need to use cudaMalloc; at least I always do, and it works.

No, I’ve seen this before a half-dozen times on the forums: it most certainly is the problem. Taking the address of a device variable cannot be done on the host (why the compiler doesn’t produce an error: I don’t know). So those cudaMemcpys are either failing with “invalid device pointer” (the OP doesn’t check for errors) or they are writing/reading to/from garbage. cudaMemcpyToSymbol is the only way to copy to such a variable (well… I guess cudaGetSymbolAddress + cudaMemcpy would work, too).

But you don’t have to take my word for it, I can prove it:

[codebox]#include <stdio.h>

__device__ int A;

int main()
{
    printf("&A  = %x\n", &A);
    int * d_A;
    cudaGetSymbolAddress((void**)&d_A, A);
    printf("cudaGetSymbolAddress(A) = %x", d_A);
    return 0;
}[/codebox]

Output:

&A  = 2048

cudaGetSymbolAddress(A) = 1700700
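For anyone who wants to try the cudaGetSymbolAddress + cudaMemcpy route mentioned above, here is a minimal sketch of the original test written that way, with the error checking the first program was missing. The `check` helper and the `d_Ad`/`d_Bd` names are made up for this illustration and are not from the thread; treat it as one possible way to write it, not the only one.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__device__ int Ad;
__device__ int Bd;

// Hypothetical helper (not from the thread): report a failed runtime call.
static int check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return 0;
    }
    return 1;
}

__global__ void test_kernel(void)
{
    Bd = Ad;
}

int main(void)
{
    int Ah = 100, Bh = 0;
    int *d_Ad = NULL, *d_Bd = NULL;

    // Ask the runtime for the real device addresses of the symbols
    // instead of applying & to them on the host.
    if (!check(cudaGetSymbolAddress((void **)&d_Ad, Ad), "cudaGetSymbolAddress(Ad)")) return 1;
    if (!check(cudaGetSymbolAddress((void **)&d_Bd, Bd), "cudaGetSymbolAddress(Bd)")) return 1;

    // With valid device pointers, a plain cudaMemcpy is fine.
    if (!check(cudaMemcpy(d_Ad, &Ah, sizeof(int), cudaMemcpyHostToDevice), "copy to Ad")) return 1;

    test_kernel<<<1, 1>>>();
    if (!check(cudaGetLastError(), "kernel launch")) return 1;

    // This copy runs after the kernel on the default stream.
    if (!check(cudaMemcpy(&Bh, d_Bd, sizeof(int), cudaMemcpyDeviceToHost), "copy from Bd")) return 1;

    printf("HOST B=%d\n", Bh);
    return 0;
}
```

Checking every return value this way makes a bad pointer show up as an error message instead of a silently wrong result.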

You can declare a variable that way. The programming guide outlines it very clearly, which really throws new users off, as it is one of the most cumbersome ways to use device memory. Using cudaMalloc to allocate pointers is much easier, as you say, and also much more flexible.

To MisterAnderson42: OK :) I see where I was wrong, thanks for clearing it up for me.

To mybiandou: use cudaMalloc, it is much simpler …
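As a rough illustration of the cudaMalloc suggestion, the same test could look something like the sketch below. The kernel signature with two pointer parameters is an assumption made for this example (the original kernel takes no arguments), since memory from cudaMalloc has to be passed to the kernel explicitly.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Assumed signature for this sketch: the kernel receives the device
// pointers as ordinary arguments instead of using __device__ globals.
__global__ void test_kernel(const int *a, int *b)
{
    *b = *a;
}

int main(void)
{
    int Ah = 100, Bh = 0;
    int *Ad = NULL, *Bd = NULL;

    // cudaMalloc hands back pointers that are valid on the device.
    if (cudaMalloc((void **)&Ad, sizeof(int)) != cudaSuccess ||
        cudaMalloc((void **)&Bd, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaMemcpy(Ad, &Ah, sizeof(int), cudaMemcpyHostToDevice);
    test_kernel<<<1, 1>>>(Ad, Bd);

    // The return value of the last copy also reports earlier failures.
    cudaError_t err = cudaMemcpy(&Bh, Bd, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("HOST B=%d\n", Bh);

    cudaFree(Ad);
    cudaFree(Bd);
    return 0;
}
```

Here the host only ever handles pointers that the runtime returned, so ordinary cudaMemcpy calls work without any symbol lookup.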

Thank you both very much.

As MisterAnderson42 said, using cudaMemcpyToSymbol and cudaMemcpyFromSymbol can solve this problem well.

The address of a device variable cannot be obtained or used through the & operator on the host. (I think the result of the & operator is probably some other pointer, not the real device memory address.)

Also, as erdoom said, using cudaMalloc is another way to solve this problem, though I prefer to declare all of the variables first and then allocate memory for them.

Thank you all.

The following is my revised program:

[codebox]#include <stdio.h>

int Ah;
int Bh;

__device__ int Ad;
__device__ int Bd;

__global__ void test_kernel(void);

int main(void)
{
    Ah=100;
    // cudaMemcpy(&Ad,&Ah,sizeof(int),cudaMemcpyHostToDevice);
    // Pass the symbol itself; the string form "Ad" was removed in CUDA 5.0.
    cudaMemcpyToSymbol(Ad,&Ah,sizeof(int));
    printf("HOST A=%d\n",Ah);
    test_kernel<<<1,1>>>();
    // cudaMemcpy(&Bh,&Bd,sizeof(int),cudaMemcpyDeviceToHost);
    cudaMemcpyFromSymbol(&Bh,Bd,sizeof(int));
    printf("HOST B=%d\n",Bh);
    return 0;
}

__global__ void test_kernel(void)
{
    Bd=Ad;
    return;
}
[/codebox]