Newbie: How can I display a device variable's value? climbing the CUDA learning curve

Hello,

I am new to both CUDA and parallel programming, and I am learning (well trying to teach myself) CUDA by applying it to plasma simulation. My question is, can the value in a device variable be copied to a host variable and displayed? I have written

#include <stdio.h>
#include <cuda.h>

__device__ int x1; //declare a variable on the device

//initialize device variable
__global__ void Initialze( int *x2 ){ x1 = 99; *x2 = x1; }

//increment device variable
__global__ void Increment( int *x2 ){ x1 += x1; *x2 = x1; }

int main(int argc, char **argv){

int 	blksz,	//block size, threads per block
	nblk,	//number of blocks per grid
	*val,	//host var to display value of a device var
	*x2;	//a device var

//execution configuration
blksz = 1;
nblk = 1;

//allocate for device variable
cudaMalloc((void **) &x2, sizeof(int));

//allocate for host var
val = (int *)malloc( sizeof(int));

//initialize the device variable and display it
Initialze <<< nblk, blksz >>> ( x2 );
cudaMemcpy(&val, &x2, sizeof(int), cudaMemcpyDeviceToHost);
fprintf(stdout, "%d \n", *val);

//increment the device variable and display it
Increment <<< nblk, blksz >>> ( x2 );
cudaMemcpy(&val, &x2, sizeof(int), cudaMemcpyDeviceToHost);
fprintf(stdout, "%d \n", *val );

return 0;

}//end main

The output I expected was

C:\>mylesson1.exe
99
100

But instead I got

C:\>mylesson1.exe
3555488
3555488

Or some other such random value. Can anyone tell me what I am doing wrong or what concept I am missing (or misunderstanding)? Thnx.

There’s a bug in your code. cudaMemcpy() takes void * arguments, but you’re passing int ** (there are extra &’s in the calls).
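To make the pointer levels concrete, here is a minimal sketch of the corrected copy. It is not the original program: the kernel SetValue and the helper ReadBack are hypothetical names made up for this illustration, and return codes are left unchecked to keep it short.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical kernel: store 99 at the device address held in x2.
__global__ void SetValue(int *x2) { *x2 = 99; }

// Hypothetical helper: run the kernel and return the value copied back.
int ReadBack(void) {
    int *x2 = NULL;                         // will hold a DEVICE address
    int *val = (int *)malloc(sizeof(int));  // holds a HOST address
    *val = -1;

    // &x2 is correct here: cudaMalloc writes the device address into x2.
    cudaMalloc((void **)&x2, sizeof(int));
    SetValue<<<1, 1>>>(x2);

    // Pass the pointers themselves (val, x2), not their addresses.
    // &val and &x2 would both be host addresses of the pointer variables.
    cudaMemcpy(val, x2, sizeof(int), cudaMemcpyDeviceToHost);

    int result = *val;
    cudaFree(x2);
    free(val);
    return result;
}

int main(void) {
    printf("%d\n", ReadBack());
    return 0;
}
```

On a machine with a working CUDA device this should print 99, since the kernel writes that value before the blocking copy reads it back.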

Thanks for the input, mkaushik, I appreciate it. I fixed the bug you called out, but I am still not able to read the values I assigned to the device variable ‘x1’. I’ve attached a copy of the corrected source; it is the same except for the fix (I replaced the declaration int* with int). As before, I expected

99
100

for the output, but I got

1245056
1245056

Any other thoughts or corrections? I am stumped here.
InitialAndIncrementVar.cu (1.39 KB)
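One way to narrow this down is to check the return codes, since a failed cudaMemcpy leaves the host variable untouched and you then print whatever garbage was in it. Assuming the attached file still passes &x2 as the source, that is a host address being used as a device pointer, and as far as I know the runtime documentation treats pointers that do not match the copy direction as undefined behavior. Below is a sketch with a plain int on the host side. CUDA_CHECK, SetValue and ReadBackChecked are hypothetical names for this example, not part of the CUDA API.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical kernel: store 99 at the device address held in x2.
__global__ void SetValue(int *x2) { *x2 = 99; }

int ReadBackChecked(void) {
    int val = -1;    // plain host int, as in the second version
    int *x2 = NULL;  // device pointer

    CUDA_CHECK(cudaMalloc((void **)&x2, sizeof(int)));
    SetValue<<<1, 1>>>(x2);
    CUDA_CHECK(cudaGetLastError());  // reports kernel launch errors

    // &val: host address of the int. x2 (no &): the device address.
    CUDA_CHECK(cudaMemcpy(&val, x2, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(x2));
    return val;
}

int main(void) {
    printf("%d\n", ReadBackChecked());
    return 0;
}
```

If any call fails, the macro prints the file, line and error string instead of letting the program carry on with an uninitialized value.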

I think your first version was correct, except you don’t want the & before your variable names in the cudaMemcpy calls:

int	blksz,	//block size, threads per block
	nblk;	//number of blocks per grid
int	*val,	//host var to display value of a device var
	*x2;	//a device var (note the * on each pointer name)

//execution configuration
blksz = 1;
nblk = 1;

//allocate for device variable
cudaMalloc((void **) &x2, sizeof(int));

//allocate for host var
val = (int *)malloc( sizeof(int));

//initialize the device variable and display it
Initialize <<< nblk, blksz >>> ( x2 );
cudaMemcpy(val, x2, sizeof(int), cudaMemcpyDeviceToHost);
fprintf(stdout, "%d \n", *val);

//increment the device variable and display it
Increment <<< nblk, blksz >>> ( x2 );
cudaMemcpy(val, x2, sizeof(int), cudaMemcpyDeviceToHost);
fprintf(stdout, "%d \n", *val );

I haven’t tried this though, so I can’t tell you if it works. Also, I changed the argument to fprintf calls to *val, so it prints the returned value, not the pointer value.
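As an alternative to routing the value through a second buffer, the `__device__` variable can be read directly with cudaMemcpyFromSymbol. The sketch below assumes the kernels no longer need the x2 argument. ReadX1 is a hypothetical helper name, and return codes are left unchecked for brevity.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__device__ int x1;  // variable living in device global memory

__global__ void Initialize(void) { x1 = 99; }

// Same arithmetic as the original post: x1 += x1 doubles the value.
__global__ void Increment(void) { x1 += x1; }

// Hypothetical helper: copy the current value of x1 back to the host.
int ReadX1(void) {
    int val = -1;
    // The symbol itself is passed, not its address; the offset and copy
    // kind default to 0 and cudaMemcpyDeviceToHost.
    cudaMemcpyFromSymbol(&val, x1, sizeof(int));
    return val;
}

int main(void) {
    Initialize<<<1, 1>>>();
    printf("%d\n", ReadX1());  // 99
    Increment<<<1, 1>>>();
    printf("%d\n", ReadX1());  // 198
    return 0;
}
```

Note that the second value is 198 rather than 100, because `x1 += x1` adds x1 to itself. Getting 100 would need something like `x1 += 1` in the Increment kernel.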
