Using arrays-what's the correct way ?

vivekv80 · February 28, 2009, 3:07pm

After going through the forums, I have realized that it’s much better to use 1D arrays. I have allocated a 1D array as follows:

[codebox]

/* fill zeros for U,V,Uin,Vin*/

   //Host arrays

double *U, *V, *Uin, *Vin;

U = (double*) malloc (sizez * sizeof(double));

V = (double*) malloc (sizez * sizeof(double));

Uin = (double*) malloc (sizez * sizeof(double));

Vin = (double*) malloc (sizez * sizeof(double));

	

for(v = 0; v < sizez; ++v)

{

	U[v] = 0;

	V[v] = 0;

	Uin[v] = 0;

	Vin[v] = 0;

	// printf("U[%d] = %f, V[%d] = %f, Uin[%d] = %f, Vin[%d] = %f \n", v, U[v], v, V[v], v, Uin[v], v, Vin[v]);

}



    //device arrays

double *U_d, *V_d, *Uin_d, *Vin_d;

cudaMalloc((void**) &U_d, sizeof(double) * sizez);

cudaMalloc((void**) &V_d, sizeof(double) * sizez);

cudaMalloc((void**) &Uin_d, sizeof(double) * sizez);

cudaMalloc((void**) &Vin_d, sizeof(double) * sizez);



cudaMemcpy(U_d, U, sizeof(double) * sizez, cudaMemcpyHostToDevice);

cudaMemcpy(V_d, V, sizeof(double) * sizez, cudaMemcpyHostToDevice);

cudaMemcpy(Uin_d, Uin, sizeof(double) * sizez, cudaMemcpyHostToDevice);

cudaMemcpy(Vin_d, Vin, sizeof(double) * sizez, cudaMemcpyHostToDevice);

// Check values on device

   for(v = 0; v < sizez; ++v)

{

	printf("U_d[%d] = %f, V_d[%d] = %f, Uin_d[%d] = %f, Vin_d[%d] = %f \n", v, U_d[v], v, V_d[v], v, Uin_d[v], v, Vin_d[v]);

}

[/codebox]

Questions:

Is this the right way? Is there a better way to allocate arrays?
The printf for the loop in the device arrays gives me a segmentation fault, when I run the program on the device. No errors when I run it in emulation mode. My device is a Tesla C1060. Can someone explain why ??

Thanks

_Big_Mac · February 28, 2009, 5:46pm

It is the right way.

If an array is copied from host to device or back often (when those transfers create a bottleneck) you might consider using cudaMallocHost. This will allocate your host arrays in a special way that will allow faster HtD/DtH transfers with the cost of potentially less stable system (if you do it to much). This is called pinned memory or page-locked memory, look it up. But your method is the default one and is correct.

You can’t call host functions from device or vice versa, with the sole exceptions of calling kernels (global functions) from the host. There are more reasons why printf cannot work on device but this one is sufficent - device and host functions are separate domains, as is device and host memory.

It works in emulation mode because emulation makes device code execute on the host and this domain division isn’t rigorously checked.

vivekv80 · February 28, 2009, 6:34pm

Thanks for the reply. AT the same time, if I don’t do a printf and just assign another value to the variable U_d, I get a segmentation fault. My debugger shows that a segmentation fault is generated when I assign a value, for eg: Uin_d[v] = 1;

Is there a different way to change the values inside an array?

MisterAnderson42 · February 28, 2009, 9:19pm

The host and device have entirely separate address spaces. You cannot dereference a device pointer on the host. And thus any read or write of Uin_d[v] should be segfaulting.

The only way to et values to/from the device from/to the host is to use cudaMemcpy.

vivekv80 · February 28, 2009, 11:32pm

Hmm So I changed my code and did it this way. Now the array does not get updated with new values?? Any suggestions??

[codebox]// includes, system

#include <stdlib.h>

#include <stdio.h>

#include <string.h>

#include <math.h>

#include <time.h>

#include <assert.h>

// includes, project

#include <cufft.h>

#include <cutil.h>

void checkCUDAError(const char *msg)

{

cudaError_t err = cudaGetLastError();

if( cudaSuccess != err) 

{

    fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) );

    exit(-1);

}

}

// Kernel that executes on the CUDA device

global void alloc_array(double *a, int N)

{

int idx = blockIdx.x * blockDim.x + threadIdx.x;

if (idx<N)

{

a[idx] = a[idx] + 10;

// printf("blockIdx.x =%d, blockDim.x = %d, threadIdx.x  = %d\n",blockIdx.x, blockDim.x, threadIdx.x );

}

int main()

{

/******************** Get Device info *******************/

// check the compute capability of the device

int cuda_device = 0;

int num_devices=0;

// float elapsed_time;



CUDA_SAFE_CALL( cudaGetDeviceCount(&num_devices) );

if(0==num_devices)

{

    printf("your system does not have a CUDA capable device\n");

    return 1;

}



// check if the command-line chosen device ID is within range, exit if not

if( cuda_device >= num_devices )

{

    printf("choose device ID between 0 and %d\n", num_devices-1);

    return 1;

}

cudaSetDevice( cuda_device );

cudaDeviceProp device_properties;

CUDA_SAFE_CALL( cudaGetDeviceProperties(&device_properties, cuda_device) );

if( (1 == device_properties.major) && (device_properties.minor < 1))

{

	printf("%s does not have compute capability 1.1 or later\n\n", device_properties.name);

}



printf("running on: %s\n\n", device_properties.name );



/******************* acquired device info *************/



int sizex, sizey, sizez;

sizex = 43;

sizey = 72;

sizez = sizex * sizey;

int v;

/* fill zeros for U,V,Uin,Vin*/

double *U, *V, *Uin, *Vin;

CUDA_SAFE_CALL(cudaMallocHost((void**)&U, sizeof(double) * sizez));

CUDA_SAFE_CALL(cudaMallocHost((void**)&V, sizeof(double) * sizez));

CUDA_SAFE_CALL(cudaMallocHost((void**)&Uin, sizeof(double) * sizez));

CUDA_SAFE_CALL(cudaMallocHost((void**)&Vin, sizeof(double) * sizez));



// Check for any CUDA errors

checkCUDAError("cudaMallocHost calls");



for(v = 0; v < sizez; ++v)

{

	U[v] = 0;

	V[v] = 0;

	Uin[v] = 0;

	Vin[v] = 0;

	// printf("U[%d] = %f, V[%d] = %f, Uin[%d] = %f, Vin[%d] = %f \n", v, U[v], v, V[v], v, Uin[v], v, Vin[v]);

}



double *U_d, *V_d, *Uin_d, *Vin_d;

CUDA_SAFE_CALL(cudaMalloc((void**)&U_d, sizeof(double) * sizez));

CUDA_SAFE_CALL(cudaMalloc((void**)&V_d, sizeof(double) * sizez));

CUDA_SAFE_CALL(cudaMalloc((void**)&Uin_d, sizeof(double) * sizez));

CUDA_SAFE_CALL(cudaMalloc((void**)&Vin_d, sizeof(double) * sizez));



// Check for any CUDA errors

checkCUDAError("cudaMalloc calls");



cudaMemcpy(U_d, U, sizeof(double) * sizez, cudaMemcpyHostToDevice);

cudaMemcpy(V_d, V, sizeof(double) * sizez, cudaMemcpyHostToDevice);

cudaMemcpy(Uin_d, Uin, sizeof(double) * sizez, cudaMemcpyHostToDevice);

cudaMemcpy(Vin_d, Vin, sizeof(double) * sizez, cudaMemcpyHostToDevice);



// Check for any CUDA errors

checkCUDAError("cudaMemcpy calls");



// Do calculation on device:  

int block_size;

int n_blocks = 30;

block_size = sizez/n_blocks + (sizez%n_blocks == 0? 0:1);

// int n_blocks = sizez/block_size + (sizez%block_size == 0 ? 0:1);

printf("n_blocks = %d\t block_size = %d\n", n_blocks, block_size); 

alloc_array <<< n_blocks, block_size >>> (U_d, sizez);



// Copy U_d to host

cudaMemcpy(U, U_d, sizeof(double) * sizez, cudaMemcpyDeviceToHost);

// Check values on host

for(v = 0; v < sizez; ++v)

{

	printf("U[%d] = %f, V[%d] = %f, Uin[%d] = %f, Vin[%d] = %f \n", v, U[v], v, V[v], v, Uin[v], v, Vin[v]);

}



cudaFree(U_d);

cudaFree(Uin_d);

cudaFree(V_d);

cudaFree(Vin_d);

cudaFree(U);

cudaFree(V); 

cudaFree(Uin); 

cudaFree(Vin);

return 0;

}

[/codebox]

MisterAnderson42 · March 1, 2009, 2:14am

Are you compiling with the option -arch sm_13?

If you are not, your use of doubles will result in undefined behavior.

vivekv80 · March 1, 2009, 3:13pm

Thank you for letting me know about this option. This fixed the issues. I am surprised that this option is not mentioned in the Programming guide or the reference manual.

Thanks External Media

Topic		Replies	Views
cudaMalloc causes segmentation fault 2 Mo is far from my 1,2 Go card memory limit CUDA Programming and Performance	7	7508	June 28, 2011
cudaMemcpy seg fault Segmentation fault copying array CUDA Programming and Performance	2	7487	July 22, 2011
Allocating array pointers segfaults What am I doing wrong? CUDA Programming and Performance	6	1790	April 21, 2009
cudaMemcpy seg fault Segmentation fault copying array CUDA Programming and Performance	3	2020	July 25, 2011
CUDA array host to device CUDA Programming and Performance	11	6030	February 19, 2008
Simple Malloc on host CUDA Programming and Performance	4	1275	October 22, 2009
2d array testing in very simple code using CUDA CUDA Programming and Performance	29	30459	November 15, 2010
Copying 2D array from host to device CUDA Programming and Performance	7	7289	July 27, 2010
Problems with creating an array of Cuda pointers CUDA Programming and Performance	7	13665	April 20, 2009
Strange memory gremlins Getting pwned by pointers CUDA Programming and Performance	9	12193	July 1, 2009

Using arrays-what's the correct way ?

Related topics