Newbie question about data transfer

Hi,

I have a question about data transfer. Please help.

I am passing a complex array from a Fortran program to the GPU:
call processfun(bindata)

cuComplex *d_data;

extern "C" void processfun_(cuComplex *data)
{
printf("\n %f %f", data[0].x, data[0].y); // I am only displaying the 1st element

CUDA_SAFE_CALL(cudaMalloc((void**) &d_data, N*sizeof(cuComplex)));
CUDA_SAFE_CALL(cudaMemcpy(d_data, data, N*sizeof(cuComplex), cudaMemcpyHostToDevice));

printf("\n %f %f", d_data[0].x, d_data[0].y); // Gives a segmentation fault error

for(int i=0;i<N;i++)
{
	d_data[i].x = d_data[i].x * rand();
	d_data[i].y = d_data[i].y * rand();
}

CUDA_SAFE_CALL(cudaMemcpy(data, d_data, N*sizeof(cuComplex), cudaMemcpyDeviceToHost));
}

Q. Why do I get the segmentation fault? Can't I access the device data directly?

Thanks

You can’t dereference a device pointer on the CPU side; it’s only there to be used from CUDA functions.
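For example, if you want to inspect device data on the host, you have to copy it back to host memory first. A minimal sketch (the variable name `h_check` is just illustrative):

```
// WRONG: dereferencing a device pointer in host code
// printf("\n %f %f", d_data[0].x, d_data[0].y);   // segfaults

// RIGHT: copy the element back to a host variable, then print it
cuComplex h_check;
CUDA_SAFE_CALL(cudaMemcpy(&h_check, d_data, sizeof(cuComplex),
                          cudaMemcpyDeviceToHost));
printf("\n %f %f", h_check.x, h_check.y);
```

The pointer returned by cudaMalloc refers to the GPU's address space, so the host can only move data through it with cudaMemcpy, never by indexing it directly.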

Thanks for the reply. I would really appreciate it if you could help me do this simple thing. What modifications do I need to make in the above code? This would help me understand the way it works.

Say I want to pass a complex array of 3 elements: 1+2i, 2+3i, and 3+4i. On the GPU, I want to multiply them by 2 and get the result back on the CPU.

extern "C" void processfun_(cuComplex *data)
{
CUDA_SAFE_CALL(cudaMalloc((void**) &d_data, N*sizeof(cuComplex)));
CUDA_SAFE_CALL(cudaMemcpy(d_data, data, N*sizeof(cuComplex), cudaMemcpyHostToDevice));

for(int i=0;i<N;i++)
{
d_data[i].x = d_data[i].x * 2;
d_data[i].y = d_data[i].y * 2;
}

CUDA_SAFE_CALL(cudaMemcpy(data, d_data, N*sizeof(cuComplex), cudaMemcpyDeviceToHost));
}

Once again, thanks a lot.

I don’t understand your program. You are using host functions (cudaMemcpy, etc.), so I am presuming that you are showing a function that runs on the host, but then you seem to be operating on device memory in the same function “d_data[i].x = d_data[i].x * 2;”, etc.

Unless I have misunderstood your post, what you need to do is move the lines that operate on the device memory into a separate __global__ function (a 'kernel'); the call to that kernel would go in the place where you currently have the lines operating on the device memory.

ted84,

You’ll need a kernel. It can be used instead of the “for” loop and do all the “iterations” concurrently.

Kernels (and device functions called by kernels) are the only places in the code where you can read and modify data allocated on device. Also, you cannot read/modify data residing in host memory from within kernels (so it goes both ways).

An example code could look like this:

//a kernel == a function declared as __global__
__global__ void multiply(cuComplex *d_data)
{
    int i = threadIdx.x;
    d_data[i].x = d_data[i].x * 2;
    d_data[i].y = d_data[i].y * 2;
}

Each thread's threadIdx.x will span from 0 up to one less than the number of threads per block you launch*.

The host part:

extern "C" void processfun_(cuComplex *data)
{
    CUDA_SAFE_CALL(cudaMalloc((void**) &d_data, N*sizeof(cuComplex)));
    CUDA_SAFE_CALL(cudaMemcpy(d_data, data, N*sizeof(cuComplex), cudaMemcpyHostToDevice));

    //specifying launch parameters
    dim3 gridSize(1,1,1);  //one block only
    dim3 blockSize(N,1,1); //N threads in that block

    //launching
    multiply<<<gridSize,blockSize>>>(d_data);

    CUDA_SAFE_CALL(cudaMemcpy(data, d_data, N*sizeof(cuComplex), cudaMemcpyDeviceToHost));
}
* assuming N is smaller than 512, which is the maximum number of threads for a single block. If it's bigger, you'll need to use more blocks.
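For larger N, the usual pattern is to compute a global index from the block and thread indices and guard against threads that fall past the end of the array. A sketch (assuming the same multiply-by-2 task; the extra `n` parameter is an addition so the kernel knows the array length):

```
__global__ void multiply(cuComplex *d_data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the last block may have surplus threads
    {
        d_data[i].x = d_data[i].x * 2;
        d_data[i].y = d_data[i].y * 2;
    }
}

// host side: launch enough 256-thread blocks to cover n elements
int threads = 256;
int blocks  = (N + threads - 1) / threads;  // round up
multiply<<<blocks, threads>>>(d_data, N);
```

The round-up division means the last block may be only partially used, which is why the bounds check inside the kernel is needed.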