Problems with cudaMemcpy

stevewilson · February 5, 2013, 10:04am

Hello,

I am parallelizing a self-written PSO (particle swarm optimization) code. The following fragment
unfortunately does not work.

C_struct_Particle * C_struct_Swarm_optimize(C_struct_Swarm * s) {
	C_struct_Swarm *gpu__s;

	float *f = (float *)malloc(sizeof(*f)); /* size of one float, not of the pointer */
	float *gpu__f;

	static unsigned int gpuBytes = sizeof(C_struct_Particle *)
			+ ((1024 * 1024) * sizeof(C_struct_Particle *));
	

	static unsigned int gpuBytes_f = sizeof(*f);

	CUDA_SAFE_CALL(cudaMalloc(((void * *) (&gpu__s)), gpuBytes));

	printf("After cuda Malloc calculated sizeof = %u\n", gpuBytes);
	printf("After cuda Malloc calculated sizeof(s) =  %zu\n", sizeof(s));
	printf("After cuda Malloc calculated sizeof(*s) =  %zu\n", sizeof(*s));

	CUDA_SAFE_CALL(cudaMalloc(((void * *) (&gpu__f)), gpuBytes_f));

	for (int j = 0; j < 20; j++) {

		int err = cudaMemcpy(gpu__s, s, gpuBytes, cudaMemcpyHostToDevice);
		printf("cudaMemcpy err code = %d\n", err);

		err = cudaMemcpy(gpu__f, f, gpuBytes_f, cudaMemcpyHostToDevice);
		printf("cudaMemcpy for FLOAT +++ err code = %d\n", err);

		C_struct_Swarm_optimize_kernel0<<<dimGrid0, dimBlock0, 0, 0>>>();

	}
	return 0;
}

In the first and second(!) iterations, both cudaMemcpy calls return error code 0. In the remaining 18 iterations they return error code 4. If I comment out the kernel call "C_struct_Swarm_optimize_kernel0<<<dimGrid0, dimBlock0, 0, 0>>>();", both cudaMemcpy calls succeed in every(!) iteration.

I know that this code is incomplete, but I just don’t understand why the errors appear in cudaMemcpy.

pasoleatis · February 5, 2013, 3:06pm

Hello,

I do not think the problem is with cudaMemcpy; the error merely shows up at the cudaMemcpy line. My guess is that the problem is in the kernel: kernel launches are asynchronous, so an error raised by the kernel is reported by the next CUDA call, which here is the following cudaMemcpy. I am not sure whether the launch C_struct_Swarm_optimize_kernel0<<<dimGrid0, dimBlock0, 0, 0>>>() itself is at fault, but I would leave out the “,0,0” part if no shared memory and no streams are used. It might also be that you are using too many threads per block or too many blocks, or that the fault lies inside the kernel.
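To make the point above concrete, here is a minimal sketch of checking for errors directly after a kernel launch, so that a kernel fault is not first reported by a later cudaMemcpy. The helper cudaOk, the kernel scaleKernel and the buffer sizes are hypothetical and only stand in for the code from the question. cudaGetErrorString turns the numeric code into a readable message, which is safer than interpreting the number by hand.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA call with a readable message.
static bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical stand-in for the real kernel.
__global__ void scaleKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int runDemo() {
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = NULL;
    if (!cudaOk(cudaMalloc((void **)&dev, n * sizeof(float)), "cudaMalloc")) return 1;
    if (!cudaOk(cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice),
                "cudaMemcpy H2D")) return 1;

    scaleKernel<<<(n + 127) / 128, 128>>>(dev, n);
    // Invalid launch configurations (e.g. too many threads per block) show up here.
    if (!cudaOk(cudaGetLastError(), "kernel launch")) return 1;
    // Faults raised while the kernel runs show up once the host waits for it.
    if (!cudaOk(cudaDeviceSynchronize(), "kernel execution")) return 1;

    if (!cudaOk(cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost),
                "cudaMemcpy D2H")) return 1;
    printf("host[0] = %.1f\n", host[0]);
    cudaFree(dev);
    return 0;
}

int main() { return runDemo(); }
```

With these two checks in place, a failure is attributed to the launch or to the kernel itself rather than to whichever CUDA call happens to come next. The explicit cudaDeviceSynchronize costs some overlap between host and device, so it is often kept only in debug builds.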

stevewilson · February 5, 2013, 4:04pm

Thanks for answering.

I found the problem: there were some memory-management faults within the kernel. A function used in the kernel called the normal malloc(). Of course this can’t work because the kernel runs on the GPU.
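One possible way around allocating inside the kernel is to reserve all scratch memory from the host with cudaMalloc and hand each thread its own slice. The sketch below assumes this approach; sumOfSquares, evalKernel and the sizes are hypothetical and not taken from the original code. Note that on devices of compute capability 2.0 and later malloc() can be called from device code, but it draws from a limited device heap and returns NULL when that heap is exhausted, so preallocation is often the simpler choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device function: uses caller-provided scratch space
// instead of calling malloc() inside device code.
__device__ float sumOfSquares(const float *in, float *scratch, int dim) {
    float total = 0.0f;
    for (int d = 0; d < dim; ++d) {
        scratch[d] = in[d] * in[d];
        total += scratch[d];
    }
    return total;
}

// Hypothetical kernel: each thread owns one slice of the preallocated buffer.
__global__ void evalKernel(const float *pos, float *scratch, float *out,
                           int n, int dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sumOfSquares(pos + i * dim, scratch + i * dim, dim);
}

int runDemo() {
    const int n = 64, dim = 4;
    float hostPos[n * dim], hostOut[n];
    for (int i = 0; i < n * dim; ++i) hostPos[i] = 1.0f;

    float *pos = NULL, *scratch = NULL, *out = NULL;
    // All device memory is allocated from the host before the launch.
    cudaError_t err = cudaMalloc((void **)&pos, n * dim * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&scratch, n * dim * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&out, n * sizeof(float));
    if (err == cudaSuccess)
        err = cudaMemcpy(pos, hostPos, sizeof(hostPos), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        evalKernel<<<1, n>>>(pos, scratch, out, n, dim);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    // This copy waits for the kernel, so execution faults are reported here.
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, out, sizeof(hostOut), cudaMemcpyDeviceToHost);

    if (err == cudaSuccess) printf("out[0] = %.1f\n", hostOut[0]);
    else fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(pos);
    cudaFree(scratch);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}

int main() { return runDemo(); }
```

The scratch buffer is sized as n * dim floats so that no two threads share a slice; a real particle update would size it according to whatever temporary storage each particle needs.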

Regards
sw
