cutilSafeCall() Runtime API error: Invalid Argument

I’ve been pulling my hair out over the cause of this error in a GMRES CUDA code:

float *reduce_d, *reduce;
cutilSafeCall( cudaHostAlloc( (void**)&reduce, 512 * sizeof(float), cudaHostAllocMapped ) );
for ( int i = 0; i < 512; ++i ) reduce[i] = 0.0f;
cutilSafeCall( cudaHostGetDevicePointer( (void**)&reduce_d, (void*)reduce, 0 ) );

float *r0_d; // r0 = b - Ax
cutilSafeCall( cudaMalloc( (void**)&r0_d, vecSize * sizeof(float) ) );
cutilSafeCall( cudaMemcpyAsync( r0_d + vecSize - 2048, val_d + nzSize - 2048, 2048 * sizeof(float), cudaMemcpyDeviceToDevice, 0 ) );

float *w_d;
cutilSafeCall( cudaMalloc( (void**)&w_d, vecSize * sizeof(float) ) );
cutilSafeCall( cudaMemcpyAsync( w_d + vecSize - 2048, val_d + nzSize - 2048, 2048 * sizeof(float), cudaMemcpyDeviceToDevice, 0 ) );

float *v_d;
cutilSafeCall( cudaMalloc( (void**)&v_d, ( m + 1 ) * vecSize * sizeof(float) ) );
for ( int i = 0; i < m + 1; ++i ) {
    cutilSafeCall( cudaMemcpyAsync( v_d + vecSize*i - 2048, val_d + nzSize - 2048, 2048 * sizeof(float), cudaMemcpyDeviceToDevice, 0 ) );
}

The error arises from the for loop at the end of the code above… Would anyone please tell me where I went wrong?

It would error if the destination pointer falls before the start of its allocation:

v_d + vecSize*i - 2048 < v_d

Or if the source pointer does:

val_d + nzSize - 2048 < val_d

Also, you might simplify debugging by replacing cudaMemcpyAsync with cudaMemcpy, since cudaMemcpy is guaranteed to be synchronous with respect to the host.

Actually it will error if

vecSize*i-2048 < 0

This means for i=0, you will get an error.

If v_d is a properly allocated device pointer, v_d - 2048 almost certainly is not.

Thanks, @cheinger, @txbob. Will try and update.

Hi @cheinger and @txbob: Thank you for the tip, your suggestion solves the problem.

On a separate issue, is it normal that my speedup is only about 3x over the CPU-only version?

Thanks!

That can be normal, as long as your GPU code is correct, meaning your CPU and GPU code produce the same results. Comparing against the CPU results is also a good sanity-check strategy to make sure the GPU code works well.

There are many factors determining the speedup between GPU and CPU. The speedup is defined as CPU_timing/GPU_timing. These factors include the GPU and CPU models, the application you are working on (memory-bound or compute-bound), etc.

This thread has some discussion on the topic of CPU and GPU speed comparison.
https://devtalk.nvidia.com/default/topic/953975/sequential-code-is-faster-than-parallel-how-is-it-possible-/