invalid argument error

Hi all,

I’m having trouble getting a kernel to work. I keep getting an "invalid argument" error and have no clue what the reason could be. Below is the code:

uint num_total = 1033;
uint4 *temp_u4, *g_total;
temp_u4 = (uint4 *) mxMalloc(num_total*sizeof(uint4));
convert_matlab2uint4(mxGetPr(m_total), temp_u4, num_total, num_total);
CUDA_SAFE_CALL( cudaMalloc( (void **) &g_total, num_total*sizeof(uint4)));
CUDA_SAFE_CALL( cudaMemcpy( g_total, temp_u4, num_total*sizeof(uint4), cudaMemcpyHostToDevice));
mxFree(temp_u4);

float4 *g_X_arr, *g_Y_arr, *g_Z_arr;
CUDA_SAFE_CALL( cudaMalloc( (void **) &g_X_arr, num_total*sizeof(float4)));
CUDA_SAFE_CALL( cudaMalloc( (void **) &g_Y_arr, num_total*sizeof(float4)));
CUDA_SAFE_CALL( cudaMalloc( (void **) &g_Z_arr, num_total*sizeof(float4)));
generate_XYZ<<<((int) ceil(((double) num_total)/32.0)),32>>>(g_total, g_X_arr, g_Y_arr, g_Z_arr);
CUT_CHECK_ERROR("An error occured"); // This kernel happily runs

uint *special, *g_special;
special = (uint *) mxMalloc(num_total*sizeof(uint));
CUDA_SAFE_CALL( cudaMalloc( (void **) &g_special, num_total*sizeof(uint)));
uint total_dim = (uint) ceil(((float) num_total)/32.0);
uint *g_num_special;
CUDA_SAFE_CALL( cudaMalloc( (void **) &g_num_special, 1*sizeof(uint)));
cudaMemset(g_num_special, 0, 1*sizeof(uint));
fprintf(stderr, "<<<%d, %d>>>(%d, %d, %d, %d, %d, %d)\n", total_dim, 32, g_total, g_num_special, g_special, g_X_arr, g_Y_arr, g_Z_arr);
check_for_special<<<total_dim, 32>>>(g_total, g_num_special, g_special, g_X_arr, g_Y_arr, g_Z_arr);
CUT_CHECK_ERROR("An error occured"); // This is line 113

The fprintf prints: <<<33, 32>>>(61734912, 16805376, 61997056, 61800448, 61865984, 61931520)
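A side note on that diagnostic line: %d expects an int, so passing pointers to it is undefined behaviour and can truncate or garble the printed values on a 64-bit host. A minimal sketch of a portable version, using %p for the pointers and %u for the unsigned sizes; the helper name format_launch is made up for illustration and is not part of the original code.

```cuda
#include <cstdio>
#include <cstddef>

// Hypothetical helper (not from the original code): formats the launch
// configuration and the six kernel pointer arguments into buf.
// %p is the conversion for pointers; %u matches the unsigned grid/block sizes.
int format_launch(char *buf, std::size_t len, unsigned grid, unsigned block,
                  const void *p0, const void *p1, const void *p2,
                  const void *p3, const void *p4, const void *p5)
{
    return std::snprintf(buf, len, "<<<%u, %u>>>(%p, %p, %p, %p, %p, %p)\n",
                         grid, block, p0, p1, p2, p3, p4, p5);
}
```

Printing straight to stderr works the same way: fprintf(stderr, "...%p...", (void *) g_total, ...), with one (void *) cast per pointer.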

And the error check after the launch gives me:

Cuda error: An error occured in file ‘total_kernel.cu’ in line 113 : invalid argument.

So the kernel is not even starting. All the previous posts on this subject are about cudaMemcpy3D, and the FAQ only says that the total size of the kernel arguments might be the problem, but as far as I can see I am passing only six pointers as input/output…

If anybody can shed a light, I would be very happy!

Everything with the memory allocation and kernel call looks OK. What does the prototype/declaration of your kernel look like? Also, this line you posted is a bit strange:

uint total_dim = (uint) ceil(((float) num_total)/32.0);

You’re using a double-precision literal and math function, but casting num_total to float.
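For computing the number of blocks (total_dim above), integer ceiling division avoids ceil() and the float/double mix altogether. A minimal sketch; blocks_for is a hypothetical helper name.

```cuda
// Hypothetical helper: smallest number of blocks of `block` threads that
// covers `n` elements, using integer ceiling division instead of ceil().
// Assumes n + block - 1 does not overflow an unsigned int.
inline unsigned blocks_for(unsigned n, unsigned block)
{
    return (n + block - 1) / block;
}
```

With num_total = 1033 and 32 threads per block, blocks_for(num_total, 32) gives 33, the same value the ceil() expression produced.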

__global__ void check_for_special(uint4 *g_total, uint *g_num_special, uint *g_special, float4 *X_arr, float4 *Y_arr, float4 *Z_arr)

And that line is indeed not the nicest code, but the calculated number is okay ;)

  • Are all bounds checks inside the kernel OK? (That is, does every thread check that its index is below num_total before it touches any of the arrays?)

  • What happens if you allocate the next multiple of 16 or 32 items in your arrays? (E.g., instead of allocating 1033 (num_total) items, you allocate 1056 items.)
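To try the padding suggestion, the padded count is just the element count rounded up to a multiple of the block size. A minimal sketch with a hypothetical helper name; the CUDA lines in the trailing comment only illustrate where it would be used and are not compiled here.

```cuda
#include <cstddef>

// Hypothetical helper: rounds an element count up to the next multiple of
// `multiple`, e.g. 1033 -> 1056 for 33 blocks of 32 threads.
inline std::size_t padded_count(std::size_t n, std::size_t multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}

// Illustrative use (not compiled here):
//   cudaMalloc((void **) &g_special, padded_count(num_total, 32) * sizeof(uint));
// The kernel should still guard its accesses, e.g.
//   if (idx >= num_total) return;
```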

What does this line do?

convert_matlab2uint4(mxGetPr(m_total), temp_u4, num_total, num_total);

In particular, why is num_total listed twice?

Yeah, all bounds are checked (num_total is a constant), and I don’t think the kernel even starts: I don’t get a ULF (unspecified launch failure).

I will try with padding, but I’m afraid it will not help much. I have an earlier kernel where some of the inputs (g_X_arr, g_Y_arr and g_Z_arr) are generated from g_total, and that kernel runs happily…

I will adjust the original post to show that kernel.

Maybe it is time to start to try the debugger :)

Oh, that’s a simple macro to convert MATLAB doubles into uints. The second num_total can be larger, to pad the uint array with zeros.
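The real convert_matlab2uint4 was not posted, so here is only a hedged sketch of what such a padded conversion could look like. Assumptions, all hypothetical: the input is an n_valid-by-4 MATLAB double matrix stored column-major (as MATLAB stores matrices), the values are non-negative, and UInt4Host stands in for CUDA's uint4 so the sketch compiles without the CUDA headers.

```cuda
#include <cstddef>

// Stand-in for CUDA's uint4 (same x, y, z, w unsigned int members);
// in a .cu file you would use uint4 directly.
struct UInt4Host { unsigned int x, y, z, w; };

// Hypothetical version of convert_matlab2uint4.
// ASSUMPTION: src is an n_valid-by-4 MATLAB matrix in column-major order,
// so column k of row i is src[i + k * n_valid], and all values are >= 0.
// Output rows n_valid..n_padded-1 are zero-filled padding.
void convert_matlab2uint4_sketch(const double *src, UInt4Host *dst,
                                 std::size_t n_valid, std::size_t n_padded)
{
    for (std::size_t i = 0; i < n_padded; ++i) {
        if (i < n_valid) {
            dst[i].x = static_cast<unsigned int>(src[i]);
            dst[i].y = static_cast<unsigned int>(src[i + n_valid]);
            dst[i].z = static_cast<unsigned int>(src[i + 2 * n_valid]);
            dst[i].w = static_cast<unsigned int>(src[i + 3 * n_valid]);
        } else {
            dst[i] = UInt4Host{0u, 0u, 0u, 0u};  // padding
        }
    }
}
```

Called as in the post, with n_valid == n_padded == num_total, no padding is written; passing a larger n_padded (e.g. 1056) zero-fills the tail.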

Trying the same on CUDA 2.1 instead of 2.2 beta gives the same error, even with an empty kernel!
So I guess one of the pointers is no longer valid. I’ll dig deeper.

Okay, this turned out to be a cudaEventRecord mistake of mine. I had three macros to do the timing:

INIT_TIMING

START_TIMING
kernel
FINISH_TIMING

I wrote INIT_TIMING instead of START_TIMING… :">
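The actual INIT_TIMING/START_TIMING/FINISH_TIMING macros were not posted, so this is only a sketch of the usual CUDA event pattern those three steps presumably wrap; scale_kernel and time_scale_kernel are made-up names. It also suggests why the mistake could surface at the kernel's check: CUT_CHECK_ERROR relies on cudaGetLastError(), which returns the most recent error from any earlier runtime call, not just from the launch, so an error from a timing call can be reported at an unrelated line.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, only so there is something to time.
__global__ void scale_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // bounds check: the last block may be partial
}

// Times one launch with CUDA events. Returns elapsed milliseconds,
// or -1.0f if event creation or the elapsed-time query fails.
float time_scale_kernel(float *d_data, int n)
{
    cudaEvent_t start, stop;
    // "INIT": create the events once.
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) { cudaEventDestroy(start); return -1.0f; }

    // "START": record the start event before the launch. Skipping this
    // step leaves `start` unrecorded, and the elapsed-time query below
    // then returns an error instead of a time.
    cudaEventRecord(start, 0);
    scale_kernel<<<(n + 31) / 32, 32>>>(d_data, n);

    // "FINISH": record the stop event, wait for it, read the time.
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "timing failed: %s\n", cudaGetErrorString(err));
        ms = -1.0f;
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```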

Hey!

Where was this mistake in your code? Inside the kernel or in the C code? I am getting a similar error message, and my kernel configuration is (1,1,1), (16,12,1), so it’s not a bad configuration… Any tips on what I should be looking for in order to debug? Thanks!

Okay, so I found my problem, but it doesn’t make any sense…

I’m accelerating MATLAB code with CUDA, so I am also trying Jacket. The moment I use Jacket (e.g. ginfo), my handwritten CUDA code goes haywire; now it’s throwing “invalid resource handle”.

I have also tried Jacket, only very briefly due to busy times at work. I also got an “invalid resource handle”, among other messages. I still have to find some time to dig in deeper and post a detailed bug report on the Jacket website.

My problem was in my host code: I never called cudaEventRecord(start), but afterwards I did a timelapsed(start, end) (I don’t remember the exact function name).

I also got the message with a completely empty kernel; maybe you can try that.
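If even an empty kernel reports an error, it most likely comes from an earlier runtime call rather than from the launch. A minimal sketch of that test which separates the three possible sources; empty_kernel and probe_empty_launch are made-up names, and cudaDeviceSynchronize is the current API name (CUDA 2.x used cudaThreadSynchronize).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void empty_kernel() {}

// Launches an empty kernel and distinguishes: an error already pending from
// an earlier runtime call, a launch error, and an error raised while the
// kernel ran. Returns true if the launch and execution themselves succeeded.
bool probe_empty_launch()
{
    // cudaGetLastError() returns (and clears, for non-sticky errors) the last
    // error from ANY earlier runtime call.
    cudaError_t pending = cudaGetLastError();
    if (pending != cudaSuccess)
        std::fprintf(stderr, "error pending from an earlier call: %s\n",
                     cudaGetErrorString(pending));

    empty_kernel<<<1, 32>>>();
    cudaError_t launch = cudaGetLastError();  // launch-configuration errors
    if (launch != cudaSuccess) {
        std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(launch));
        return false;
    }

    cudaError_t run = cudaDeviceSynchronize();  // errors during execution
    if (run != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(run));
        return false;
    }
    return true;
}
```

If the "pending" message fires, the culprit is somewhere before the launch; checking the return value of each runtime call above it (as the timing calls in this thread should have been) narrows it down.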