Pointers in formal parameter list? I don't understand...

Hi all,

Quite simply - The following function doesn’t work, due to the fact the formal parameter list is somehow corrupt… When looking at the resulting output of dst, the values do not change at all during the function call (eg: memset-ing dst to 32 before the function call, will result in dst still containing values of 32 afterwards).

The odd thing is - if I put the ‘dst’ parameter at the start of the parameter list - it works fine, indicating that somehow the corruption starts mid-way through the list… but as you can see, I’m setting the parameters correctly (as far as I’m aware).

I’ve double and triple checked that all the values I’m passing into ‘cuSetParamX(a, b, value)’ are correct, so I’m really not sure what’s going on here… :unsure: The only thing I can think of is I’ve somehow mis-understood the size of uint4/float4/pointers - and must have magically got it write with all my other CUDa functions (inlikely…).

Sorry to dump a whole bunch of code… but without showing how I pass the parameters, this thread is somewhat useless.

This is using CUDA 2.0 on Windows XP and Vista (same results on both).

CUDA kernel:

[codebox]extern “C” global void function_name(uint4 img_data, uint4 src_roi, uint2 dst_pos, float4 value, unsigned char *src, unsigned char *dst)

{

const unsigned int global_tid = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;

dst[global_tid] = 128;

}[/codebox]

Driver API Code (for setting parameters/param size):

[codebox]size_t offset(0);

// img_data

cuParamSeti(kernel, offset, …); offset += sizeof(int);

cuParamSeti(kernel, offset, …); offset += sizeof(int);

cuParamSeti(kernel, offset, …); offset += sizeof(int);

cuParamSeti(kernel, offset, …); offset += sizeof(int);

// src_roi

cuParamSeti(kernel, offset, …); offset += sizeof(int);

cuParamSeti(kernel, offset, …); offset += sizeof(int);

cuParamSeti(kernel, offset, …); offset += sizeof(int);

cuParamSeti(kernel, offset, …); offset += sizeof(int);

// dst_pos

cuParamSeti(kernel, offset, …); offset += sizeof(int);

cuParamSeti(kernel, offset, …); offset += sizeof(int);

// value

cuParamSetf(kernel, offset, …); offset += sizeof(float);

cuParamSetf(kernel, offset, …); offset += sizeof(float);

cuParamSetf(kernel, offset, …); offset += sizeof(float);

cuParamSetf(kernel, offset, …); offset += sizeof(float);

// src

cuParamSeti(kernel, offset, …); offset += __alignof(void*);

// dst

cuParamSeti(kernel, offset, …); offset += __alignof(void*);

cuParamSetSize(kernel, offset);[/codebox]

I should probably note, in the actual code I’m using - all of my CUDA functions are wrapped up in a macro that checks for errors - and nothing returns an error at any stage of my code.

Thanks in advance for any help,

Cheers.

First things first–is this 32-bit or 64-bit Windows?

This is 32bit windows (both XP and Vista).

For what it’s worth, I’ve tried hard-coding the pointer sizes to 4 (and in a desperate attempt, I even tried 8 - despite being on 32bit) - no change in results.

I’m not sure how or why… but I’ve managed to get the kernel running as intended.

Somehow my float4 parameter was messing everything up… I’ve since been able to change that to 4 consecutive float parameters, and the kernel works fine…

Now my question is, why isn’t a float4 - 4 floats?

Edit: After further investigation into the Programming Guide - it makes a vague reference to ‘float2’ and ‘float4’ automatically being memory-aligned to 8/16bytes respectively…

So am I right to assume when passing float2/float4 parameters via the Driver API - I should make sure I include the appropriate padding to align them appropriately?

2nd Edit: For those encountering similar problems: memory-alignment was indeed the problem - I simply had to add 8 bytes padding before my float4 parameter (to align it to 16 bytes).

Very interesting. Sounds like a bug in the Driver API. Can you post a small sample code that reproduces it?

did you try __alignof(float4) or whatever?

I don’t think it’s a bug at all, it almost seems as if it’s intended behavior - albeit poorly documented.

I’m assuming the float2 & float4 structs, behind the scenes are __alignof(8) & __alignof(16) - that’s the most logical cause of this problem.

As for a sample case, the above sample is about as simple as it gets (you can remove the first 2 uint4’s if you want - it should still produce incorrect results) - the Driver API code is very straight forward to make up yourself.

And there’s no such thing as float4 in C/C++ (without using CUT? - or whatever the little library that comes with the SDK is called), unless you meant trying to use __alignof in CUDA, in which case you can’t prefix a formal parameter with alignment/packing specifiers.

Edit: It would appear (u)int4 also has the same alignment/padding requirements as float4 - as I’ve just encountered an identical problem in another kernel, and resolved it by simply inserting some padding to align the int4 appropriately…

It’d be nice to see this clearly documented in the CUDA 2.1 Programming Manual