Pointers in formal parameter list? I don't understand...

Smokey · November 13, 2008, 11:24pm

Hi all,

Quite simply - The following function doesn’t work, due to the fact the formal parameter list is somehow corrupt… When looking at the resulting output of dst, the values do not change at all during the function call (eg: memset-ing dst to 32 before the function call, will result in dst still containing values of 32 afterwards).

The odd thing is - if I put the ‘dst’ parameter at the start of the parameter list - it works fine, indicating that somehow the corruption starts mid-way through the list… but as you can see, I’m setting the parameters correctly (as far as I’m aware).

I’ve double and triple checked that all the values I’m passing into ‘cuSetParamX(a, b, value)’ are correct, so I’m really not sure what’s going on here… :unsure: The only thing I can think of is I’ve somehow mis-understood the size of uint4/float4/pointers - and must have magically got it write with all my other CUDa functions (inlikely…).

Sorry to dump a whole bunch of code… but without showing how I pass the parameters, this thread is somewhat useless.

This is using CUDA 2.0 on Windows XP and Vista (same results on both).

CUDA kernel:

[codebox]extern “C” global void function_name(uint4 img_data, uint4 src_roi, uint2 dst_pos, float4 value, unsigned char *src, unsigned char *dst)

{

const unsigned int global_tid = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;

dst[global_tid] = 128;

}[/codebox]

Driver API Code (for setting parameters/param size):

[codebox]size_t offset(0);

// img_data

cuParamSeti(kernel, offset, …); offset += sizeof(int);

// src_roi

cuParamSeti(kernel, offset, …); offset += sizeof(int);

// dst_pos

cuParamSeti(kernel, offset, …); offset += sizeof(int);

// value

cuParamSetf(kernel, offset, …); offset += sizeof(float);

// src

cuParamSeti(kernel, offset, …); offset += __alignof(void*);

// dst

cuParamSeti(kernel, offset, …); offset += __alignof(void*);

cuParamSetSize(kernel, offset);[/codebox]

I should probably note, in the actual code I’m using - all of my CUDA functions are wrapped up in a macro that checks for errors - and nothing returns an error at any stage of my code.

Thanks in advance for any help,

Cheers.

tmurray · November 14, 2008, 12:07am

First things first–is this 32-bit or 64-bit Windows?

Smokey · November 14, 2008, 12:11am

This is 32bit windows (both XP and Vista).

For what it’s worth, I’ve tried hard-coding the pointer sizes to 4 (and in a desperate attempt, I even tried 8 - despite being on 32bit) - no change in results.

Smokey · November 16, 2008, 10:43pm

I’m not sure how or why… but I’ve managed to get the kernel running as intended.

Somehow my float4 parameter was messing everything up… I’ve since been able to change that to 4 consecutive float parameters, and the kernel works fine…

Now my question is, why isn’t a float4 - 4 floats?

Edit: After further investigation into the Programming Guide - it makes a vague reference to ‘float2’ and ‘float4’ automatically being memory-aligned to 8/16bytes respectively…

So am I right to assume when passing float2/float4 parameters via the Driver API - I should make sure I include the appropriate padding to align them appropriately?

2nd Edit: For those encountering similar problems: memory-alignment was indeed the problem - I simply had to add 8 bytes padding before my float4 parameter (to align it to 16 bytes).

alex_dubinsky · November 17, 2008, 1:04am

Very interesting. Sounds like a bug in the Driver API. Can you post a small sample code that reproduces it?

tmurray · November 17, 2008, 1:34am

did you try __alignof(float4) or whatever?

Smokey · November 17, 2008, 2:26am

I don’t think it’s a bug at all, it almost seems as if it’s intended behavior - albeit poorly documented.

I’m assuming the float2 & float4 structs, behind the scenes are __alignof(8) & __alignof(16) - that’s the most logical cause of this problem.

As for a sample case, the above sample is about as simple as it gets (you can remove the first 2 uint4’s if you want - it should still produce incorrect results) - the Driver API code is very straight forward to make up yourself.

And there’s no such thing as float4 in C/C++ (without using CUT? - or whatever the little library that comes with the SDK is called), unless you meant trying to use __alignof in CUDA, in which case you can’t prefix a formal parameter with alignment/packing specifiers.

Edit: It would appear (u)int4 also has the same alignment/padding requirements as float4 - as I’ve just encountered an identical problem in another kernel, and resolved it by simply inserting some padding to align the int4 appropriately…

It’d be nice to see this clearly documented in the CUDA 2.1 Programming Manual

Topic		Replies	Views
Parameter passing bug in CUDA 2.0 x86_64 CUDA 2.0 compiler, parameter passing CUDA Programming and Performance	8	5127	January 27, 2009
[bugreport] __alignof(CUdeviceptr) == 4, should 8 CUDA Programming and Performance	12	27201	July 5, 2010
Formal parameter space overflowed kernel launch error CUDA Programming and Performance	18	18433	May 2, 2010
float4 alignment inconsistency... CUDA Programming and Performance	3	2216	February 19, 2015
Reading from float4 corrupts memory CUDA Programming and Performance cuda	18	33	October 3, 2024
cudaMemset in 11.4: what causes it to give cudaErrorInvalidValue? CUDA Programming and Performance	13	6347	September 21, 2021
32-bit to 64-bit Issue CUDA Programming and Performance	1	2867	September 18, 2009
Alignment requirements, shared memory CUDA Programming and Performance cuda	11	344	September 2, 2024
struct member alignment inconsistent using templates CUDA Programming and Performance	3	1540	March 10, 2013
Data alignment for 32 bit reads? CUDA Programming and Performance	8	5088	March 27, 2007

Pointers in formal parameter list? I don't understand...

Related topics