Porting from CUDA to OpenCL: pointer problem, program terminates with unhandled exception

Hi.

I’m porting a program from CUDA to OpenCL. The program stores some data in a tree structure and, given a node in the tree, needs to access some of the information that belongs to that node.

EDIT: I have now got it to work on an NVIDIA GPU without changing this code. However, it still fails with the same error on a CPU (via the AMD implementation).

EDIT: Problem solved

The problem was that in the CPU version the buffer was not aligned correctly, so I had to align it manually.

On the GPU, however, the pointer to the start of the buffer was aligned automatically.
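To illustrate why alignment matters here: the `pageHeader` line in the code below finds the start of a page by masking off the low bits of the node address, which presumably only works if the buffer itself starts on a `PageBytes` boundary. The following is a minimal host-side sketch in plain C, not the actual fix. `PAGE_BYTES`, `page_header_of` and `alloc_page_aligned` are hypothetical names, and the value 8192 is an assumption, since the post does not give the real `PageBytes`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical page size; the real PageBytes is not given in the post.
   The mask trick needs a power of two. */
#define PAGE_BYTES 8192

/* Same idea as the pageHeader line in lookupInfo: clear the low bits of
   the node address to get the start of the page that contains it. */
static int *page_header_of(int *node)
{
    uintptr_t addr = (uintptr_t)node;
    return (int *)(addr & ~((uintptr_t)PAGE_BYTES - 1));
}

/* Allocates `pages` pages whose base address is a multiple of PAGE_BYTES,
   so that page_header_of() lands on real page starts. Uses C11
   aligned_alloc; release the result with free(). */
static int *alloc_page_aligned(size_t pages)
{
    assert((PAGE_BYTES & (PAGE_BYTES - 1)) == 0); /* power of two */
    return aligned_alloc(PAGE_BYTES, pages * PAGE_BYTES);
}
```

With a base pointer from `alloc_page_aligned`, masking any node address yields the start of its page. If the base is shifted by even a few ints, the mask still returns a multiple of `PAGE_BYTES`, which is then no longer where the page header was written. How the aligned memory is handed to the OpenCL runtime depends on how the buffer is created and is not shown here.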

CUDA:

[codebox]int* node; //the node is just an int pointer to global memory[/codebox]

The function that accesses the information:

[codebox]__device__ void lookupInfo(int* node)
{
    int* pageHeader = (int*)( (int) node & -PageBytes );   // PageBytes is a constant int value
    int* blockInfo  = pageHeader + *pageHeader;
    int* blockStart = blockInfo + blockInfo[ BlockInfo_BlockPtr ];   // BlockInfo_BlockPtr is a constant int value

    // the following three lines are similar to the ones above, so I'll leave the details out
    int* attachInfos = ...
    int* attachInfo  = ...
    int* attachData  = ...

    unsigned long long* dxtBlock = (unsigned long long*) ( attachData + ( (node - blockStart) >> 2) * 6);
    U64 colorBlock = dxtBlock[0];

    // after that the function uses the variable colorBlock
    ...
}[/codebox]

OPENCL:

[codebox]__global int* node; //the node is just an int pointer to global memory[/codebox]

[codebox]void lookupInfo(__global int* node)
{
    __global int* pageHeader = (__global int*)( (int) node & -PageBytes );   // PageBytes is a constant int value
    __global int* blockInfo  = pageHeader + *pageHeader;
    __global int* blockStart = blockInfo + blockInfo[ BlockInfo_BlockPtr ];   // BlockInfo_BlockPtr is a constant int value

    // the following three lines are similar to the ones above, so I'll leave the details out
    __global int* attachInfos = ...
    __global int* attachInfo  = ...
    __global int* attachData  = ...

    __global unsigned long* dxtBlock = (__global unsigned long*) ( attachData + ( (node - blockStart) >> 2) * 6);
    unsigned long colorBlock = dxtBlock[0];

    // after that the function uses the variable colorBlock
    ...
}[/codebox]

The CUDA version works as it should.

The OpenCL version (running on the same data) terminates with an unhandled exception.

If I manually set colorBlock to an arbitrary unsigned long value, it works and does not terminate with the exception. It seems that something goes wrong somewhere in those first seven lines, so that the read of dxtBlock[0] does not give colorBlock a valid unsigned long.
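One thing that may be worth ruling out in those lines, separately from the alignment issue reported at the top, is the `(int) node` cast: on a device with 64-bit pointers, narrowing the address to 32 bits before masking drops the upper bits. The sketch below is plain C working on integer address values only. `PAGE_BYTES` (8192), `page_base_full` and `page_base_narrow` are hypothetical names and values chosen for illustration, not part of the original program.

```c
#include <stdint.h>

/* Hypothetical page size; the real PageBytes is not given in the post. */
#define PAGE_BYTES 8192u

/* Page base computed at full 64-bit address width. */
static uint64_t page_base_full(uint64_t addr)
{
    return addr & ~((uint64_t)PAGE_BYTES - 1);
}

/* Page base computed after narrowing the address to 32 bits first,
   which is roughly what a cast to a 32-bit int does to a 64-bit pointer. */
static uint64_t page_base_narrow(uint64_t addr)
{
    uint32_t low = (uint32_t)addr;            /* upper 32 bits are lost here */
    return (uint64_t)(low & ~(PAGE_BYTES - 1));
}
```

For addresses below 4 GiB both functions agree, so the narrow version can appear to work in a 32-bit process or on a device with 32-bit pointers. For higher addresses the results differ, and the narrow result no longer points into the buffer. In OpenCL C, casting through `uintptr_t` instead of `int` would keep the full width.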

I really hope someone knows what’s wrong here, because I have no clue.

Thanks for reading.