And try one of the memcpy 3d methods or so later on…
Just guessing what could be wrong… could also be memory corrupt of graphics card ?
Try other cuda software/kernels and see if they have problems too… otherwise problem is probably with your code still…
Soon I will be able to run kernels from my own api calls (in a different language).
Perhaps later I will try to run your kernel from my own api implementation.
I am too scared to run anybodies kernel in visual studio… too scared that it might fok something up ! External Image =D
But running ptx code should be a bit more safe ? External Image
Then again… ptx is still a script and gets compiled by the nvcuda.dll ptx compiler or whatever… so if there are bugs in the compiler those could be exploited to do nasty stuff.
I think there might also be “binary versions” which are precompiled, less portable but perhaps more safe… they can be loaded into device memory… like an image ?!?
How it exactly work I don’t know yet… I would suspect there is still some string searching in their for the kernel entry point… or perhaps that’s provided via binary too.
And try one of the memcpy 3d methods or so later on…
Just guessing what could be wrong… could also be memory corrupt of graphics card ?
Try other cuda software/kernels and see if they have problems too… otherwise problem is probably with your code still…
Soon I will be able to run kernels from my own api calls (in a different language).
Perhaps later I will try to run your kernel from my own api implementation.
I am too scared to run anybodies kernel in visual studio… too scared that it might fok something up ! External Image =D
But running ptx code should be a bit more safe ? External Image
Then again… ptx is still a script and gets compiled by the nvcuda.dll ptx compiler or whatever… so if there are bugs in the compiler those could be exploited to do nasty stuff.
I think there might also be “binary versions” which are precompiled, less portable but perhaps more safe… they can be loaded into device memory… like an image ?!?
How it exactly work I don’t know yet… I would suspect there is still some string searching in their for the kernel entry point… or perhaps that’s provided via binary too.
[quote name=‘qiminglu’ date=‘15 June 2011 - 09:21 PM’ timestamp=‘1308190913’ post=‘1252378’]
__global__ void
kernel_verify( int * b , double * c , int * counter )
{
int i = (blockIdx.x*BLOCKSIZE*BLOCKSIZE) + (blockIdx.y*BLOCKSIZE) + threadIdx.x;
b[i] = 1; //__syncthreads();
c[i] = 2.0; //__syncthreads();
int iix = b[i];
double oox = c[i];
if( iix!=1 ) atomicAdd(counter, 1);
if( fabs(oox-2.0)>0.001 ) atomicAdd(counter+1, 1);
}
One more finding, if argument c is of (int *) type, or (float *) type, there are no corruptions in either arrays. So appears this misbehave occurs when the kernel is writing to a double type pointee and int/float/or other 4 byte type pointee in an interleaved way.
By the way, when the corruption happens, the integer value read back is
0x 4000 0000 0000 0000
which is exactly the double value of
2.0
I’m wondering if someone could possibly repeat this and help me make sure it’s not a hiccup of my hardware here?
[quote name=‘qiminglu’ date=‘15 June 2011 - 09:21 PM’ timestamp=‘1308190913’ post=‘1252378’]
__global__ void
kernel_verify( int * b , double * c , int * counter )
{
int i = (blockIdx.x*BLOCKSIZE*BLOCKSIZE) + (blockIdx.y*BLOCKSIZE) + threadIdx.x;
b[i] = 1; //__syncthreads();
c[i] = 2.0; //__syncthreads();
int iix = b[i];
double oox = c[i];
if( iix!=1 ) atomicAdd(counter, 1);
if( fabs(oox-2.0)>0.001 ) atomicAdd(counter+1, 1);
}
One more finding, if argument c is of (int *) type, or (float *) type, there are no corruptions in either arrays. So appears this misbehave occurs when the kernel is writing to a double type pointee and int/float/or other 4 byte type pointee in an interleaved way.
By the way, when the corruption happens, the integer value read back is
0x 4000 0000 0000 0000
which is exactly the double value of
2.0
I’m wondering if someone could possibly repeat this and help me make sure it’s not a hiccup of my hardware here?
External Image Exactly! I bet it was some alignment issue here. Thanks for pointing that out and I’m gonna give it a try first thing tomorrow External Image
External Image Exactly! I bet it was some alignment issue here. Thanks for pointing that out and I’m gonna give it a try first thing tomorrow External Image