NVIDIA Developer Forums

Device memory access bug?

Accelerated Computing CUDA CUDA Programming and Performance

Skybuck June 16, 2011, 4:39am 21

Perhaps the problem is an alignment problem.

Try using MemAllocPitch for 2D memory.

MemAllocPitch3D does not exist ?

And try one of the memcpy 3d methods or so later on…

Just guessing what could be wrong… could also be memory corrupt of graphics card ?

Try other cuda software/kernels and see if they have problems too… otherwise problem is probably with your code still…

Soon I will be able to run kernels from my own api calls (in a different language).

Perhaps later I will try to run your kernel from my own api implementation.

I am too scared to run anybodies kernel in visual studio… too scared that it might fok something up ! External Image =D

But running ptx code should be a bit more safe ? External Image

Then again… ptx is still a script and gets compiled by the nvcuda.dll ptx compiler or whatever… so if there are bugs in the compiler those could be exploited to do nasty stuff.

I think there might also be “binary versions” which are precompiled, less portable but perhaps more safe… they can be loaded into device memory… like an image ?!?

How it exactly work I don’t know yet… I would suspect there is still some string searching in their for the kernel entry point… or perhaps that’s provided via binary too.

Binary is interesting stuff.

Skybuck June 16, 2011, 4:39am 22

Perhaps the problem is an alignment problem.

Try using MemAllocPitch for 2D memory.

MemAllocPitch3D does not exist ?

And try one of the memcpy 3d methods or so later on…

Just guessing what could be wrong… could also be memory corrupt of graphics card ?

Try other cuda software/kernels and see if they have problems too… otherwise problem is probably with your code still…

Soon I will be able to run kernels from my own api calls (in a different language).

Perhaps later I will try to run your kernel from my own api implementation.

I am too scared to run anybodies kernel in visual studio… too scared that it might fok something up ! External Image =D

But running ptx code should be a bit more safe ? External Image

Then again… ptx is still a script and gets compiled by the nvcuda.dll ptx compiler or whatever… so if there are bugs in the compiler those could be exploited to do nasty stuff.

I think there might also be “binary versions” which are precompiled, less portable but perhaps more safe… they can be loaded into device memory… like an image ?!?

How it exactly work I don’t know yet… I would suspect there is still some string searching in their for the kernel entry point… or perhaps that’s provided via binary too.

Binary is interesting stuff.

qiminglu June 16, 2011, 4:45am 23

[quote name=‘qiminglu’ date=‘15 June 2011 - 09:21 PM’ timestamp=‘1308190913’ post=‘1252378’]

__global__ void 

  kernel_verify( int * b , double * c , int * counter )

{

  int i = (blockIdx.x*BLOCKSIZE*BLOCKSIZE) + (blockIdx.y*BLOCKSIZE) + threadIdx.x;

b[i] = 1;   //__syncthreads();

  c[i] = 2.0; //__syncthreads();

int iix = b[i];

  double oox = c[i];

if( iix!=1 )              atomicAdd(counter, 1);

  if( fabs(oox-2.0)>0.001 ) atomicAdd(counter+1, 1);

}

One more finding, if argument c is of (int *) type, or (float *) type, there are no corruptions in either arrays. So appears this misbehave occurs when the kernel is writing to a double type pointee and int/float/or other 4 byte type pointee in an interleaved way.

By the way, when the corruption happens, the integer value read back is

0x 4000 0000 0000 0000

which is exactly the double value of

2.0

I’m wondering if someone could possibly repeat this and help me make sure it’s not a hiccup of my hardware here?

qiminglu June 16, 2011, 4:45am 24

[quote name=‘qiminglu’ date=‘15 June 2011 - 09:21 PM’ timestamp=‘1308190913’ post=‘1252378’]

__global__ void 

  kernel_verify( int * b , double * c , int * counter )

{

  int i = (blockIdx.x*BLOCKSIZE*BLOCKSIZE) + (blockIdx.y*BLOCKSIZE) + threadIdx.x;

b[i] = 1;   //__syncthreads();

  c[i] = 2.0; //__syncthreads();

int iix = b[i];

  double oox = c[i];

if( iix!=1 )              atomicAdd(counter, 1);

  if( fabs(oox-2.0)>0.001 ) atomicAdd(counter+1, 1);

}

One more finding, if argument c is of (int *) type, or (float *) type, there are no corruptions in either arrays. So appears this misbehave occurs when the kernel is writing to a double type pointee and int/float/or other 4 byte type pointee in an interleaved way.

By the way, when the corruption happens, the integer value read back is

0x 4000 0000 0000 0000

which is exactly the double value of

2.0

I’m wondering if someone could possibly repeat this and help me make sure it’s not a hiccup of my hardware here?

qiminglu June 16, 2011, 4:50am 25

External Image Exactly! I bet it was some alignment issue here. Thanks for pointing that out and I’m gonna give it a try first thing tomorrow External Image

qiminglu June 16, 2011, 4:50am 26

External Image Exactly! I bet it was some alignment issue here. Thanks for pointing that out and I’m gonna give it a try first thing tomorrow External Image

Skybuck June 16, 2011, 4:54am 27

No I think you might be using the wrong copy function:

Maybe memcpy is only ment for device memory to device memory copy.

You should try some other copy functions which are ment for device to host and host to device ?

Maybe memcpy is ok… not sure… try other copy function might help.

Maybe the corruption is not real and counter is wrongly copied ?

But according to you the double ended up in the integer ? which is kinda weird External Image

Oh well I am at a lost here… but soon I might be able to try your kernel in ptx form so… time will tell External Image :)

Skybuck June 16, 2011, 4:54am 28

No I think you might be using the wrong copy function:

Maybe memcpy is only ment for device memory to device memory copy.

You should try some other copy functions which are ment for device to host and host to device ?

Maybe memcpy is ok… not sure… try other copy function might help.

Maybe the corruption is not real and counter is wrongly copied ?

But according to you the double ended up in the integer ? which is kinda weird External Image

Oh well I am at a lost here… but soon I might be able to try your kernel in ptx form so… time will tell External Image :)

Skybuck June 16, 2011, 5:27am 29

This function might be of some use to check base address of device pointers:

cuMemGetAddressRange

Driver API Call, maybe Runtime API has something similiar.

Not sure exactly what it does… but it probably returns the base of arbitrary pointers for memory allocations.

So if memory block is allocated and a pointer falls in it’s range, then the base address of the memory block would be returned…

So this could give some idea if perhaps something is wrong or misaligned.

Try subtracting pointers from each other or so…

Skybuck June 16, 2011, 5:27am 30

This function might be of some use to check base address of device pointers:

cuMemGetAddressRange

Driver API Call, maybe Runtime API has something similiar.

Not sure exactly what it does… but it probably returns the base of arbitrary pointers for memory allocations.

So if memory block is allocated and a pointer falls in it’s range, then the base address of the memory block would be returned…

So this could give some idea if perhaps something is wrong or misaligned.

Try subtracting pointers from each other or so…

Topic		Replies	Views	Activity
Problem with read access violation for large arrays in unified memory CUDA Programming and Performance	16	3232	July 16, 2018
problem with double precision unpredictable results Different run give differents errors or no error CUDA Programming and Performance	12	2918	September 10, 2010
CUBLAS + Kernels + double precision + "for loop" = strange behavior CUDA Programming and Performance	17	4445	September 25, 2009
[Beginner] Memory is reseted in the kernel CUDA Programming and Performance	5	1375	October 29, 2010
help with a weird bug in the code doesn't writes to memory CUDA Programming and Performance	2	1634	January 28, 2010
QUIT CUDA? Kernel and pinned memory gives strange results CUDA Programming and Performance	6	6776	September 22, 2011
Corruption in memory copy? CUDA Programming and Performance	3	4052	October 15, 2010
Odd error fixed by commenting unrelated line? CUDA Programming and Performance	11	8729	February 17, 2010
The kernel isn't working CUDA Programming and Performance	9	1137	January 19, 2011
Peculiar Shared Memory Behavior - NEED HELP! A test program appears to have two arrays sharing v CUDA Programming and Performance	21	2312	June 5, 2011