Kernel execution fails after switching from single to double precision

// typedef float TFloatingPointPrecision;
typedef double TFloatingPointPrecision;

struct TParticle
{
    TFloatingPointPrecision mStartX;
    TFloatingPointPrecision mStartY;

    TFloatingPointPrecision mStopX;
    TFloatingPointPrecision mStopY;

    TFloatingPointPrecision mDirectionX;
    TFloatingPointPrecision mDirectionY;

    TFloatingPointPrecision mSpeed;

    TFloatingPointPrecision mCurrentX;
    TFloatingPointPrecision mCurrentY;

    int mColor;
};

If I switch this kernel from single to double precision, the kernel fails.

Any idea what could be wrong?

The code itself is very simple.

The only thing I can think of is that the kernel is somehow running out of memory, but with 1 GB of RAM that shouldn't be happening…

I tried compiling from TextPad with these parameters:

CUDA Toolkit 5.5:
$File --ptx --device-debug --machine 32 -arch sm_20

CUDA Toolkit 4.2:
$File --ptx -G0 --machine 32 -arch sm_20 -Xptxas -v

No luck with double precision.

The graphics card is a GT 520 with compute capability 2.1, which does support double precision.

(I can't debug it at the moment, since VS2010 and CUDA Toolkit 5.5 are giving me trouble.)

If I change the structure to the following, cuModuleLoad fails completely: it crashes with an "invalid floating point operation" exception. Perhaps the structure is not packed properly?

It's starting to look like a CUDA compiler or driver API loader issue. The file being loaded is PTX.

typedef float TFloatingPointPrecision;

struct TParticle
{
    TFloatingPointPrecision mStartX;
    TFloatingPointPrecision mStartY;

    TFloatingPointPrecision mStopX;
    TFloatingPointPrecision mStopY;

    TFloatingPointPrecision mDirectionX;
    TFloatingPointPrecision mDirectionY;

    TFloatingPointPrecision mSpeed;

    TFloatingPointPrecision mCurrentX;
    double mCurrentY; // changed one field to double to see what effect it has; this crashes the load

    unsigned int mColor;
};

I’ll try a graphics driver update to see if that helps…

Unfortunately, updating the drivers did not help :( Loading kernel.ptx still crashes…

Interestingly enough, the module load function now fails completely, even when all fields are floats.

Perhaps the driver API changed in 5.5, or the issue is more severe. Hmm…