Writing to global memory failing at runtime

I believe I have an issue writing to global memory, although the runtime reports the error only when I read the memory back:

cudaSafeCall() Runtime API error in file <i:/Nvidia/CUDA/NVIDIA GPU Computing SDK/C/src/nrtc/nrtc.cu>, line 130 : unknown error.


[codebox]cutilSafeCall(cudaMalloc((void**)&dLocation, sizeof(float3) * numparts));
cutilSafeCall(cudaMemcpy(dLocation, Location, sizeof(float3) * numparts, cudaMemcpyHostToDevice));

cutilSafeCall(cudaMalloc((void**)&dVelocity, sizeof(float3) * numparts));
cutilSafeCall(cudaMemcpy(dVelocity, Velocity, sizeof(float3) * numparts, cudaMemcpyHostToDevice));

cutilSafeCall(cudaMalloc((void**)&dForce, sizeof(float3) * numparts));
cutilSafeCall(cudaMemcpy(dForce, Force, sizeof(float3) * numparts, cudaMemcpyHostToDevice));

particleCalculations<<<NUM_BLOCKS, NUM_THREADS>>>(dLocation, dVelocity, dForce, dGravity, dMass, dTimeP, dnumparts, dpdens, ddebug);

cutilSafeCall(cudaMemcpy(debug, ddebug, sizeof(float) * NUM_THREADS * NUM_BLOCKS, cudaMemcpyDeviceToHost)); // Runtime API error here
cutilSafeCall(cudaMemcpy(Location, dLocation, sizeof(float3) * numparts, cudaMemcpyDeviceToHost)); // Runtime API error here if the line above is commented out
cutilSafeCall(cudaMemcpy(Velocity, dVelocity, sizeof(float3) * numparts, cudaMemcpyDeviceToHost)); // Runtime API error here if the line above is commented out[/codebox]

I don't think I'm doing anything wrong here, even though everything works if I comment all of these lines out.

It also works if the kernel doesn't modify the memory, so I assume that's where my problem lies.

[codebox]__global__ static void particleCalculations(float3* gLocation,
                                           float3* gVelocity,
                                           float3* gForce,
                                           const float* pGravity,
                                           const float* pMass,
                                           const float* pTimeP,
                                           const int* pnumparts,
                                           const int* ppdens,
                                           float* debug)
{
    float3 Location, Velocity, Force;

    // Global index of this thread
    const int tid = threadIdx.x + NUM_THREADS * blockIdx.x;

    // Scalar parameters arrive as pointers into device memory
    const int numparts = *pnumparts;
    const int pdens = *ppdens;
    const float Gravity = *pGravity;
    const float Mass = *pMass;
    const float TimeP = *pTimeP;

    float d, TForce;
    float ax, ay, az, at;
    int pcount = 0;

    // Grid-stride loop: this thread handles particles tid, tid + stride, tid + 2*stride, ...
    for (int icount = tid; icount < numparts; icount += NUM_THREADS * NUM_BLOCKS)
    {
        Location = gLocation[icount];
        Velocity = gVelocity[icount];
        Force = gForce[icount];   // load the force into Force, not into Location

        // ... (the force calculation continues here; not shown in the post)[/codebox]




The kernel continues for quite some time, then writes the results back into the global arrays (gLocation, gVelocity, gForce):

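[codebox]        Velocity.x = Velocity.x + Force.x / Mass * TimeP;
        Velocity.y = Velocity.y + Force.y / Mass * TimeP;
        Velocity.z = Velocity.z + Force.z / Mass * TimeP;

        // Each thread writes back only the particle it owns
        gLocation[icount] = Location;
        gVelocity[icount] = Velocity;
        gForce[icount] = Force;

        // icount is advanced only in the for header; advancing it here as well
        // would skip every other particle.
    }
}[/codebox]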
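For comparison, here is the velocity update as a small standalone program, a sketch under stated assumptions rather than the poster's actual code. The names integrateVelocity and countUpdated and the values of NUM_BLOCKS and NUM_THREADS are hypothetical; the force calculation is elided in the post, so Force is simply read from gForce, and Location is left out because the post doesn't show how it is updated. The index advances only in the for header, the stride is taken from blockDim.x * gridDim.x so it stays correct if the launch configuration changes, and the scalars are passed by value instead of through device pointers, which removes several cudaMalloc/cudaMemcpy calls.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Assumed launch configuration; the post does not give the real values.
#define NUM_BLOCKS 4
#define NUM_THREADS 64

// Velocity update as a grid-stride loop. The index advances only in the
// for header, and the scalar parameters are passed by value.
__global__ void integrateVelocity(float3* gVelocity, const float3* gForce,
                                  float mass, float timeP, int numparts)
{
    const int stride = blockDim.x * gridDim.x;
    for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < numparts; i += stride) {
        float3 velocity = gVelocity[i];
        const float3 force = gForce[i];
        velocity.x += force.x / mass * timeP;
        velocity.y += force.y / mass * timeP;
        velocity.z += force.z / mass * timeP;
        gVelocity[i] = velocity;  // each thread writes only the elements it owns
    }
}

// Runs the kernel on n particles (zero velocity, force (1, 2, 3)) and returns
// how many velocities come back with the expected value, or -1 on a CUDA error.
int countUpdated(int n, float mass, float timeP)
{
    float3* hVel = new float3[n];
    float3* hForce = new float3[n];
    for (int i = 0; i < n; ++i) {
        hVel[i] = make_float3(0.0f, 0.0f, 0.0f);
        hForce[i] = make_float3(1.0f, 2.0f, 3.0f);
    }

    const size_t bytes = sizeof(float3) * n;
    float3* dVel = nullptr;
    float3* dForce = nullptr;
    int result = -1;
    if (cudaMalloc((void**)&dVel, bytes) == cudaSuccess &&
        cudaMalloc((void**)&dForce, bytes) == cudaSuccess &&
        cudaMemcpy(dVel, hVel, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dForce, hForce, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        integrateVelocity<<<NUM_BLOCKS, NUM_THREADS>>>(dVel, dForce, mass, timeP, n);
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(hVel, dVel, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            result = 0;
            for (int i = 0; i < n; ++i) {
                if (hVel[i].x == 1.0f / mass * timeP &&
                    hVel[i].y == 2.0f / mass * timeP &&
                    hVel[i].z == 3.0f / mass * timeP)
                    ++result;
            }
        }
    }
    cudaFree(dVel);    // cudaFree on a null pointer does nothing
    cudaFree(dForce);
    delete[] hForce;
    delete[] hVel;
    return result;
}

int main()
{
    // 1000 particles > NUM_BLOCKS * NUM_THREADS = 256, so threads loop several times.
    const int n = 1000;
    std::printf("%d of %d particles updated\n", countUpdated(n, 2.0f, 0.5f), n);
    return 0;
}
```

With a doubled increment, roughly half of the particles would keep their initial velocity, which this count would expose.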


My first assumption was that the device couldn't handle multiple threads reading from and writing to a global pointer at the same time, so I renamed the global pointers and added per-thread copies.
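Many threads reading and writing the same global array is allowed, and it is safe as long as each thread touches only its own element. It becomes a race when threads also read *other* elements that are being overwritten in the same launch, which an n-body force loop presumably does (an assumption, since that part of the kernel isn't shown). A common remedy is double buffering: read from one array, write to another. The sketch below uses a hypothetical toy step (each particle moves halfway toward the centroid) as a stand-in for the real force calculation; stepTowardCentroid, runStep, and the <<<2, 32>>> launch are illustrative choices.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Every thread reads ALL positions from posIn but writes only its own entry
// of posOut. Writing into posIn instead would let some threads read
// positions that other threads have already overwritten.
__global__ void stepTowardCentroid(const float3* posIn, float3* posOut, int n)
{
    const int stride = blockDim.x * gridDim.x;
    for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += stride) {
        float3 sum = make_float3(0.0f, 0.0f, 0.0f);
        for (int j = 0; j < n; ++j) {  // reads other particles' entries
            sum.x += posIn[j].x;
            sum.y += posIn[j].y;
            sum.z += posIn[j].z;
        }
        const float3 p = posIn[i];
        posOut[i] = make_float3(p.x + 0.5f * (sum.x / n - p.x),
                                p.y + 0.5f * (sum.y / n - p.y),
                                p.z + 0.5f * (sum.z / n - p.z));
    }
}

// Places n particles at x = 0, 1, ..., n-1, runs one step, and copies the
// result into out (a host array of length n). Returns false on a CUDA error.
bool runStep(int n, float3* out)
{
    float3* hIn = new float3[n];
    for (int i = 0; i < n; ++i) hIn[i] = make_float3((float)i, 0.0f, 0.0f);

    const size_t bytes = sizeof(float3) * n;
    float3* dIn = nullptr;
    float3* dOut = nullptr;
    bool ok = cudaMalloc((void**)&dIn, bytes) == cudaSuccess &&
              cudaMalloc((void**)&dOut, bytes) == cudaSuccess &&
              cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        stepTowardCentroid<<<2, 32>>>(dIn, dOut, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out, dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dOut);
    delete[] hIn;
    return ok;
}

int main()
{
    float3 out[8];
    if (runStep(8, out))
        for (int i = 0; i < 8; ++i)
            std::printf("%g\n", out[i].x);  // 1.75, 2.25, ..., 5.25
    return 0;
}
```

For a multi-step simulation, the two buffers' roles are swapped after each launch rather than copying data back.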

I'm fairly stuck: every time I try something, I have to reset my graphics card because of corrupted memory. It's not pretty.
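One possibility the thread never confirms: the path in the error message suggests Windows, and on a GPU that also drives a display, the driver can terminate kernels that run longer than a few seconds. A kernel that "continues for quite some time" over every particle could hit that limit, after which the context is typically unusable and later calls such as cudaMemcpy fail. The runtime exposes whether such a limit applies through the kernelExecTimeoutEnabled field of cudaDeviceProp; watchdogEnabled is a hypothetical helper name.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if kernels on this device are subject to a run-time limit
// (kernelExecTimeoutEnabled), 0 if not, and -1 if the query fails.
int watchdogEnabled(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}

int main()
{
    const int w = watchdogEnabled(0);
    if (w < 0)
        std::printf("could not query device 0\n");
    else
        std::printf("run-time limit on kernels: %s\n", w ? "yes" : "no");
    return 0;
}
```

If the limit is enabled, splitting the particle loop across several shorter launches, or running on a GPU that isn't driving a display, avoids it.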


You should check for errors after your kernel…

What do you mean exactly? Do you mean after the cudaMemcpy?

Thanks for the reply :)
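On the question of where to check: kernel launches are asynchronous and return no error code, so a fault inside the kernel is reported by whichever CUDA call comes next, which in the original code is the first cudaMemcpy. Checking immediately after the launch attributes the error to the kernel instead: cudaGetLastError catches launch errors such as a bad configuration, and cudaDeviceSynchronize (cudaThreadSynchronize in older toolkits) waits for the kernel and returns any error raised while it ran. A minimal sketch; fillOnes and launchChecked are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel used only to demonstrate the checks.
__global__ void fillOnes(float* out, int n)
{
    for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += blockDim.x * gridDim.x)
        out[i] = 1.0f;
}

// Launches fillOnes and reports errors from two separate stages:
// the launch itself and the kernel's execution.
cudaError_t launchChecked(float* dOut, int n, int blocks, int threads)
{
    fillOnes<<<blocks, threads>>>(dOut, n);
    cudaError_t err = cudaGetLastError();  // invalid configuration, etc.
    if (err != cudaSuccess) {
        std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    err = cudaDeviceSynchronize();         // faults raised while the kernel ran
    if (err != cudaSuccess)
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}

int main()
{
    // 2048 threads per block exceeds the per-block limit of every CUDA device,
    // so this launch is rejected before the kernel runs.
    cudaError_t err = launchChecked(nullptr, 0, 1, 2048);
    std::printf("%s\n", cudaGetErrorString(err));
    return 0;
}
```

The synchronize serializes host and device, so it is mainly a debugging aid; in production code it is often compiled in only for debug builds.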


Thanks for the interest and help. Apparently you can't mix the C++ new operator with cudaMalloc, or else bad, unknown things happen; you have to use malloc() :(
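For reference, memory from new[] is ordinary pageable host memory, just like memory from malloc, and cudaMemcpy accepts either. What does cause undefined behaviour is crossing the release functions, for example calling delete[] on a pointer from cudaMalloc or cudaFree on a pointer from new[]. If that was the actual mix-up here (an assumption, since the allocation code isn't shown), keeping each allocation paired with its own release avoids it. A minimal sketch; roundTrip is a hypothetical name.

```cuda
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

// Copies n float3 values host -> device -> host and checks they survive.
// Allocation/release pairs: new[]/delete[], malloc/free, cudaMalloc/cudaFree.
bool roundTrip(int n)
{
    const size_t bytes = sizeof(float3) * n;
    float3* hIn = new float3[n];                             // host, C++ new[]
    float3* hOut = static_cast<float3*>(std::malloc(bytes)); // host, malloc
    float3* dBuf = nullptr;                                  // device, cudaMalloc

    for (int i = 0; i < n; ++i)
        hIn[i] = make_float3((float)i, 2.0f * i, 3.0f * i);

    bool ok = hOut != nullptr &&
              cudaMalloc((void**)&dBuf, bytes) == cudaSuccess &&
              cudaMemcpy(dBuf, hIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(hOut, dBuf, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        ok = hOut[i].x == hIn[i].x && hOut[i].y == hIn[i].y && hOut[i].z == hIn[i].z;

    cudaFree(dBuf);   // device memory: cudaFree, never delete[] or free
    std::free(hOut);  // malloc'd memory: free
    delete[] hIn;     // new[]'d memory: delete[]
    return ok;
}

int main()
{
    return roundTrip(16) ? 0 : 1;
}
```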