Access violation in cudaMalloc

Hi All,

When I try to run my code using the device mode I get this error:

Unhandled exception at 0x1000feed in PSO_GPU.exe: 0xC0000005: Access violation writing location 0x01000600.

The error happens on the third line inside the main function: cudaMalloc((void **) &S_d->sdVar, SD * sizeof(SDVAR ));

I am using Visual C++ 2005 and Geforce GTX 285 (Driver Version

It seems I am missing something before calling cudaMalloc, but I don’t know what it is.

Your help is really appreciated.


[codebox]int main ()
{
    Swarm * S_d;

    cudaMalloc((void **) &S_d, sizeof(Swarm));

    //printf("Size of SDVAR= %d\n",sizeof(SDVAR));

    cudaMalloc((void **) &S_d->sdVar, SD * sizeof(SDVAR ));

    cudaMalloc((void **) &S_d->pVar, (N*N*NUM_TRX)*sizeof(PVAR ));

    cudaMalloc((void **) &S_d->tVar, (N*N*NUM_TRX)*sizeof(TVAR ));
}[/codebox]

[codebox]#ifndef SWARM_HEADER_H
#define SWARM_HEADER_H

#include "Constants.h"

typedef struct
{
    int * X_sd;
} SDVAR;

typedef struct
{
    float * X_Power;
} PVAR;

typedef struct
{
    float * X_Theta;
} TVAR;

typedef struct
{
    float * penalty;
} SDPENALTY;

typedef struct
{
    float * channelCapacity;
} OUTTOPOLOGY;

typedef struct
{
    SDVAR ** sdVar;
    PVAR **  pVar;
    TVAR **  tVar;

    SDVAR ** V_sdVar;
    PVAR **  V_pVar;
    TVAR **  V_tVar;

    SDVAR ** pl_sdVar;
    PVAR **  pl_pVar;
    TVAR **  pl_tVar;

    int * pg_sdVar;
    float * pg_pVar;
    float * pg_tVar;

    SDPENALTY ** sdPenalty;
    OUTTOPOLOGY ** _outputTopology;

    float * plFitness;
    int swarmSize;
    float bestSwarmFitness; // best ever fitness
    int trial;
} Swarm;

#endif[/codebox]

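You are allocating S_d on the device and then trying to access its fields in host code. This is something you cannot do. After the first cudaMalloc, S_d holds a device address, so &S_d->sdVar is a device address too. The second cudaMalloc then tries to write the new pointer to that address from the host, which is what triggers the access violation.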

I believe this will work if you create a Swarm variable in host code and do the cudaMallocs using its fields. Then do a cudaMemcpy of the host Swarm structure to the device.
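
A minimal sketch of that suggestion, under some loud assumptions: `SwarmSketch` is a reduced, hypothetical stand-in for the real `Swarm` struct (three flat arrays instead of the pointer-to-pointer fields), `createDeviceSwarm` is a made-up helper name, and the values of `SD`, `N` and `NUM_TRX` are placeholders for whatever Constants.h defines.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical sizes; the real values would come from Constants.h.
#define SD 16
#define N 4
#define NUM_TRX 2

// Reduced, hypothetical stand-in for Swarm: three flat device arrays only.
struct SwarmSketch {
    int   *sdVar;
    float *pVar;
    float *tVar;
};

// Fills the host struct *S_h with device pointers, then copies the struct
// itself to the device. Returns the device copy, or NULL on any failure.
SwarmSketch *createDeviceSwarm(SwarmSketch *S_h)
{
    SwarmSketch *S_d = NULL;
    S_h->sdVar = NULL;
    S_h->pVar  = NULL;
    S_h->tVar  = NULL;

    // &S_h->sdVar is a host address, so cudaMalloc can safely write to it.
    bool ok =
        cudaMalloc((void **)&S_h->sdVar, SD * sizeof(int)) == cudaSuccess &&
        cudaMalloc((void **)&S_h->pVar, N * N * NUM_TRX * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void **)&S_h->tVar, N * N * NUM_TRX * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void **)&S_d, sizeof(SwarmSketch)) == cudaSuccess &&
        // The host struct now holds device pointers; copy it over as plain bytes.
        cudaMemcpy(S_d, S_h, sizeof(SwarmSketch), cudaMemcpyHostToDevice) == cudaSuccess;

    if (!ok) {
        // cudaFree(NULL) performs no operation, so unset fields are safe to pass.
        cudaFree(S_h->sdVar);
        cudaFree(S_h->pVar);
        cudaFree(S_h->tVar);
        cudaFree(S_d);
        S_h->sdVar = NULL;
        S_h->pVar  = NULL;
        S_h->tVar  = NULL;
        return NULL;
    }
    return S_d;
}
```

The returned pointer can be passed to a kernel, which may then dereference `S_d->sdVar` and the other fields on the device. The host keeps `S_h` around so it can later `cudaFree` each field.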

Thank you so much. I was hoping the Emulator would have caught mistakes like this when I ran my code.

The problem is that I have three levels of pointers in the struct, and I organized them to help with memory coalescing. So I believe the best approach for me is to allocate and copy flat arrays into CUDA memory, and then point into these arrays from my struct.
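
One way the flat-array idea could look, as a sketch only. The layout is an assumption, not the poster's actual design: value `elem` of particle `p` is stored at `elem * swarmSize + p`, so consecutive threads touch consecutive addresses. The names `flatIndex`, `scaleParticles` and `scaleOnDevice` are hypothetical.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Assumed layout: value `elem` of particle `p` lives at elem * swarmSize + p,
// so threads with consecutive p read and write consecutive addresses.
__host__ __device__ inline int flatIndex(int elem, int p, int swarmSize)
{
    return elem * swarmSize + p;
}

// One thread per particle; each thread scales all values of its own particle.
__global__ void scaleParticles(float *pVar, int elemsPerParticle,
                               int swarmSize, float factor)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= swarmSize) return;
    for (int e = 0; e < elemsPerParticle; ++e)
        pVar[flatIndex(e, p, swarmSize)] *= factor;
}

// Allocates one flat device array, uploads the host data, runs the kernel,
// and downloads the result back into `host`.
cudaError_t scaleOnDevice(float *host, int elemsPerParticle,
                          int swarmSize, float factor)
{
    float *dev = NULL;
    size_t bytes = (size_t)elemsPerParticle * swarmSize * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (swarmSize + threads - 1) / threads;
        scaleParticles<<<blocks, threads>>>(dev, elemsPerParticle, swarmSize, factor);
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}
```

With a single allocation per variable, the struct only needs one device pointer per array, and the nested levels become index arithmetic instead of extra pointer hops.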

Thanks again for your help,


I generally wrap my CUDA calls in CUDA_SAFE_CALL. I’m not sure if this would have caught the issue you had or not.

CUDA_SAFE_CALL(cudaMalloc((void **) &d_out, sizeof(int)*x));

No one outside of SDK developers should use CUTIL for any reason ever. Check your own errors or write your own wrappers that you know do the right thing. If you ever send code to me that relies on cutil.h, I will throw it back at you. (cutil.h changes all the time internally, it changes from SDK release to SDK release too, and its semantics are constantly evolving.)

Seriously, do not use it.
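
For anyone wondering what "write your own wrappers" might look like, here is a minimal sketch. `checkCuda` and `CHECK_CUDA` are hypothetical names, not part of CUDA or the SDK; the only library calls used are `cudaGetErrorString` and the `cudaError_t` return codes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns true on success, otherwise prints where the
// call failed along with the runtime's error string and returns false.
inline bool checkCuda(cudaError_t err, const char *what,
                      const char *file, int line)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s:%d: %s failed: %s\n",
            file, line, what, cudaGetErrorString(err));
    return false;
}

// Wraps a runtime call and records the call text and source location.
#define CHECK_CUDA(call) checkCuda((call), #call, __FILE__, __LINE__)
```

Usage would be along the lines of `if (!CHECK_CUDA(cudaMalloc((void **) &d_out, sizeof(int)*x))) return 1;`. A wrapper like this only inspects return codes, so it would most likely not have caught the original problem: that access violation happens inside the cudaMalloc call, before any error code comes back.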