Kernel doesn't change value of struct member

My problem is simple to explain. I have a structure with some members of different types.

The kernel function receives an array of structs as a parameter. Inside the kernel I try to change one member (for example, cellid) of each struct.

But after I copy the array back to the host, I find that nothing has changed.

What could the problem be?


__global__ void mainCalcCuda(struct Particle* particles, int framenum) {
	int idx = blockIdx.x;		// one block per particle (launched as <<< N, 1 >>>)
	particles[idx].cellid = 12;	// test write to check that the kernel runs correctly (Diman)
}

extern "C"

struct Particle {
	Vector	defposition;
	inline 	Particle() { neighbours = new Neighbour[128]; }
	Material* material;
	int 	localId;
	int	globalId;
	int	objId;

	bool	isNull;
	bool 	received;

	int	cellx,celly,cellz;
	int	cellid;		// the member I am trying to change, without success

	int	num_of_neighbours;

	Neighbour* neighbours;
	Vector position;
	Vector velocity;
	Vector acceleration;
	double mass;
	double energy;
	double pressure;
	double density;
	Tensor2Gen tDeformation;
	Tensor2Gen tStressDev;
	Tensor2Gen tStress;
	double deformWork;
	double crack;
	double state;
	double wn;
	double c;
	double rc;
	double h;
	double maxMu;
	double ro0;
	double prevEnergy;
	double prevCrack;
	double prevPressure;
	double prevDeformWork;
	double prevDensity;
	double prevState;
	Vector prevVelocity;
	Vector prevPosition;
	Tensor2Gen prevTStressDev;

	void init(int gId, int lid, int oId, Vector pos, Vector vel, double ah, bool mnull);

	void resetNeighbours();
	void resetStatus();
	void resetPosition();

	void newNeighbour(Particle* p);
	void setMaterial(Material* mat);
};


host main:


Particle *device, *host;
int N = ps->getPartNum();
size_t size = N * sizeof(Particle);
cudaMalloc((void **) &device, size);
host = (Particle*) malloc(size);
cudaMemcpy(device, ps->particles, size, cudaMemcpyHostToDevice);
mainCalcCuda <<< N, 1 >>> (device, framenum);
cudaMemcpy(host, device, size, cudaMemcpyDeviceToHost);
for (int i=0; i<N; i++) {
	cout << host[i].cellid;	// index with i; host->cellid would print element 0 every time
	cout << "\n";
}



I found out that it doesn’t work even with a simple float array:

__global__ void fkernel (float *p)
{
	int idx = blockIdx.x;
	p[idx] = p[idx]+14;
}

float *ff, *fd; ff = (float*)malloc(3*sizeof(float));
cudaMalloc((void**)&fd, 3*sizeof(float));
// note: fd is never initialized, so the printed values are undefined even if the kernel runs
fkernel<<<3, 1>>> (fd);
cudaMemcpy(ff, fd, 3*sizeof(float), cudaMemcpyDeviceToHost);
for (int i=0; i<3;i++) {cout << ff[i]<< "\n";}





fkernel just doesn’t do anything.

When there is a problem, CUDA functions do nothing and return a value which is != cudaSuccess. cudaGetErrorString() will convert the return value to a string you can print to see what is wrong.
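A minimal sketch of that advice, assuming a hypothetical helper named checkCuda and a hypothetical wrapper named runFkernel (neither appears in the posts above). One detail worth knowing: a kernel launch has no return value, so its status has to be fetched with cudaGetLastError(), and errors during execution show up at cudaDeviceSynchronize():

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a readable message when a CUDA call fails.
static bool checkCuda(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void fkernel(float* p) {
    int idx = blockIdx.x;   // one thread per block, as in the posts above
    p[idx] = p[idx] + 14;
}

// Hypothetical wrapper: runs fkernel on n zero-initialized floats and copies
// the result into out. Returns false as soon as any CUDA call reports an error.
bool runFkernel(int n, float* out) {
    float* fd = nullptr;
    size_t bytes = n * sizeof(float);
    if (!checkCuda(cudaMalloc((void**)&fd, bytes), "cudaMalloc")) return false;

    bool ok = checkCuda(cudaMemset(fd, 0, bytes), "cudaMemset");
    if (ok) {
        fkernel<<<n, 1>>>(fd);
        // The launch itself returns nothing; ask for its status explicitly.
        ok = checkCuda(cudaGetLastError(), "fkernel launch")
          && checkCuda(cudaDeviceSynchronize(), "fkernel execution")
          && checkCuda(cudaMemcpy(out, fd, bytes, cudaMemcpyDeviceToHost),
                       "cudaMemcpy");
    }
    cudaFree(fd);
    return ok;
}
```

Calling runFkernel(3, ff) from main and printing ff should show either the updated values or the name of the first call that failed.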

It returns “invalid device function” and nothing more.
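“Invalid device function” usually means the executable holds no kernel image that the current GPU can run, for example because the code was built for a different compute capability. Whether that is the cause in this project is only an assumption. A rough sketch for checking it, using a hypothetical kernel probeKernel and a hypothetical function reportKernelImage:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for whichever kernel fails to launch.
__global__ void probeKernel(float* p) { p[blockIdx.x] += 1.0f; }

// Hypothetical diagnostic: prints the compute capability of device 0 and the
// architecture the kernel image was built for. Returns the status of the query.
cudaError_t reportKernelImage() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, probeKernel);
    if (err != cudaSuccess) {
        // This query fails when no image for this GPU is present.
        printf("no usable image for probeKernel: %s\n", cudaGetErrorString(err));
        return err;
    }
    // Both versions are encoded as major * 10 + minor.
    printf("probeKernel: binary version %d, PTX version %d\n",
           attr.binaryVersion, attr.ptxVersion);
    return cudaSuccess;
}
```

If the attribute query fails, or the reported versions do not fit the device, the build settings that select the target architecture are the first thing to look at.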

I followed someone’s advice and changed “CUDA -> Output -> Intern mode” from “None” to “Real”, and now it works very well. Thanks!

However, I have another project where this option is set to “None” and the kernel still runs without errors.

What actually is “Intern mode”?