Kernel doesn't change value of struct member

My problem is simple to explain. I have a structure with some members of different types.

The kernel function receives an array of structs as a parameter. Inside the kernel I try to change one member (for example, cellid) of each struct.

But after I copy the array back to the host, I find that nothing has changed.

What could the problem be?

Kernel:

__global__ void mainCalcCuda(struct Particle* particles, int framenum) {
	int idx = blockIdx.x;
	particles[idx].cellid = 12; // test write to check that the kernel behaves correctly: Diman
}

Structure:

extern "C"
struct Particle {
	Vector	defposition;
	inline	Particle() { neighbours = new Neighbour[128]; }
	Material*	material;
	int	localId;
	int	globalId;
	int	objId;

	bool	isNull;
	bool	received;

	int	cellx, celly, cellz;
	int	cellid;		// this parameter I attempt to change in vain

	int	num_of_neighbours;

	Neighbour*	neighbours;
	Vector	position;
	Vector	velocity;
	Vector	acceleration;
	double	mass;
	double	energy;
	double	pressure;
	double	density;
	Tensor2Gen	tDeformation;
	Tensor2Gen	tStressDev;
	Tensor2Gen	tStress;
	double	deformWork;
	double	crack;
	double	state;
	double	wn;
	double	c;
	double	rc;
	double	h;
	double	maxMu;
	double	ro0;
	double	prevEnergy;
	double	prevCrack;
	double	prevPressure;
	double	prevDeformWork;
	double	prevDensity;
	double	prevState;
	Vector	prevVelocity;
	Vector	prevPosition;
	Tensor2Gen	prevTStressDev;

	void init(int gId, int lid, int oId, Vector pos, Vector vel, double ah, bool mnull);

	void resetNeighbours();
	void resetStatus();
	void resetPosition();

	void newNeighbour(Particle* p);
	void setMaterial(Material* mat);
};

Host main:

...

Particle *device, *host;
int N = ps->getPartNum();
size_t size = N * sizeof(Particle);

cudaMalloc((void **) &device, size);
host = (Particle*) malloc(size);

cudaMemcpy(device, ps->particles, size, cudaMemcpyHostToDevice);
mainCalcCuda <<< N, 1 >>> (device, framenum);
cudaMemcpy(host, device, size, cudaMemcpyDeviceToHost);

for (int i = 0; i < N; i++)
{
	cout << host->cellid;
	cout << "\n";
	host++;
}

I found out that it doesn't work even with a simple float array:

__global__ void fkernel(float *p)
{
	int idx = blockIdx.x;
	p[idx] = p[idx] + 14;
}

float *ff, *fd;
ff = (float*)malloc(3 * sizeof(float));
cudaMalloc((void**)&fd, 3 * sizeof(float));
fkernel<<<3, 1>>>(fd);
cudaMemcpy(ff, fd, 3 * sizeof(float), cudaMemcpyDeviceToHost);
for (int i = 0; i < 3; i++) { cout << ff[i] << "\n"; }

Output:

0
0
0

fkernel just doesn’t do anything.

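When there is a problem, CUDA runtime functions return a value other than cudaSuccess, and a kernel that fails to launch simply does nothing. cudaGetErrorString() converts that return value into a string you can print to see what is wrong. A kernel launch has no return value of its own, so call cudaGetLastError() right after the launch to get its error code.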
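As a rough sketch of that advice applied to the fkernel test from the question: every runtime call is checked, and the launch is checked through cudaGetLastError(). The helper checkCuda is a hypothetical name made up for this example, not part of CUDA, and the host-to-device copy is an addition so that the kernel reads initialized data.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): prints the error string for a
// failed call and returns false so the caller can stop early.
static bool checkCuda(cudaError_t err, const char* what)
{
	if (err != cudaSuccess) {
		fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
		return false;
	}
	return true;
}

// Same kernel as in the question: one block per element.
__global__ void fkernel(float* p)
{
	int idx = blockIdx.x;
	p[idx] = p[idx] + 14;
}

int main()
{
	const int n = 3;
	float ff[n] = {0.0f, 0.0f, 0.0f};
	float* fd = NULL;

	if (!checkCuda(cudaMalloc((void**)&fd, n * sizeof(float)), "cudaMalloc"))
		return 1;
	// Added for this sketch: initialize the device buffer before reading it.
	if (!checkCuda(cudaMemcpy(fd, ff, n * sizeof(float), cudaMemcpyHostToDevice),
	               "cudaMemcpy (host to device)"))
		return 1;

	fkernel<<<n, 1>>>(fd);
	// The launch itself returns nothing; launch errors are read back here.
	if (!checkCuda(cudaGetLastError(), "fkernel launch"))
		return 1;
	// Errors that occur while the kernel runs show up at the next sync.
	if (!checkCuda(cudaDeviceSynchronize(), "fkernel execution"))
		return 1;

	if (!checkCuda(cudaMemcpy(ff, fd, n * sizeof(float), cudaMemcpyDeviceToHost),
	               "cudaMemcpy (device to host)"))
		return 1;

	for (int i = 0; i < n; i++)
		printf("%f\n", ff[i]);

	cudaFree(fd);
	return 0;
}
```

If the launch fails, the program stops at the "fkernel launch" check and prints the error string instead of silently printing the untouched values.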

It returns “invalid device function” and nothing more.
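In general, "invalid device function" tends to mean that the executable contains no kernel code the installed GPU can run, for example because it was built for a different architecture. Whether that is the cause in this project is an assumption. A minimal sketch for printing what the GPU reports, so it can be compared against the build settings:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
	int count = 0;
	cudaError_t err = cudaGetDeviceCount(&count);
	if (err != cudaSuccess) {
		fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
		return 1;
	}

	// Print the name and compute capability of every visible device.
	for (int dev = 0; dev < count; dev++) {
		cudaDeviceProp prop;
		err = cudaGetDeviceProperties(&prop, dev);
		if (err != cudaSuccess) {
			fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
			return 1;
		}
		printf("Device %d: %s, compute capability %d.%d\n",
		       dev, prop.name, prop.major, prop.minor);
	}
	return 0;
}
```

The compute capability printed here is the value to check against the architecture the project is compiled for.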

I followed someone's advice and changed “CUDA -> Output -> Intern mode” from “None” to “Real”, and now it works very well. Thanks!

But I have another project where this option is set to “None”, and the kernel there works without errors.

What is “Intern mode”, actually?

solved