Unknown error

I’ve already posted about a related problem: http://forums.nvidia.com/index.php?showtopic=196721

The first problem was that when I passed an array of structures like this:

struct ppp { float a; float b; float c; int d;};

the kernel didn’t change anything. I changed “CUDA -> Output -> Intern mode” from “None” to “Real” and that helped.

Now I have a more complicated structure:

struct Particle {
//=============================================================================
    Vector defposition;          // default position
    inline Particle() { neighbours = new Neighbour[128]; }
    Material* material;
    int localId;
    int globalId;
    int objId;

    bool isNull;
    bool received;

    int cellx, celly, cellz;     // cell params
    int cellid;                  // cell id

    int num_of_neighbours;       // number of neighbours

    Neighbour* neighbours;       // particle neighbours

    //vars
    Vector position;
    Vector velocity;
    Vector acceleration;
    float mass;
    float energy;
    float pressure;
    float density;
    Tensor2Gen tDeformation;
    Tensor2Gen tStressDev;
    Tensor2Gen tStress;
    float deformWork;
    float crack;
    float state;

    //temp vars
    float wn;
    float c;
    float rc;
    float h;
    float maxMu;
    float ro0;

    //prev var values
    float prevEnergy;
    float prevCrack;
    float prevPressure;
    float prevDeformWork;
    float prevDensity;
    float prevState;
    Vector prevVelocity;
    Vector prevPosition;
    Tensor2Gen prevTStressDev;

    void init(int gId, int lid, int oId, Vector pos, Vector vel, float ah, bool mnull);

    void resetNeighbours();
    void resetStatus();
    void resetPosition();

    void newNeighbour(Particle* p);
    void setMaterial(Material* mat);
};

Kernel (it just tries to change one field):

__global__ void mainCalcCuda(Particle* particles, int framenum) {
    int idx = blockIdx.x;
    particles[idx].position.x = 8.0;
}

Here is the host side (I copy from ps->particles to the device and, after processing, back to host):

Particle *device, *host;
size_t size = N * sizeof(Particle);

cudaMalloc((void **) &device, size);
host = (Particle*) malloc(size);

cudaMemcpy(device, ps->particles, size, cudaMemcpyHostToDevice);
mainCalcCuda <<< N, 1 >>> (device, framenum);
cudaMemcpy(host, device, size, cudaMemcpyDeviceToHost);
cudaThreadSynchronize();
printf("%s \n",cudaGetErrorString(cudaGetLastError()));

for (int i=0; i<N; i++)
{
    cout << host->position.x;
    host++;
}

Output:

“Unknown error”

all zeroes, so nothing changed

It bewilders me.

Not sure if I can help, but here is some general advice:

  1. Call cudaGetLastError after every CUDA call. You may get a better picture of what is going on.
  2. How big is your N? You launch N blocks, so it should be no larger than 65535. Also, it seems odd to use only 1 thread per block. Perhaps a multiple of 32 would serve you better?
  3. If you wish to improve speed, you might have to break up the struct and use a separate array for each field to help with coalesced accesses.
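
To illustrate points 1 and 2, here is a minimal sketch of checking every runtime call and launching with more than one thread per block. The `CUDA_CHECK` macro and the `setValue` kernel are hypothetical names made up for this example, not something from the posted code, and `cudaDeviceSynchronize` is used here in place of the older `cudaThreadSynchronize`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): report file/line and exit
// when a CUDA runtime call returns anything other than cudaSuccess.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Stand-in kernel: one thread per element, guarded by a bounds check
// because the grid may contain more threads than elements.
__global__ void setValue(float* data, int n, float value) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] = value;
}

int main() {
    const int n = 2000;
    const int threadsPerBlock = 128;  // a multiple of 32
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    float* device = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&device, n * sizeof(float)));

    setValue<<<blocks, threadsPerBlock>>>(device, n, 8.0f);
    CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    float first = 0.0f;
    CUDA_CHECK(cudaMemcpy(&first, device, sizeof(float),
                          cudaMemcpyDeviceToHost));
    printf("%f\n", first);

    CUDA_CHECK(cudaFree(device));
    return 0;
}
```

Checking right after the launch and again after the synchronize separates a bad launch configuration from a fault inside the kernel, which a single check at the end cannot do.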

Thank you for the reply.

  1. I call cudaGetLastError after launching the kernel and get “Unknown error”, as I wrote. The other CUDA functions succeeded.

  2. My N is about 2000, so that’s fine.

  3. To start with, I just want my struct to be processed. I’m not thinking about efficiency yet.

I figured out that the “Unknown error” is caused by the first line below, in one of my device functions:

p.density = p.density + c->mass * w;
p.density = p.density + <any float number>; //  works well!

By the way, in some cases c->mass and w are about 1e-005 and 1e-012, respectively.

p.density = p.density + 7.29e-015 * 5.61463e-015; //  works well!

That means c (or the address of c->mass) is an invalid pointer. An “unknown error” from a kernel is almost always due to trying to dereference an invalid pointer - effectively the GPU equivalent of a segmentation fault in host code.
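
One common way to end up with such a pointer, given the struct posted above: cudaMemcpy copies `material` and `neighbours` as raw pointer values, and those still hold host addresses (the constructor allocates `neighbours` with `new`). Whether `c` comes from one of those members is an assumption, since that code was not posted. The sketch below uses simplified stand-in versions of `Particle` and `Neighbour` and a made-up `sumMass` kernel to show the usual fix: allocate the pointed-to data on the device and store the device address in the struct before copying it. Error checks are left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simplified stand-ins for the thread's types (layouts are assumed).
struct Neighbour { float mass; };
struct Particle {
    float density;
    int num_of_neighbours;
    Neighbour* neighbours;  // must hold a DEVICE address inside a kernel
};

// Hypothetical kernel: adds up the masses of each particle's neighbours.
__global__ void sumMass(Particle* particles, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    Particle& p = particles[idx];
    for (int j = 0; j < p.num_of_neighbours; ++j)
        p.density += p.neighbours[j].mass;
}

int main() {
    const int n = 4;  // particles
    const int k = 3;  // neighbours per particle

    // 1. Copy the neighbour data itself to the device.
    Neighbour hostNeighbours[n * k];
    for (int i = 0; i < n * k; ++i) hostNeighbours[i].mass = 1.0f;
    Neighbour* devNeighbours = nullptr;
    cudaMalloc((void**)&devNeighbours, sizeof(hostNeighbours));
    cudaMemcpy(devNeighbours, hostNeighbours, sizeof(hostNeighbours),
               cudaMemcpyHostToDevice);

    // 2. Fill a host staging array whose pointer members already
    //    point into device memory, then copy the structs.
    Particle hostParticles[n];
    for (int i = 0; i < n; ++i) {
        hostParticles[i].density = 0.0f;
        hostParticles[i].num_of_neighbours = k;
        hostParticles[i].neighbours = devNeighbours + i * k;
    }
    Particle* devParticles = nullptr;
    cudaMalloc((void**)&devParticles, sizeof(hostParticles));
    cudaMemcpy(devParticles, hostParticles, sizeof(hostParticles),
               cudaMemcpyHostToDevice);

    sumMass<<<1, 32>>>(devParticles, n);

    // 3. Copy results back. The neighbours members now hold device
    //    addresses, so the host must not dereference them.
    cudaMemcpy(hostParticles, devParticles, sizeof(hostParticles),
               cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("particle %d density %f\n", i, hostParticles[i].density);

    cudaFree(devParticles);
    cudaFree(devNeighbours);
    return 0;
}
```

Keeping all neighbours in one flat device buffer and pointing each particle into it avoids one cudaMalloc per particle. Storing an index instead of a pointer would work as well and keeps the struct valid on both sides.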

float xx = c->mass;

float yy=5.0; yy*=xx;

p.density = p.density + yy;

The first and second lines are OK. In the third line, adding yy causes the “Unknown error”.

Does it mean that dereferencing was ok?
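
Not necessarily. A possible explanation, offered as an assumption since the full kernel was not posted: in an optimized build the compiler removes loads whose result never reaches memory, so `c->mass` may not actually be read until `yy` is used in a store. The sketch below shows the two cases side by side; the `Cell` struct and both kernel names are made up for illustration, and a null pointer stands in for the invalid `c`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Cell { float mass; };  // hypothetical stand-in for what c points to

// The product is never stored, so an optimizing build is free to drop
// the load of c->mass entirely (debug builds with -G may keep it).
__global__ void loadUnused(const Cell* c) {
    float xx = c->mass;
    float yy = 5.0f;
    yy *= xx;
}

// Here the product is written to global memory, so the load must happen.
__global__ void loadUsed(const Cell* c, float* out) {
    float xx = c->mass;
    float yy = 5.0f;
    yy *= xx;
    *out = yy;
}

int main() {
    float* out = nullptr;
    cudaMalloc((void**)&out, sizeof(float));

    const Cell* bad = nullptr;  // deliberately invalid pointer

    loadUnused<<<1, 1>>>(bad);
    printf("unused: %s\n", cudaGetErrorString(cudaDeviceSynchronize()));

    loadUsed<<<1, 1>>>(bad, out);
    printf("used:   %s\n", cudaGetErrorString(cudaDeviceSynchronize()));

    // If the second kernel faulted, the context is no longer usable,
    // so the return value of cudaFree is not meaningful here.
    cudaFree(out);
    return 0;
}
```

If the first launch reports no error and the second one does, the dereference was never OK; it was just not executed until its result mattered.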