CUDA program crashes at function call

Beteigeuze · August 6, 2010, 2:14pm

hi guys,

i have a very weird problem at the moment and i am not sure, why.

If i call following function, the program crashes rapidly - very difficult to debug. Before I test it i first have to clean and rebuild my solution. Otherwise it does not crash - seems like it does not update the code if i don’t clean it before.

The code is written for host and device using (host device) for all functions and it runs very well on the cpu.

[codebox]float x = _viewPlane.inv_gamma; // the inv_gamma value is 1.0f

mapped_color = mapped_color.powc(1.0f); // ← RUNS FINE[/codebox]

[codebox]float x = _viewPlane.inv_gamma; // the inv_gamma value is 1.0f

mapped_color = mapped_color.powc(x); // ← PROGRAM CRASH[/codebox]

At the moment i really have completely NO idea, what it could be… if I replace the pow(float, float) function which is used in my powc function, the program still crashes.

Might it be, that it has to do with parallelism - because it works well on the cpu? I guess this, because i can get the value _viewPlane.inv_gamma and can call the mapped_color.powc(1.0f) independently, but obviously NOT together.

I hope you can help me.

Thanks

ONeill · August 6, 2010, 2:37pm

A host device function is compiled for host and device. If you call it from host code, it will be executed by host. If you call it from inside a kernel, it will be executed by device. So you said that all your functions are host device functions but you will need a kernel to call them by GPU too. I suppose you alrdy do that regarding that you say it runs fine on CPU but gives error in some unmentioned other case…
Can you show some more code plz? The kernel would be nice.

Beteigeuze · August 6, 2010, 3:00pm

Hi, thanks for the quick answer.
Indeed i have set up a running kernel which draws pixels on the screen: at the momennt just color values - but that works fine. it is the ray-trace algorithm which causes problems on the gpu.
Just for fun i tried to draw the viewplane.inv_gamma as pixel color - and it crashed again! So now i am sure, that is has to do with the reading of viewplanes inv_gamma.
And i think i know what the problem is - now. But have no idea yet how i shall solve it.

External Media

In my main class i initialise a world object. but this is cpu code - so the GPU- world object is never created, right?
What i want to avoid is, that the world has to be initialised every kernel call. if i call the init function in the kernel, it will be initialised every time.
Do you have an idea how i could initialise the world just once?

thanks a lot

ONeill · August 6, 2010, 3:21pm

I think I might not have enough insight into your problem cause the “solution” sounds quite obvious to me and I dont know anything about ray tracing. Why dont you write a kernel for initializing and keep everything you need in device memory?

Beteigeuze · August 6, 2010, 3:47pm

that’s what i try at the moment :)

i have now two worlds. one is initialised by the cpu and one on the gpu.

[codebox] World worldCPU; // instance of the ray tracer

		__device__ World worldGPU;[/codebox]

The main function (host) calls the init function directly for the worldCPU. The worldGPU is initialised by a kernel

[codebox] worldCPU.init();

	CUDARTInitWorld(&worldGPU);[/codebox]

[codebox]

// --------------------------------------------------------------------------

global void

CUDARTInit(device CUDARTracer::World* world)

{

world->init();

}

// --------------------------------------------------------------------------

void CUDARTInitWorld(device CUDARTracer::World* world)

{

CUDARTInit<<<dim3(1,1,1), dim3(1,1,1)>>>(world);

}[/codebox]

now, the program crashes at start up, when it tries to initialise the members of worldGPU.

Example:

[codebox]

    ...

class World

{	

private:

	ViewPlane		_viewPlane;

     ...

public:

	// --------------------------------------------------------------------------

	// initialisation

	__host__ __device__ void init()

	{

		

		// view plane  	  

		_viewPlane.set_hres(640); <-- PROGRAM CRASH!!!!

           ...[/codebox]

did i get the point what i have to do, or is this approach completely wrong? i have no idea why the class variables can’t be set.

thanks

Beteigeuze · August 6, 2010, 4:15pm

i modified my class now to “host device class …”.
but that does not help neither.

ONeill · August 6, 2010, 4:18pm

Ah ok. Your ViewPlane _viewPlane resides in host memory. So you cant access it without device code. I think this is also true for a member variables without a device in front of it that belong to a device class. But this is sth I have never tried before^^. I always have common classes and create instances like you would normally do. But these classes hold variables and arrays that I have declared in device memory. Im also still wriggling to declare functions with CUDA qualifiers like global in header files, where I declare my class (even with Fermi and CUDA 3.1). So I always have to use non-member device functions called by normal member-functions (which are called and executed by CPU). Otherwise I get “illegal combination of memory qualifiers”. My solution looks somewhat like this:

// my header:

class ClassToDoCudaStuff

{

	public:

		void doCudaStuff(float *doSthWithThis);

		private:

				float *doSthWithThisMallocedInDeviceMemory;

};

// my cu-file:

void ClassToDoCudaStuff::doCudaStuff(float *doSthWithThis)

{

		doCudaStuffKernel<<<1,42>>>(doSthWithThis);

}

__global__ void doCudaStuffKernel(float *doSthWithThis)

{

		// ...

}

Why dont you make the initKernel and the initDeviceFunction member functions of such a regular class? You look like you succeed in where I fail - making device functions member functions of a class :) Maybe you can help me with this.

Edit: Your last post makes me wondering a bit. Arent these function qualifiers like device only function qualifiers? Can you use them to map classes to device memory too? Well actually you can also copy structs to device memory so this might be possible…

Beteigeuze · August 6, 2010, 4:37pm

thanks a lot.

but still i am wondering how i can define a member variable in the device? just putting “device” or “host device” does not work.

__host__ __device__ ViewPlane		_viewPlane;

error: memory qualifier on data member is not allowed

I don’t want to declare all my variables before i initialise the world in my host and passing them, because my project is strongly class based - lots of classes with member variables.

Maybe someone else has an idea?

About your problem ONeill:

What I learned so far (but not sure if it is still true) - anyway, the method i use works ;):

NVCC is not able to link files, so the member functions must be in the same file (not this nice .h and .cpp structure anymore just by using .cuh and .cu)
the functions are automatically inlined
I never use cuh files - instead use cu-files!!
Use the CUDA Build Rule (Rightclick on the file → Properties → Genereal → Tool)
Do not exclude from project there

Here is an example - compile-able. I hope it helps.

#ifndef __CUDA_POINT_2D__

#define __CUDA_POINT_2D__

namespace CUDARTracer

{

	__host__ __device__ class Point2D

	{

	public:

		float	x, y;

		// --------------------------------------------------------------------------

		// default constructor

		Point2D (void)

			: x(0.0), y(0.0)

		{}

		// --------------------------------------------------------------------------

		// constructor

		Point2D (const float arg)

			: x(arg), y(arg)

		{}

		// --------------------------------------------------------------------------

		// constructor

		Point2D (const float x1, const float y1)

			: x(x1), y(y1)

		{}

		// --------------------------------------------------------------------------

		// copy constructor

		Point2D (const Point2D& p)

			: x(p.x), y(p.y)

		{}

		// --------------------------------------------------------------------------

		// destructor

		~Point2D (void) {}

		// --------------------------------------------------------------------------

 		// assignment operator

		__host__ __device__ Point2D& operator= (const Point2D& rhs)

		{

			if (this == &rhs)

				return (*this);

			x = rhs.x;

			y = rhs.y;

			return (*this);

		}

		// --------------------------------------------------------------------------

		// multiplication on right by scalar

		__host__ __device__ Point2D	operator* (const float a)

		{

			return (Point2D(a * x, a * y));

		}

	};

}

#endif

If you still have any solution, how i can make a member variable host device please let me know.

Beteigeuze · August 6, 2010, 4:40pm

if it does not work: in the properties of the file → custom build rule-> General-> Extra Options: -m32

ONeill · August 6, 2010, 4:48pm

This is something different. If you look in the programming guide you will find function qualifiers and variable type qualifiers. There is a host device function qualifier that compiles the function in host and in device code, but no host device var type qualifier. device makes a variable reside in global memory. Without its in host. So you need 2 separate vars and e.g. decide on runtime which one to use.

Edit:

From B.2.5 in the programming guide:

You could have a pointer to some float or whatever as a class member and allocate the memory it points to in device memory.

Have a nice weekend! :)

Beteigeuze · August 6, 2010, 5:01pm

yap, but device variable can just be used out of the kernel…so on host side. I want to use it inside of my class, which can be run on host AND device. It would not be a problem to have there a device variable and an additional host variable - but i am not able to do that.
I think i have a big structure problem here - as i said, i don’t want to define my device variables in the main code - i would like to have member variables of the class which can be run on host or device. mmhmh… i hope i will find a solution: if so, i will post it - hope you do the same ;) thanks a lot!

ONeill · August 6, 2010, 6:25pm

As stated in the programming guide there is no host device function type qualifier. A variable cant be compiled to reside in host and device memory. But there is surely a way to work with 2 vars. Can you explain why this wont work out for you? And I dont think you can use host + device for classes which are just similar to structs in host memory or in device memory.

Beteigeuze · August 6, 2010, 7:37pm

ok, let me try to describe my exact problem again and why i do not have a solution for that yet.

the user will be able to switch between cpu and gpu calculation at runtime. So i need one world-object which is in host-memory and one world object in device-memory.

World worldCPU;						// instance of the ray tracer stored in host memory

			__device__ World worldGPU;			// instance of the ray tracer stored in device memory

The world contains many different instances of classes and i have to safe them somewhere - That is why I want to use member variables. For example if i create a instance of a Material, this instance should have member variables which keep the information about specularity, color, and so on.

…

Right now i got a new cognition. i think this will fix my problem, but first i have to try it.

The program crashes if i create an object in my main class of the world and try to call the world-init-function via kernel (global function). It crashed at the position when it tries to access the viewplane member variables. BUT if i create a new object of the viewplane in the init-function - i am allowed to access the member variables.

Summary: It seems to me, that member variables work, but DON’T work if you try to access them from the global function. So the lowest level does not work.

pseydocode - In main class:

HERE: Trying to access worlds member variable Viewplane

__device__ world;

__global kernel

{

	 world.init();

}

__host__ __device__ world.init()

{

	_viewplane.read() <------ CRASH

}

same principle but on a deeper level - pseydocode: In world class:

HERE: Trying to access Viewplanes member variable xxx

__host__ __device__ world.init()

{

	 Viewplane viewplane;

	 viewplane.setXXX(100);  <--- NO CRASH

}

it is exactly the same principle, but the first one calls it from a global function and the last one calls it from a host device function.

Everything i have to do so is:

Extracting all member variables of the world to the main class and allocating memory in the device in addition.

Then I should be able to run it on the gpu as well. - I hope so ;) that means i will have two different init- and render-implementations in my world - one for the host using the usual memory and one for the gpu using the device memory.

I will try to figure out and will post the result ;)

ONeill · August 6, 2010, 7:49pm

Im not sure bout your first example but does it work when you make viewplane a device variable? It should then. But when u try to access host variables from inside a kernel it will crash of course. In the second example you declare the variable inside your kernel. So it will reside in register and is accessible by device. Your solution will work. The only problem in your example is when accessing host variables from device functions.

Beteigeuze · August 6, 2010, 8:02pm

if i try to make viewplane device

__device__ ViewPlane		_viewPlane;

error: memory qualifier on data member is not allowed

it is not allowed inside a kernel.

Yes, you are right: calling host memory from device crashes, but usually it should not be in the host memory.

If I create the viewplane in my the world class - the member variables of viewplane are accessible - so they are in the device memory!

i wonder why this does not work for the world object itself and the only answer for that could be - because it is declared on the host side. I thought if i declare it as “device World world”, it would be in the memory, but it seems to be not - very confusing. What do you think?

ONeill · August 6, 2010, 8:08pm

Well I guess defining a device class doesnt work as you would think. And as I said non device vars reside in host memory. That device as a type qualifier is only allowed at file scope is stated in the programming guide. And creating objects inside kernels works cause they are created in device memory (normally registers) then. I would just use normal classes with separate vars for host and device versions of them.

Beteigeuze · August 6, 2010, 8:12pm

yes, i think i will find a solution now. I will definetely post it, when i am done with it ;)
Now i spent the whole day, just to figure this out with you - time passes by so fast! :)

ONeill! Thanks a lot for today! - helper of the day!

ONeill · August 6, 2010, 8:19pm

Maybe someone else can give some better hints here… Sadly the solution we came up to is just the usual way to handle this so you will have to rewrite some bigger parts of your code adding this device vars if I got it right. Thx for poiting out that i have to drop my helpful headers and need to put everything inside a .cu file to have class members with CUDA qualifiers :)

Beteigeuze · August 6, 2010, 11:28pm

oh man - i spend so much time now in restructuring and the same problem still exists. I don’t use member variables in the world anymore. They come from the outer class (scope?) and are pointers to allocated memory in the device. I really wonder why i still have no permission to access them.

That’s how I allocated the space. The pointers are passed in a world object.

// allocate device memory

			cutilSafeCall( cudaMalloc( (void**) &_ambientLight, sizeof(Ambient)));

			cutilSafeCall( cudaMalloc( (void**) &_pinhole, sizeof(Pinhole)));

			cutilSafeCall( cudaMalloc( (void**) &_viewPlane, sizeof(ViewPlane)));

			cutilSafeCall( cudaMalloc( (void**) &_pointLightList, sizeof(PointLight) * POINTLIGHT_LIST_SIZE));

			cutilSafeCall( cudaMalloc( (void**) &_matteList, sizeof(Matte) * MATTE_LIST_SIZE));

			cutilSafeCall( cudaMalloc( (void**) &_sphereList, sizeof(Sphere) * SPHERE_LIST_SIZE));

			// copy host memory to device

			cutilSafeCall( cudaMemcpy( _ambientLight, world->_ambientLight,

				sizeof(Ambient), cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _pinhole, world->_pinhole,

				sizeof(Pinhole), cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _viewPlane, world->_viewPlane,

				sizeof(ViewPlane), cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _pointLightList, world->_pointLightList,

				sizeof(PointLight) * POINTLIGHT_LIST_SIZE, cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _matteList, world->_matteList,

				sizeof(Matte) * MATTE_LIST_SIZE, cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _sphereList, world->_sphereList,

				sizeof(Sphere) * SPHERE_LIST_SIZE, cudaMemcpyHostToDevice));

Very very strange problem - And no chance to debug - or is there a possibility (except nSight - it said i need two gpus to debug)

Beteigeuze · August 6, 2010, 11:28pm

oh man - i spend so much time now in restructuring and the same problem still exists. I don’t use member variables in the world anymore. They come from the outer class (scope?) and are pointers to allocated memory in the device. I really wonder why i still have no permission to access them.

That’s how I allocated the space. The pointers are passed in a world object.

// allocate device memory

			cutilSafeCall( cudaMalloc( (void**) &_ambientLight, sizeof(Ambient)));

			cutilSafeCall( cudaMalloc( (void**) &_pinhole, sizeof(Pinhole)));

			cutilSafeCall( cudaMalloc( (void**) &_viewPlane, sizeof(ViewPlane)));

			cutilSafeCall( cudaMalloc( (void**) &_pointLightList, sizeof(PointLight) * POINTLIGHT_LIST_SIZE));

			cutilSafeCall( cudaMalloc( (void**) &_matteList, sizeof(Matte) * MATTE_LIST_SIZE));

			cutilSafeCall( cudaMalloc( (void**) &_sphereList, sizeof(Sphere) * SPHERE_LIST_SIZE));

			// copy host memory to device

			cutilSafeCall( cudaMemcpy( _ambientLight, world->_ambientLight,

				sizeof(Ambient), cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _pinhole, world->_pinhole,

				sizeof(Pinhole), cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _viewPlane, world->_viewPlane,

				sizeof(ViewPlane), cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _pointLightList, world->_pointLightList,

				sizeof(PointLight) * POINTLIGHT_LIST_SIZE, cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _matteList, world->_matteList,

				sizeof(Matte) * MATTE_LIST_SIZE, cudaMemcpyHostToDevice));

			cutilSafeCall( cudaMemcpy( _sphereList, world->_sphereList,

				sizeof(Sphere) * SPHERE_LIST_SIZE, cudaMemcpyHostToDevice));

Very very strange problem - And no chance to debug - or is there a possibility (except nSight - it said i need two gpus to debug)

Topic		Replies	Views
CUDA header input file Declaring device variables in a separate .h or .cuh file CUDA Programming and Performance	7	8562	June 11, 2011
in emulation mode, the program works without emulation error why? ERROR: 0xC0000005: Access violatio CUDA Programming and Performance	6	8627	January 28, 2010
Device Memeroy allocation and data transfer Data transfer between host and device CUDA Programming and Performance	5	2595	June 16, 2011
CUDA function qualifiers in classes CUDA Programming and Performance	8	26988	April 12, 2012
Constant memory when having more than one file external does not work CUDA Programming and Performance	24	3264	August 27, 2010
"calling a __host__ function from a __host__ __device__ function is not allowed" CUDA Programming and Performance	3	3773	April 27, 2014
Declare global variable in function scope or suggest alternate solution CUDA Programming and Performance	2	929	March 21, 2011
cudaMalloc() return "cudaErrorLaunchFailure" CUDA Programming and Performance	19	14576	December 11, 2008
__device__ variariable doubt CUDA Programming and Performance	1	3632	June 6, 2008
pointer in global device memory CUDA Programming and Performance	9	11674	November 23, 2011

CUDA program crashes at function call

Related topics