Using cudaMalloc in a device function

Hi,

I am writing an algorithm to run on the GPU using CUDA, and my algorithm depends heavily on pointers.
I already tested it on the CPU and it runs perfectly. However, when I started converting it to CUDA, I hit an error saying that I can't call cudaMalloc inside a device function.

I am really disappointed, because I must use cudaMalloc in my algorithm.

1. Does anyone have an idea how to solve this problem?

2. Is there any way to use a list in CUDA? I had to write one in C myself; is there a standard list I could use instead of cudaMalloc?

I also tried new and delete, but I still have to call them from a global function rather than a device function.
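Roughly what I was trying (just a sketch; the names are placeholders), which the compiler rejects:

```cuda
// Sketch of the pattern the compiler rejects: cudaMalloc is a host API,
// so calling it from __device__ code is a compile error on my setup.
__device__ int* makeNode()
{
    int* p = 0;
    cudaMalloc((void**)&p, sizeof(int)); // error: host function in device code
    return p;
}

__global__ void buildList()
{
    int* node = makeNode();
    // ... link the node into my list structure ...
}
```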

Any idea ?

I really need your help guys.

I just read that I can use C++ classes in CUDA.

I made this class:

__device__ class ListClass
{
public:
	__device__ ListClass()
	{
		list = new List;
	}

	__device__ ~ListClass()
	{
	}

private:
   List* list;
};

I want to know: is List now loaded into global device memory or not?
Can I call this class from any device function?
Will objects of this class persist until the end of the application, rather than being destroyed?

I got this error:

error : calling a host function("operator new") from a device function("ListClass::ListClass") is not allowed

However, it was accepted in the CUDA samples:

template<class T>
class Vector : public Container<T> {

public:
	// Constructor, data is allocated on the heap
	// NOTE: This must be called from only one thread
	__device__
	Vector(int max_size) : m_top(-1) {
		m_data = new T[max_size];
	}

	// Constructor, data uses preallocated buffer via placement new
	__device__
	Vector(int max_size, T* preallocated_buffer) : m_top(-1) {
		m_data = new (preallocated_buffer) T[max_size];
	}

	// Destructor, data is freed
	// NOTE: This must be called from only one thread
	__device__
	~Vector() {
		if( m_data ) delete [] m_data;
	}

	__device__
	virtual
	void push(T e) {
		if( m_data ) {
			// Atomically increment the top idx
			int idx = atomicAdd(&(this->m_top), 1);
			m_data[idx+1] = e;
		}
	}

	__device__
	virtual
	bool pop(T &e) {
		if( m_data && m_top >= 0 ) {
			// Atomically decrement the top idx
			int idx = atomicAdd( &(this->m_top), -1 );
			if( idx >= 0 ) {
				e = m_data[idx];
				return true;
			}
		}
		return false;
	}

private:
	int m_size;
	T* m_data;

	int m_top;
};
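If I read the sample's NOTE comments right, usage would look something like this sketch (assuming an int instantiation, and that Container<T> is defined as in the sample):

```cuda
__global__ void useVector(int max_size)
{
    // Per the sample's NOTE, the constructor and destructor must run
    // in exactly one thread; share the pointer with the rest of the block.
    __shared__ Vector<int>* vec;
    if (threadIdx.x == 0)
        vec = new Vector<int>(max_size);
    __syncthreads();

    // All threads may push concurrently; push() serializes via atomicAdd on m_top.
    vec->push((int)threadIdx.x);
    __syncthreads();

    if (threadIdx.x == 0)
        delete vec;
}
```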

Any ideas?

OK, I think I found the problem.

If anyone has the same problem, here is the solution:

http://stackoverflow.com/questions/6937693/dynamically-allocating-memory-inside-device-global-cuda-kernel
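For anyone who doesn't want to click through: as far as I can tell, in-kernel malloc/free and new/delete are only supported on devices of compute capability 2.0 or later, and you have to compile for that architecture (e.g. nvcc -arch=sm_20); with the old default sm_10 target you get exactly the "calling a host function from a device function" error above. A minimal sketch:

```cuda
#include <cstdio>

// In-kernel heap allocation: supported on compute capability >= 2.0,
// so compile with, e.g.:  nvcc -arch=sm_20 example.cu
__global__ void kernel()
{
    int* data = new int[4];   // device-side operator new (device heap)
    if (data) {
        data[0] = 42;
        printf("data[0] = %d\n", data[0]);
        delete [] data;       // device-side delete
    }
}

int main()
{
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Note that the device heap these allocations come from has a fixed size; it can be raised from the host with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes) before launching the kernel.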