I am fairly new to CUDA, and new to C++. I plan to write a program in C++ using the Matrix Template Library (MTL), and I want this program to call functions on the device. Specifically, I want to be able to pass a Matrix object created with MTL to the device. I understand that this is not currently possible. Correct? Is there any workaround? Anything at all? I understand that full C++ support is planned for the future, but does anyone know whether that means 6 months or 6 years?
Has anyone used MTL and CUDA together?
I am not familiar with MTL, but I assume it is a pure template library, i.e. it exists as source files containing various template classes.
I think one of the difficulties is the lack of dynamic memory allocation on the device. To call member functions in kernel code, you need to create an object, and without dynamic memory allocation that is usually impossible. However, you can always allocate the memory in host code, and then construct an object in that memory in device code by calling a dummy init function that does what a constructor would. It works, but programming this way is quite frustrating.
Another problem is that most code written for the CPU is not well suited to the GPU. A GPU has many cores and needs even more parallel work to run efficiently, so you may need to tune the code, or even rewrite it, to make it perform well on the GPU.
To copy data from CPU memory to GPU memory, the data should be stored contiguously in CPU memory. I would guess that the dense2D class also stores the elements of M in a one-dimensional array, so what you actually need to do is copy that array to GPU memory.
Assume that the implementation of dense2D looks something like this:

template <class T>
class dense2D {
    // ...
    T* data; // elements of the matrix, stored contiguously
    // ...
};