questions about CUDA 3.1

laoen · July 13, 2010, 10:17pm

I heard that CUDA supports C++ and recursion now. I am wondering if I can use std::list or std::vector in __global__ and __device__ functions.

Ken_Domino · July 14, 2010, 2:11am

Recursion works with CUDA 3.1 on a Fermi card:

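    #include <stdio.h>
    #include <stdlib.h>
    #include <iostream>
    #include <cuda.h>

    // Recursive device function; needs sm_20 (Fermi).
    __device__ int fact(int f)
    {
        if (f == 0)
            return 1;
        else
            return f * fact(f - 1);
    }

    // Kernel: a single thread writes fact(f) to *result.
    __global__ void gfact(int * result, int f)
    {
        *result = fact(f);
    }

    int main()
    {
        dim3 Dg(1, 1);   // grid: one block
        dim3 Db(1);      // block: one thread
        size_t Ns = 0;   // no dynamic shared memory

        // Mapped host memory has to be enabled before the context is created.
        cudaSetDeviceFlags(cudaDeviceMapHost);

        // Allocate mapped (zero-copy) host memory and get its device-side pointer.
        int * result;
        int errMalloc = cudaHostAlloc((void**)&result, sizeof(int), cudaHostAllocMapped);
        int * dresult;
        int err = cudaHostGetDevicePointer((void**)&dresult, (void*)result, 0);
        if (errMalloc || err)
        {
            std::cout << "Error in allocation " << errMalloc << " " << err << std::endl;
            return 1;
        }

        gfact<<<Dg, Db, Ns>>>(dresult, 5);
        cudaThreadSynchronize();

        cudaError_t ek = cudaGetLastError();
        if (ek)
        {
            std::cout << "Error in kernel call " << ek << std::endl;
            return 1;
        }

        std::cout << "Value = " << *result << std::endl;
        cudaFreeHost(result);
        return 0;
    }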

However, I have to compile with sm_20 and run it on a Fermi card. On a GeForce 8900, the kernel call fails with error code 8 (which I believe is cudaErrorInvalidDeviceFunction in the 3.1 runtime).

It won’t compile under sm_11:

“C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.1\bin\nvcc.exe” …

produces: “Error: Recursive function call is not supported yet: fact(int)”

For as long as I’ve used CUDA (since 2.x), it has supported C++ features like classes and templates. You can use classes in your kernel code, but depending on what you want, it can be cumbersome. In classes, you have to put __device__ (and sometimes __global__, too) qualifiers on your member functions, but you cannot put them on constructors or destructors. You can pass a class object from host to device either as a copy (e.g., class foo {…}; __global__ void bar(foo v) {…} int main() { foo xxx; …; bar<<<…>>>(xxx); …}) or by native pointer (__global__ void bar(foo * v)). But reference parameters “& var” are not possible.

However, the bigger problem is representing complex data structures. For example, consider a graph with nodes and edges. You could represent a node as a class containing a list of native pointers to all of its edge objects, and an edge as a class with two native pointers for its start and end nodes, but then you’d have to copy those objects to device space and translate every native host pointer into the corresponding device memory address. Yuk. Instead, people usually represent pointers as integer offsets into one big block, and allocate the objects out of that block. You could hide a lot of this by overloading operators such as -> and *, and you would have to write your own malloc and operator new for that memory space. I’ve been hacking at a dlmalloc for this purpose, so that I can maybe allocate objects in device code, but mostly just so I can have common code between the host and device to access the complex data structure.

Lots of other issues, of course. Just play around and you’ll see.

I can’t get std::vector declarations and calls to compile, but I didn’t expect them to anyway. Functions called from the device have to have the __device__ qualifier, and you still cannot call a __global__ function from a __device__ function. Calls to printf are supposed to work with 3.1 on a Fermi card, but I can’t seem to get them to work. That’s kind of disappointing after it was promised, and with Nsight still in beta, unless I’m doing something wrong.

YDD · July 14, 2010, 2:09pm

If you need something like std::vector, you should look into the Thrust library.
