C++ Integration: Limitations on using CUDA with C++

I want to use CUDA to run a number of kernels on a number of different image buffers, all maintained in a C++ library.

I have seen the cppIntegration example and see that it’s possible to push data down to CUDA, but what I’m wondering is whether there are any limitations on what data can be retrieved and then pushed down to CUDA somewhere else. Are all types available or just the base types? Is everything in /usr/local/cuda/include/ up for grabs from C++? The cudaArray and texture types are really what I want to handle.


You copy bytes when you do a memcopy from host to device (and vice versa). So, you can pass any type you want to CUDA, as long as your kernel takes the byte layout into consideration.
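To make the byte-layout point concrete, here is a minimal host-only sketch. The Pixel struct and the helpers toBytes and pixelAt are hypothetical names, not part of CUDA, and std::memcpy stands in for cudaMemcpy so the example runs without a GPU. The static_asserts state the layout assumptions that the receiving code (in practice, the kernel) would rely on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical pixel type that host code and a kernel would both agree on.
struct Pixel {
    float r, g, b, a;
};

// A raw byte copy is only meaningful for trivially copyable types
// whose size and field offsets are the same on both sides.
static_assert(std::is_trivially_copyable<Pixel>::value, "Pixel must be trivially copyable");
static_assert(sizeof(Pixel) == 4 * sizeof(float), "unexpected padding in Pixel");
static_assert(offsetof(Pixel, a) == 3 * sizeof(float), "unexpected field offset");

// Stand-in for a host-to-device copy: only bytes are transferred.
std::vector<unsigned char> toBytes(const std::vector<Pixel>& pixels) {
    std::vector<unsigned char> bytes(pixels.size() * sizeof(Pixel));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), pixels.data(), bytes.size());
    }
    return bytes;
}

// What the receiving side has to do: reinterpret the bytes with the same layout.
Pixel pixelAt(const std::vector<unsigned char>& bytes, std::size_t index) {
    assert((index + 1) * sizeof(Pixel) <= bytes.size());
    Pixel p;
    std::memcpy(&p, bytes.data() + index * sizeof(Pixel), sizeof(Pixel));
    return p;
}
```

In a real build the same struct definition would be shared through a header included by both the C++ code and the .cu file, so the two sides cannot drift apart.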


I understand the copying of data from C++ to CUDA and vice versa. What I would like to know is how easy it is to separate the code that does the allocation from the code that implements the kernel. Basically, what is the best way to store and manage a CUDA texture or any other CUDA object? I see that there is a textureReference structure in the texture_types.h file in /usr/local/cuda/include. Is this what should be used when referencing a CUDA texture outside of CUDA?

I could probably figure this all out on my own quite easily, but it would be nice to have some documentation to let me know what I can and can’t do with that reference or anything else that I may want to access from C++. :D

Hope that makes things clearer.

All you have to do is pass pointers around. You can then copy the definitions from somewhere, or handle the encoding yourself with offsets and pointer type-casts. The only thing you can’t do is follow a pointer to video memory from the CPU, or vice versa (I’m sure you already know this).

So are you using void* in your C++ for maintaining the references to textures or cudaArrays? I would think that there must be a more type safe way to do this. Here is what I’m wanting to do:

#include <texture_types.h>

class Image
{
public:
    const textureReference& getTexture() const;

private:
    textureReference mCUDATexture;
};

Is textureReference the right thing to use here? Can I call a CUDA function and have it initialize the texture with a given size? Can I dynamically allocate it? Are there any things I need to be aware of to initialize it properly? Can I access all the members of the textureReference from C++ after the texture has been initialized?

The current problem with the examples is that everything is done using globals in the cu files which is fine if you are making an example, but I want to manage a large number of these things and so I need some way to manage the texture object. Later on I want to pass the textureReference down to a kernel class which will use the texture as an input.

So basically I’m looking for a more detailed cppIntegration example.

You don’t need to manage all data in C++ via void*. Just include cuda_runtime.h and your C++ code can work with float4s and all of the other CUDA types. It can even call cudaMalloc, cudaFree, cudaMemcpy, and a host of other CUDA functions (see page 77 of the 1.0 programming guide).

There are a couple of things you can’t do from C++, though, as mentioned in the guide. Specifically, you can’t manage texture references (cudaBindTexture), set constant memory (cudaMemcpyToSymbol), or call device functions, because these require special information generated by the nvcc compiler.

Note that texture references have to be global in the kernel’s .cu file anyways, so there is really no point in trying to manage them from within the C++ code. I have the C++ code handle all of the float4*'s and other data. When the C++ code wants a certain dataset to be active in a texture handle, it calls a C function that was compiled with nvcc that binds the data to that texture.

What you may find really annoying (as I do) is that since the texture references must be global in the kernel’s .cu file, you cannot have multiple .cu files use the same texture! The same goes for constant memory, because variables in constant memory have implied static storage according to the guide (as far as I noticed, this is not mentioned for textures). Thus, the only reasonable way to handle dozens of kernels is to create a bunch of .cu files that are not compiled by nvcc directly. Then create a “big.cu” which includes all the other .cu files and have nvcc compile that. Hopefully this will change in the future, so that texture references and constant memory variables can be declared “extern” and a real multi-file environment becomes possible.

That is what I was wanting to know. But now that I know it I can’t say it’s what I wanted to hear.

So is there any way to hold a reference to a global array? What I really want to avoid is copying the data from the device after each kernel has run. The C++ code would load data into the array and then run a number of different kernels on the data in a chain. So kernel1 would create the input for kernel2 and so on. The data would then get pulled off the device only after the last kernel was called. But if I can’t extern the input arrays then I guess the only way would be to use the big.cu approach like you describe.

Is it possible to build a global include file that contains a stack of cudaArrays and then have CUDA code that can be called to grab the next available one and allocate the memory on the device? It would be similar to how OpenGL handles texture objects. That way all .cu files would include that file and you would pass an index to the CUDA routines to let them know which array you want to use. I believe that would work; I just hoped there was an easier way. Wouldn’t that also remove the need to create the big.cu file?
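The index-based idea above could look roughly like the following host-side sketch. BufferTable and its methods are hypothetical names, and std::malloc/std::free stand in for cudaMalloc/cudaFree so the code runs without a GPU; it keeps the table in C++ rather than in a shared .cu include, which is one possible reading of the proposal and not an established CUDA pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical handle table, loosely modelled on OpenGL texture names.
// std::malloc/std::free are stand-ins for cudaMalloc/cudaFree.
class BufferTable {
public:
    using Handle = std::size_t;

    BufferTable() = default;
    BufferTable(const BufferTable&) = delete;
    BufferTable& operator=(const BufferTable&) = delete;

    // Free every buffer that is still allocated.
    ~BufferTable() {
        for (Slot& s : slots_) std::free(s.ptr);
    }

    // Allocate a buffer and return its handle, reusing a released slot if one exists.
    Handle create(std::size_t bytes) {
        assert(bytes > 0);
        void* p = std::malloc(bytes);
        assert(p != nullptr);
        for (Handle h = 0; h < slots_.size(); ++h) {
            if (slots_[h].ptr == nullptr) {
                slots_[h] = Slot{p, bytes};
                return h;
            }
        }
        slots_.push_back(Slot{p, bytes});
        return slots_.size() - 1;
    }

    // Free the buffer behind a handle; the slot becomes available again.
    void release(Handle h) {
        assert(h < slots_.size());
        std::free(slots_[h].ptr);
        slots_[h] = Slot{nullptr, 0};
    }

    // Raw pointer for a handle (nullptr if released); this is what a kernel wrapper would receive.
    void* pointer(Handle h) const {
        assert(h < slots_.size());
        return slots_[h].ptr;
    }

    // Size in bytes of the buffer behind a handle (0 if released).
    std::size_t size(Handle h) const {
        assert(h < slots_.size());
        return slots_[h].bytes;
    }

private:
    struct Slot {
        void* ptr;
        std::size_t bytes;
    };
    std::vector<Slot> slots_;
};
```

The C++ side would pass table.pointer(handle) to whichever nvcc-compiled wrapper launches the kernel, so the .cu files never need to see the table itself.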

It looks like I have some reading and coding to do.

Despite the limitations I describe, you don’t need to copy data back from arrays in device memory every time. Just don’t use an array declared at file scope in global memory. Instead, cudaMalloc() the memory you need and pass those pointers around in your C++ classes as needed. You can then pass these pointers into kernels as arguments without ever copying the data back to the host.
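The data flow described here (upload once, chain the kernels, download once) can be sketched on the host as follows. Stage, scaleByTwo, addOne and runChain are hypothetical names; plain loops stand in for kernel launches and std::vector stands in for cudaMalloc'd buffers, so this shows only the ping-pong structure, not real device code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stage signature: reads n floats from in, writes n floats to out.
// In a CUDA build each stage would be an nvcc-compiled wrapper that launches
// a kernel on device pointers; host loops stand in for that here.
using Stage = void (*)(const float* in, float* out, std::size_t n);

void scaleByTwo(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * 2.0f;
}

void addOne(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] + 1.0f;
}

// Runs the stages in order, ping-ponging between two buffers that stay resident.
std::vector<float> runChain(const std::vector<float>& input,
                            const std::vector<Stage>& stages) {
    std::vector<float> a(input);         // stand-in for allocation plus the single upload
    std::vector<float> b(input.size());  // second resident buffer
    float* src = a.data();
    float* dst = b.data();
    for (Stage stage : stages) {
        assert(stage != nullptr);
        stage(src, dst, input.size());   // output of one stage becomes input of the next
        std::swap(src, dst);
    }
    // Stand-in for the single download at the end; src holds the latest result.
    return std::vector<float>(src, src + input.size());
}
```

Only the pointers move between stages; no intermediate result is copied out until the chain has finished.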

It is only the use of texture references and constant memory across multiple files that prompted me to use the big.cu method. Texture references are a different beast compared to data in global memory. And as I said in the previous post, just have the C++ code call a function that wraps the cudaBindTexture call to change which memory location (or cudaArray) the texture is bound to.

O.K. that sounds more reasonable. Thank you for the clarification.