You don’t need to manage all data in C++ via void*. Just include cuda_runtime.h and your C++ code can work with float4 and all of the other CUDA types. It can even call cudaMalloc, cudaFree, cudaMemcpy, and a host of other CUDA functions (see page 77 of the 1.0 programming guide).
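For example, here is a rough sketch of plain C++ host code (compiled with your normal C++ compiler, not nvcc, and linked against cudart) doing exactly that. The kernel-launching wrapper it would hand the pointer to is omitted:

```cpp
// host-side C++ sketch: no nvcc needed, just cuda_runtime.h and -lcudart
#include <cuda_runtime.h>
#include <vector>

int main()
{
    const int n = 1024;
    std::vector<float4> host(n, make_float4(0.f, 1.f, 2.f, 3.f));

    // allocate and fill device memory with typed float4 data
    float4* dev = 0;
    cudaMalloc((void**)&dev, n * sizeof(float4));
    cudaMemcpy(dev, &host[0], n * sizeof(float4), cudaMemcpyHostToDevice);

    // ... pass 'dev' to an nvcc-compiled C wrapper that launches the kernel ...

    cudaMemcpy(&host[0], dev, n * sizeof(float4), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return 0;
}
```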
There are a couple of things you can’t do from C++, though, as mentioned in the guide. Specifically, you can’t manage texture references (cudaBindTexture), set constant memory (cudaMemcpyToSymbol), or call device functions, because these require special information generated by the nvcc compiler.
Note that texture references have to be global in the kernel’s .cu file anyway, so there is really no point in trying to manage them from within the C++ code. I have the C++ code handle all of the float4 pointers and other data. When the C++ code wants a certain dataset to be active in a texture reference, it calls a C function, compiled with nvcc, that binds the data to that texture.
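A minimal sketch of that split might look like the following (the names tex_data and bind_data_texture are made up for illustration; only the .cu file goes through nvcc):

```cpp
// ---- kernel.cu (compiled with nvcc) ----
// The texture reference must live here, at file scope.
texture<float4, 1, cudaReadModeElementType> tex_data;

extern "C" void bind_data_texture(const float4* dev_ptr, size_t bytes)
{
    // make the caller's dataset the active one for this texture
    cudaBindTexture(0, tex_data, dev_ptr, bytes);
}

// ---- host.cpp (plain C++) ----
// The C++ side owns the float4* and just asks the .cu side to bind it.
extern "C" void bind_data_texture(const float4* dev_ptr, size_t bytes);

void use_dataset(const float4* dev_ptr, size_t n)
{
    bind_data_texture(dev_ptr, n * sizeof(float4));
    // ... call nvcc-compiled wrappers that launch kernels reading tex_data ...
}
```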
What you may find really annoying (as I do) is that since texture references must be global in the kernel’s .cu file, you cannot have multiple .cu files use the same texture! The same goes for constant memory: variables in constant memory have implied static storage according to the guide (I did not notice this mentioned for textures). Thus, the only reasonable way to handle dozens of kernels is to create a bunch of .cu files that are not compiled by nvcc directly, then create a “big.cu” that includes all the other .cu files and have nvcc compile that one file. Hopefully this will change in the future, so that texture references and constant memory variables can be declared “extern” and a real multi-file build can be done.
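The “big.cu” workaround amounts to something like this (file names are illustrative; this is the only file you hand to nvcc):

```cpp
// big.cu -- the single translation unit nvcc compiles.
// Every included .cu sees the same file-scope texture reference and
// constant memory, so all the kernels can share them.
texture<float4, 1, cudaReadModeElementType> tex_data;
__constant__ float params[16];

#include "kernel_a.cu"   // kernels that read tex_data / params
#include "kernel_b.cu"
#include "kernel_c.cu"
```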