That is exactly how I do it. Kernel calls, texture binding and constant memory copies are the only things in my code that live in .cu files. Everything else is in C++ files, including cudaMalloc and the like.
For error checking, in debug mode only, I have my kernel wrappers call cudaThreadSynchronize (cudaDeviceSynchronize in newer CUDA versions), then check for errors and return the cudaError_t from cudaGetLastError.
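A minimal sketch of that kind of kernel wrapper, in case it helps. The kernel, the wrapper name `launch_scale`, and the use of `NDEBUG` as the debug switch are all made up for illustration, and returning cudaSuccess in release builds is just one possible choice:

```cuda
// scale_kernel.cu: compiled by nvcc. All names here are hypothetical.
#include <cuda_runtime.h>

// Example kernel: multiply each element by a factor.
__global__ void scale_kernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Wrapper callable from ordinary C++ files (declare it in a shared header).
// d_data must point to device memory allocated elsewhere with cudaMalloc.
cudaError_t launch_scale(float* d_data, float factor, int n)
{
    if (n <= 0)
        return cudaSuccess;  // nothing to do; avoids a zero-sized grid

    const int block = 256;
    const int grid = (n + block - 1) / block;  // round up to cover all n elements
    scale_kernel<<<grid, block>>>(d_data, factor, n);

#ifndef NDEBUG
    // Debug builds only (assumption: NDEBUG marks release builds).
    // Wait for the kernel to finish so execution errors are reported here,
    // then return whatever error the runtime recorded.
    cudaDeviceSynchronize();
    return cudaGetLastError();
#else
    // Release builds: no synchronization, errors surface on a later call.
    return cudaSuccess;
#endif
}
```

The C++ side only needs a declaration such as `cudaError_t launch_scale(float* d_data, float factor, int n);` and never sees the `<<<...>>>` syntax.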
Right, I don’t use wrappers for memory allocations and copies. These are just plain C functions that can be called from any object file linked into the executable. Just include “cuda_runtime_api.h” (I think that is the right file name).
I only use wrappers for the calls that need nvcc’s special processing: binding textures, copying to constant memory, and launching kernels.
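As a rough illustration of the constant memory case: the `__constant__` symbol has to be visible to nvcc, so the copy goes through a small wrapper in the .cu file. The symbol `c_coeffs`, the size `MAX_COEFFS`, and the function `upload_coeffs` are hypothetical names, not anything from my actual code:

```cuda
// filter_constants.cu: compiled by nvcc. All names here are hypothetical.
#include <cuda_runtime.h>

#define MAX_COEFFS 64

// Constant memory symbol; kernels in this file can read it directly.
__constant__ float c_coeffs[MAX_COEFFS];

// Wrapper callable from ordinary C++ files.
// Copies `count` floats from host memory into the constant memory symbol.
cudaError_t upload_coeffs(const float* h_coeffs, int count)
{
    if (h_coeffs == nullptr || count < 0 || count > MAX_COEFFS)
        return cudaErrorInvalidValue;  // refuse to write past the symbol

    // Offset and copy direction use their defaults (0 and host-to-device).
    return cudaMemcpyToSymbol(c_coeffs, h_coeffs, count * sizeof(float));
}
```

A C++ file that includes “cuda_runtime_api.h” for the cudaError_t type can then declare `cudaError_t upload_coeffs(const float* h_coeffs, int count);` and call it like any other function.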