Creating CUDA subroutines to add to CUBLAS

Hello,

I am very new to CUDA programming; my main experience so far has been with the CUBLAS library. I want to write simple CUDA functions/subroutines for operations that are not in CUBLAS, such as subtracting one matrix from another, and call them from C++ programs built with VS2005. Do I write a .cu file, compile it with nvcc into an .obj file, and then link that with the main C++ program? In this case, the memory for the matrices has already been allocated in the main C++ program using the CUBLAS helper functions. A simple example with step-by-step instructions would be appreciated.
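
To make the question concrete, here is a minimal sketch of what I imagine the .cu file would look like. The file name matsub.cu and the names matSubKernel and matSub are hypothetical, and I am assuming single-precision matrices stored contiguously (no padding between columns), with device pointers that came from the CUBLAS allocation helpers. It may not be the recommended approach.

```cuda
// matsub.cu : hypothetical file name, illustrative sketch only
#include <cuda_runtime.h>

// Kernel: each thread computes one element, c[i] = a[i] - b[i].
// Element-wise subtraction does not depend on row- or column-major
// order as long as all three matrices use the same contiguous layout.
__global__ void matSubKernel(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] - b[i];
    }
}

// Wrapper with C linkage so the main C++ program can call it without
// name-mangling issues. dA, dB and dC are assumed to be DEVICE pointers
// that the caller has already allocated and filled.
// Returns 0 if the launch was accepted, 1 otherwise.
extern "C" int matSub(const float* dA, const float* dB, float* dC,
                      int rows, int cols)
{
    int n = rows * cols;
    if (n <= 0) {
        return 0;  // nothing to do
    }

    int threads = 256;                         // threads per block (assumed value)
    int blocks  = (n + threads - 1) / threads; // enough blocks to cover n elements

    matSubKernel<<<blocks, threads>>>(dA, dB, dC, n);

    // The launch is asynchronous; this only reports launch errors.
    // A later copy back to the host waits for the kernel to finish.
    return (cudaGetLastError() == cudaSuccess) ? 0 : 1;
}
```

If I understand the workflow correctly, this would be compiled with something like `nvcc -c matsub.cu -o matsub.obj`, the main C++ program would declare `extern "C" int matSub(const float*, const float*, float*, int, int);`, and the .obj would be added to the VS2005 link step together with the CUDA runtime library. Is that right?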