Never mind, I am slowly figuring it all out!
Here is a fully worked example showing how to invert a matrix using these functions:
The example was originally written to demonstrate a bug, but the code is functional and should produce correct results if you are using CUDA 6 (in which that bug is fixed).
As the documentation notes:
"This function is intended to be used for matrices of small sizes where the launch overhead is a significant factor."
So I think there may be more efficient methods for inverting a single large matrix, but I am not an expert.