What is the difference between the Runtime API Reference and the Driver API Reference in the Reference Manual?
For example, we have cudaMalloc (p. 29) in one and cuMemAlloc (p. 177) in the other. Both are described as: “Allocates count bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory”.
Well, at an abstract level they essentially do the same thing; that is all I meant. Of course, cudaMalloc also does extra work behind the scenes, such as the whole implicit context creation setup.
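To make the difference concrete, here is a minimal sketch of the same allocation done through both APIs. The helper names alloc_with_runtime and alloc_with_driver are made up for illustration, and the driver-side version assumes a reasonably recent toolkit where the primary-context calls (cuDevicePrimaryCtxRetain and friends) are available; older code would typically create its own context instead. Treat it as an illustration of the setup each API expects, not as the recommended way to structure an application.

```cuda
// Sketch: allocate and free the same buffer via the runtime and driver APIs.
// Assumed build line: nvcc alloc_compare.cu -o alloc_compare -lcuda
#include <cstdio>
#include <cuda.h>          // driver API (cu* functions)
#include <cuda_runtime.h>  // runtime API (cuda* functions)

// Runtime API: no explicit setup. The runtime initializes itself and
// binds a context implicitly on the first call that needs one.
// (alloc_with_runtime is a hypothetical helper name.)
bool alloc_with_runtime(size_t count) {
    void* devPtr = nullptr;
    if (cudaMalloc(&devPtr, count) != cudaSuccess) return false;
    return cudaFree(devPtr) == cudaSuccess;
}

// Driver API: initialization, device selection, and context handling
// are all explicit. (alloc_with_driver is a hypothetical helper name.)
bool alloc_with_driver(size_t count) {
    if (cuInit(0) != CUDA_SUCCESS) return false;

    CUdevice dev;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return false;

    // Assumption: use the device's primary context rather than creating
    // a separate one.
    CUcontext ctx;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return false;

    bool ok = false;
    if (cuCtxPushCurrent(ctx) == CUDA_SUCCESS) {
        CUdeviceptr dptr = 0;
        if (cuMemAlloc(&dptr, count) == CUDA_SUCCESS) {
            ok = (cuMemFree(dptr) == CUDA_SUCCESS);
        }
        CUcontext popped;
        cuCtxPopCurrent(&popped);  // restore whatever was current before
    }
    cuDevicePrimaryCtxRelease(dev);
    return ok;
}

int main() {
    const size_t count = 1 << 20;  // 1 MiB
    std::printf("runtime API: %s\n", alloc_with_runtime(count) ? "ok" : "failed");
    std::printf("driver API:  %s\n", alloc_with_driver(count) ? "ok" : "failed");
    return 0;
}
```

The actual allocation is a single call in both cases; everything else in the driver version is the setup that cudaMalloc handles for you.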
Also, if you program with the CUDA runtime, I believe you need to either distribute the runtime with your application or install the CUDA toolkit/SDK on any machine that will run the application (though I can’t vouch for this, as I only use my own dev box for CUDA stuff).