The CUDA Runtime API reference says: “There are two levels for the runtime API. The low-level API (cuda_runtime_api.h) is a C-style interface that does not require compiling with nvcc…”. So I should be able to compile the following code with a general-purpose C compiler such as gcc or Visual C++.
#include <cuda_runtime_api.h>  // low-level, C-style runtime API
int main( int argc, char* argv[] )
{
    cudaError_t err1 = cudaSetDevice(0);
    void* p = NULL;
    cudaError_t err2 = cudaMalloc( &p, 64 );
    // call my_kernel_function_wrapper
    return 0;
}
However, I wonder how to do this in device EMULATION mode. nvcc has the -deviceemu option, but is there a way to get the same effect with gcc or Visual C++ when the code does not use any of the language extensions provided by CUDA?
Maybe a more basic question is: in emulation mode, how are functions like cudaMalloc() implemented?
Shouldn’t the same cudaMalloc() function be called in both device mode and emulation mode, since there is only one copy of the runtime library?
If this is true, how does cudaMalloc() know that the program is in emulation mode and that it should not allocate memory on a physical device?