Over the last few weeks I have written a wrapper API on top of the CUDA Driver API to simplify CUDA programming. It has been a great success, but I have now hit a snag :-(
I have a C++ program: main.cpp
I have a CUDA program: blocks.cu (in a different file)
I’d like to define a device array in main.cpp:
typedef struct __align__(16) {
    int xstep, ystep;
    int xindex, yindex;
    float delx, dely;
    int width, height;
} Mywrap;

Mywrap mywrap;
__device__ __align__(16) Mywrap wrap[20];
I’d like to access the contents of wrap from blocks.cu. That is easy to do with the CUDA runtime API, but I can't find a single example explaining how to do it with the driver API.
When you use the driver API, all GPU code and data must be in the .cu file.
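As a rough sketch of what that could look like, here is blocks.cu holding both the array and a kernel that reads it. The kernel name scale_steps and its arithmetic are made up for illustration; only Mywrap and wrap come from the question. The extern "C" block is an assumption about how you want to look the symbol up: it keeps the name unmangled so the host can refer to it by the plain string "wrap".

```cuda
// blocks.cu: compiled to PTX or a cubin, then loaded as a module by the host.

// Same layout as the struct in the question. Host code that fills the array
// needs an identical definition (for example through a shared header).
typedef struct __align__(16) {
    int xstep, ystep;
    int xindex, yindex;
    float delx, dely;
    int width, height;
} Mywrap;

// The braces matter: `extern "C" { ... }` defines the variable, whereas
// `extern "C" __device__ Mywrap wrap[20];` would only declare it.
extern "C" {
__device__ Mywrap wrap[20];
}

// Hypothetical kernel (not from the original post): each thread reads one
// entry of wrap and writes a value derived from it.
extern "C" __global__ void scale_steps(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < 20) {
        out[i] = wrap[i].xstep * wrap[i].delx + wrap[i].ystep * wrap[i].dely;
    }
}
```

Something like `nvcc -ptx blocks.cu` (or `-cubin`) produces the file the host loads. main.cpp then contains no `__device__` declarations at all, only the plain struct definition it needs to fill a host-side copy.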
If you declare wrap in the .cu file, it will be allocated when the module is loaded and you can then use cuModuleGetGlobal to get a device pointer to the memory, something like this: