OpenGL interop + zero-copy usage: cannot run both simultaneously

I have an application where a command-line parameter decides whether or not the computed image gets put up in an OpenGL window. For OpenGL, ‘cudaGLSetGLDevice must be specified before any other runtime calls’ and ‘is mutually exclusive to cudaSetDevice’ (per the documentation). So I have placed cudaSetDevice and cudaGLSetGLDevice in alternate if/else paths based on the command-line option.

I decided to add zero-copy functionality to the existing code, since this application will quite often run on laptops with integrated graphics chips. After some futzing around, I got it to work, but only when the display option isn’t enabled. If I turn off the zero-copy feature so that a ‘normal’ cudaMemcpy2D is done instead, the display works fine.
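For reference, the zero-copy path without display looks roughly like the sketch below. It is a minimal illustration, not the actual application: the `invert` kernel, the 640x480 image size, the `CHECK` macro and the use of device 0 are all assumptions made up for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-check helper, not part of the CUDA API.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: inverts an 8-bit image through the mapped pointer.
__global__ void invert(unsigned char *img, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = 255 - img[i];
}

int main()
{
    const int width = 640, height = 480;   // assumed image size
    const int n = width * height;

    // The mapping flag has to be set before the device/context is set up.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
    CHECK(cudaSetDevice(0));               // non-display path only

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "device 0 cannot map host memory\n");
        return EXIT_FAILURE;
    }

    // Page-locked host buffer that is also visible to the device.
    unsigned char *hostImg = NULL, *devImg = NULL;
    CHECK(cudaHostAlloc((void **)&hostImg, n, cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&devImg, hostImg, 0));

    for (int i = 0; i < n; ++i) hostImg[i] = (unsigned char)(i & 0xFF);

    // The kernel reads and writes host memory directly, no cudaMemcpy2D.
    invert<<<(n + 255) / 256, 256>>>(devImg, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());        // cudaThreadSynchronize() on 3.2

    printf("first pixel after kernel: %d\n", hostImg[0]);
    CHECK(cudaFreeHost(hostImg));
    return 0;
}
```

This only covers the branch that calls cudaSetDevice; the display branch is where the ordering trouble starts.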

I see in the documentation that the cudaSetDeviceFlags(cudaDeviceMapHost) must be run before the device is set, so I execute that before the cudaGLSetGLDevice command…

…alright, just answered my own question here. I’ve got a chicken/egg problem as follows:
cudaSetDeviceFlags needs to be set before cudaGLSetGLDevice is called… This is OK
Can’t do any runtime calls before cudaGLSetGLDevice…, but
need to determine image size before initializing GL…, which
requires me to cudaHostAlloc in order to map this memory, which is a runtime call that can’t be made before cudaGLSetGLDevice

Looks like I’ll somehow have to read that data file and get the image size using ‘normal’ (non-CUDA) malloc calls, determine image stream size, reset file pointer, InitGL, allocate host mapped memory, copy from file to mapped memory, then start GL rendering stream.
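The first few steps of that plan (get the image size, determine the stream size, reset the file pointer) need nothing from CUDA, so they can run before GL is initialized. A rough sketch in plain host code follows. The file layout is purely an assumption for illustration: a header of two unsigned ints (width, height) followed by raw 8-bit frames. `SequenceInfo` and `probeSequence` are made-up names.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical description of the image stream (assumed file layout:
// two unsigned ints, width then height, followed by raw 8-bit frames).
struct SequenceInfo {
    unsigned int width;
    unsigned int height;
    size_t       frameCount;    // whole frames found after the header
    size_t       streamBytes;   // payload bytes after the header
};

// Reads the header from the start of the file, measures the payload,
// and rewinds so the real load can happen later. Returns 0 on success.
static int probeSequence(FILE *fp, SequenceInfo *info)
{
    unsigned int dims[2];
    if (fread(dims, sizeof(unsigned int), 2, fp) != 2) return -1;
    if (dims[0] == 0 || dims[1] == 0) return -1;

    if (fseek(fp, 0, SEEK_END) != 0) return -1;
    long fileBytes = ftell(fp);
    if (fileBytes < (long)sizeof(dims)) return -1;

    size_t payload    = (size_t)fileBytes - sizeof(dims);
    size_t frameBytes = (size_t)dims[0] * dims[1];

    info->width       = dims[0];
    info->height      = dims[1];
    info->streamBytes = payload;
    info->frameCount  = payload / frameBytes;

    rewind(fp);   // reset file pointer for the later copy into mapped memory
    return 0;
}
```

With width and height known, the remaining steps would follow in the order listed: InitGL, cudaGLSetGLDevice, cudaHostAlloc with cudaHostAllocMapped, then fread from the file straight into the mapped buffer.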

Wow, what a pain in the a$$.

Why not just use cudaHostRegister?

Apparently because it’s not in the ver. 3.2 programming guide, which is currently what I’m referencing. So this is great, I can ‘malloc’ like I usually do, load the video sequence into this host space, see the correct image sizes, initialize GL, and then page lock/map the pointer with cudaHostRegister as long as I’ve done the cudaSetDeviceFlags(cudaDeviceMapHost) before initialization, right?
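If that ordering holds, the sequence might look something like the sketch below. Treat it as untested against the GL path: the buffer size, the 4 KB page size, the `CHECK` macro and the use of posix_memalign (POSIX only; Windows would need something like _aligned_malloc) are assumptions for illustration, and the sketch calls cudaSetDevice at the point where the display build would create the GL window and call cudaGLSetGLDevice instead. cudaHostRegister needs CUDA 4.0 or later, and as far as I can tell the 4.0 documentation required both the pointer and the size to be page-aligned, hence the aligned allocation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical error-check helper, not part of the CUDA API.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main()
{
    const size_t pageSize   = 4096;        // assumed host page size
    const size_t imageBytes = 640 * 480;   // assumed; would come from the file
    // Round up to whole pages to satisfy the alignment rule in early releases.
    const size_t bytes = (imageBytes + pageSize - 1) / pageSize * pageSize;

    // 1. Set the mapping flag before anything creates the context.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // 2. Ordinary host allocation and loading, no CUDA runtime involved.
    void *frames = NULL;
    if (posix_memalign(&frames, pageSize, bytes) != 0) return EXIT_FAILURE;
    memset(frames, 0, bytes);              // stand-in for loading the video

    // 3. Image size is known now, so the device can be chosen. In the display
    //    build this is where GL gets initialized and cudaGLSetGLDevice(0)
    //    takes the place of cudaSetDevice(0).
    CHECK(cudaSetDevice(0));

    // 4. Page-lock and map the buffer that already holds the data.
    CHECK(cudaHostRegister(frames, bytes, cudaHostRegisterMapped));
    void *devFrames = NULL;
    CHECK(cudaHostGetDevicePointer(&devFrames, frames, 0));
    printf("host %p mapped to device %p\n", frames, devFrames);

    // ... launch kernels on devFrames and render here ...

    CHECK(cudaHostUnregister(frames));
    free(frames);
    return 0;
}
```

The point of the ordering is that everything touching the file happens with plain host memory, so no runtime call other than cudaSetDeviceFlags comes before the device is selected.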