rtContextLaunch1D: unknown error

Hi, I am getting this unknown error when calling the rtContextLaunch1D() function. Any idea why this is happening? I am running CUDA 6, OptiX 3.6.2, Windows 8.1, 2x GeForce GT 750M (SLI).

Unknown error (Details: Function "_rtContextLaunch1D" caught exception: Encountered a CUDA error: cudaFree( 0 ) returned (46): all CUDA-capable devices are busy or unavailable, [3735717])

First, it’s recommended to disable SLI when using OptiX with multiple GPUs.
Please read the OptiX Release Notes.

That error is from the CUDA context initialization routine.
Do other OptiX programs work?

GeForce 750M, that’s inside a laptop? Optimus capable?
Might be that your program is not running on the discrete GPUs.

What device information do you get when running sample3?
Have you tried explicitly setting OptiX to use only one of the devices?

What’s your display driver version?
64-bit app?

SLI is disabled. What do you mean by other programs?

Yes it is inside a laptop.
Driver - 340.52
App - 32 bit

Output of sample3:
OptiX 3.6.2
Number of Devices = 2

Device 0: GeForce GT 750M
Compute Support: 3 0
Total Memory: 2147483648 bytes
Clock Rate: 967000 kilohertz
Max. Threads per Block: 1024
SM Count: 2
Execution Timeout Enabled: 1
Max. HW Texture Count: 128
TCC driver enabled: 0
CUDA Device Ordinal: 0

Device 1: GeForce GT 750M
Compute Support: 3 0
Total Memory: 2147483648 bytes
Clock Rate: 967000 kilohertz
Max. Threads per Block: 1024
SM Count: 2
Execution Timeout Enabled: 1
Max. HW Texture Count: 128
TCC driver enabled: 0
CUDA Device Ordinal: 1

Constructing a context…
Created with 2 device(s)
Supports 2147483647 simultaneous textures
Free memory:
Device 0: 2067517440 bytes
Device 1: 1527021568 bytes

>>What do you mean by other programs?<<

To determine if this is a general problem with your system or only with your application, do the pre-built OptiX SDK examples run?

Yes, the other programs are running fine. Since OptiX is a C API, is it not recommended to use it inside a C++ class? That is, can I create the context in the constructor and then use that context in the member functions to launch the kernel?

That’s perfectly fine. Make sure the context is properly destroyed in your class destructor, though.
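A minimal sketch of that pattern, assuming a hypothetical class name `OptixScene`. `RTcontext`, `rtContextCreate`, and `rtContextDestroy` are OptiX C API names, but here they are replaced by stand-in stubs so the sketch compiles without the SDK. With the SDK installed, the stub section would be replaced by `#include <optix.h>`.

```cpp
#include <stdexcept>

// --- Stand-in stubs (assumption): replace this section with #include <optix.h> ---
struct RTcontext_api { int unused; };
typedef struct RTcontext_api* RTcontext;
enum RTresult { RT_SUCCESS = 0 };
static int g_liveContexts = 0;  // stub-only counter of contexts not yet destroyed

static RTresult rtContextCreate(RTcontext* context) {
    *context = new RTcontext_api();
    ++g_liveContexts;
    return RT_SUCCESS;
}

static RTresult rtContextDestroy(RTcontext context) {
    delete context;
    --g_liveContexts;
    return RT_SUCCESS;
}
// --- End of stubs ---

// Hypothetical wrapper: owns one context for the lifetime of the object.
class OptixScene {
public:
    OptixScene() : context_(nullptr) {
        if (rtContextCreate(&context_) != RT_SUCCESS)
            throw std::runtime_error("rtContextCreate failed");
    }

    // Destroy the context exactly once, when the object goes away.
    ~OptixScene() {
        if (context_)
            rtContextDestroy(context_);
    }

    // Copying would lead to a double destroy, so it is disabled.
    OptixScene(const OptixScene&) = delete;
    OptixScene& operator=(const OptixScene&) = delete;

    // Member functions can use this handle for launches.
    RTcontext context() const { return context_; }

private:
    RTcontext context_;
};

// Usage sketch: the context lives exactly as long as the object.
inline bool contextIsValidInScope() {
    OptixScene scene;
    return scene.context() != nullptr;
}
```

Disabling copies is a design choice in this sketch: two objects sharing one handle would both try to destroy it.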

Detlef mentioned something else you should double check: Optimus. NVIDIA mobile cards, for lack of a better description, turn themselves off when not in use, and the system switches to a different processor for display. It could be that the GPU is not turning itself on when you run your program, so nothing can execute on it. Although, if you’re able to run the other samples, this probably isn’t the case. TBH I don’t know how Optimus works in Windows; I use Linux + Bumblebee.

As a sanity check, I would try running your program on your second device. Sometimes the OS and other applications take over the primary GPU for their own tasks. If your kernel runs too long, the OS kills it. You can set it to run on the second device with something like:

int deviceId = 1;
context->setDevices( &deviceId, &deviceId+1 );

Even setting the device does not work. The pre-compiled libraries work, but when I compile the source they do not. How is that possible? The last line, where I compile the context, does not work, and the error string is:
Unknown error (Details: Function "_rtContextLaunch1D" caught exception: Encountered a CUDA error: cudaFree( 0 ) returned (46): all CUDA-capable devices are busy or unavailable, [3735717])

Also rtContextGetDevices() returns both the devices.

Context creation:

const char *ptx_filename = "Debug/SubdivisionStructure_3GP.cu.ptx";
Optix_error(rtContextCreate(&context));
Optix_error(rtContextSetPrintEnabled(context, 1));
Optix_error(rtContextSetPrintBufferSize(context, 4096));
Optix_error(rtContextSetEntryPointCount(context, 2));
Optix_error(rtContextSetRayTypeCount(context, 2));
Optix_error(rtProgramCreateFromPTXFile(context, ptx_filename, "CountHits", &countHits));
Optix_error(rtProgramCreateFromPTXFile(context, ptx_filename, "Visibility", &visibility));
Optix_error(rtProgramCreateFromPTXFile(context, ptx_filename, "ExceptionIntersectionCheck", &exceptionIntersectionCheck));
Optix_error(rtProgramCreateFromPTXFile(context, ptx_filename, "ExceptionInteriorCheck", &exceptionInteriorCheck));
Optix_error(rtProgramCreateFromPTXFile(context, ptx_filename, "bounding_box_program", &bounding_box));
Optix_error(rtProgramCreateFromPTXFile(context, ptx_filename, "closest_hit_visible", &closestHit));
Optix_error(rtProgramCreateFromPTXFile(context, ptx_filename, "any_hit_program", &anyHit));
Optix_error(rtProgramCreateFromPTXFile(context, ptx_filename, "intersection_program", &intersection));
Optix_error(rtBufferCreate(context, RT_BUFFER_INPUT, &optVertices));
Optix_error(rtBufferCreate(context, RT_BUFFER_INPUT, &optTriangles));
Optix_error(rtBufferSetFormat(optVertices, RT_FORMAT_FLOAT3));
Optix_error(rtBufferSetFormat(optTriangles, RT_FORMAT_UNSIGNED_INT3));
Optix_error(rtContextSetRayGenerationProgram(context, 0, countHits));
Optix_error(rtContextSetRayGenerationProgram(context, 1, visibility));
Optix_error(rtContextSetExceptionProgram(context, 0, exceptionInteriorCheck));
Optix_error(rtContextSetExceptionProgram(context, 1, exceptionIntersectionCheck));
int devices[] = {1};  // ordinals of the devices to use; here only device 1
Optix_error(rtContextSetDevices(context, 1, devices));
Optix_error(rtContextValidate(context));
Optix_error(rtContextCompile(context));
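The `Optix_error` wrapper used above is not shown in the post. A checker of that kind might look roughly like the sketch below, which asks the context for the detailed error string (the "Unknown error (Details: ...)" text) and attaches the failing call to it. `checkOptix` and `OPTIX_CHECK` are hypothetical names. `rtContextGetErrorString` is an OptiX C API call, but it and the types are replaced here by stand-in stubs, with assumed enum values, so the sketch compiles without the SDK.

```cpp
#include <stdexcept>
#include <string>

// --- Stand-in stubs (assumption): replace this section with #include <optix.h> ---
typedef struct RTcontext_api* RTcontext;
enum RTresult { RT_SUCCESS = 0, RT_ERROR_UNKNOWN = ~0 };  // values assumed for the stub

static void rtContextGetErrorString(RTcontext, RTresult code, const char** return_string) {
    *return_string = (code == RT_SUCCESS) ? "Success" : "Unknown error (Details: stub)";
}
// --- End of stubs ---

// Hypothetical helper: throws with the detailed error string on any failure.
inline void checkOptix(RTcontext context, RTresult code, const char* call) {
    if (code == RT_SUCCESS)
        return;
    const char* message = nullptr;
    rtContextGetErrorString(context, code, &message);
    throw std::runtime_error(std::string(call) + ": " +
                             (message ? message : "no error string"));
}

// Hypothetical convenience macro: records the text of the failing call.
#define OPTIX_CHECK(context, call) checkOptix((context), (call), #call)
```

With the real SDK this would be used as, for example, `OPTIX_CHECK(context, rtContextValidate(context));`, so the exception text names the call that failed.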

You’re running inside your debugger?
What about when starting the program standalone?
(If standalone works that could be exactly the Optimus case, e.g. possible that MSVS is not using the discrete GPUs.)

Also, if this is a debug build, did you disable the debug command-line parameters -g and -G for nvcc when building the device code? OptiX doesn’t handle PTX code with debug information.
Also make sure you have --use_fast_math enabled.

Converting to x64 seems to solve the problem. Do you know why this is?
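The thread does not explain why the x64 build behaves differently. When comparing the two builds, it can at least help to confirm which variant is actually running, for example by logging the pointer width at startup. A minimal sketch, with hypothetical function names:

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Pointer width of the current build in bits (32 or 64 on common targets).
inline std::size_t pointerBits() {
    return sizeof(void*) * CHAR_BIT;
}

// Short label suitable for a startup log line.
inline std::string buildDescription() {
    return std::to_string(pointerBits()) + "-bit build";
}
```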