OptiX Bug? crash with CUDA error: Kernel ret (700) when not rtPrinting anything (small demo code)

I have a crash with the following exception:

This is the source code of the CUDA file (edit: for the cpp file, see below):

#include <optix_world.h>
struct BiDirSubPathVertex {bool existing;};
using namespace optix;

rtCallableProgram(void, sampleLightPath, ());
rtCallableProgram(void, sampleEye, ());
rtDeclareVariable(uint2,         launch_index, rtLaunchIndex, );
rtBuffer<float4, 2>              output_buffer;
RT_CALLABLE_PROGRAM void sampleLightPath_f() {}
RT_CALLABLE_PROGRAM void sampleEye_f() {}

RT_PROGRAM void pathtrace_camera() {
    BiDirSubPathVertex lightVertices[2];
    lightVertices[0].existing = false;
    lightVertices[1].existing = false;

    for(unsigned int i=0; i<2; i++) {
        if(!(lightVertices[i].existing)) break;
    }
//    rtPrintf("cztery\n");

    output_buffer[launch_index] = make_float4(1.f, 1.f, 1.f, 1.f);
}

RT_PROGRAM void exception() {
    output_buffer[launch_index] = make_float4(1.f, 1.f, 0.f, 0.0f);
}

You see the rtPrintf? If the comment signs are removed, the program doesn’t crash. So while the crash is simple to work around in this specific place, a workaround would be hard to find if the print were not already there.

The original code file was about 700 lines long, the functions had parameters and traced rays. While removing more and more code, it always depended only on this one rtPrintf whether it crashed or not. In the original code I had exceptions and printing enabled, but it didn’t make any difference.

I verified the crash on Win7 64 (vs12 compiler, NVIDIA driver around 336 WHQL) and openSUSE Linux 13.1 64 (gcc 4.8, NVIDIA driver 331.49); both systems had CUDA 5.5 and OptiX 3.5 installed. The workstation has an AMD quad-core and a GeForce GTX 550 Ti.


Additionally verified on Win7 64bit, vs12 compiler, NVIDIA driver 332.76, CUDA 5.0 and OptiX 3.0. The workstation has an Intel Xeon quad core and Quadro 2000 graphics.

Minimal example: http://xibo.at/meine/optixCrashBugPrintMinimalExample.zip (new file with less code)
It’s based on sutil, the same build steps as in the optix examples are necessary.

Is anybody able to reproduce?


Not sure if this will help in your particular case, but I have seen rtPrint cover up unrelated memory corruption problems.

Memory corruption on the host or device side?

Yes, I was thinking of memory corruption all the time. That’s also why I shortened the program to these 31 lines; the host side, apart from sutil, is also just 65 lines long.

If anything, it could be memory corruption on the host side inside sutil. Maybe I should stop using this behemoth.

I got rid of the SampleScene class. sutilSamplesPtxDir() is now the only sutil function I’m calling, and the behaviour is still the same.
Here is the updated full code: http://xibo.at/meine/optixCrashBugPrintMinimalExample2.zip

Now the minimal project contains 2 source files (apart from sutil, cmake etc.). The CUDA file is already posted above; the other is here (I also updated the zip):

#include <optixu/optixpp_namespace.h>
#include <sutil.h>
#include <stdlib.h>
#include <string>

const char* const ptxpath( const std::string& target, const std::string& base ) {
  static std::string path;
  path = std::string(sutilSamplesPtxDir()) + "/" + target + "_generated_" + base + ".ptx";
  return path.c_str();
}

int main( int argc, char** argv ) {
  try {
    optix::Context context = optix::Context::create();
    context->setEntryPointCount( 1 );

    optix::Buffer buffer = context->createBuffer( RT_BUFFER_OUTPUT, RT_FORMAT_FLOAT4, 512, 512);

    optix::Program exceptionProgram = context->createProgramFromPTXFile(ptxpath("helsinki", "BiDirCamera.cu"), "exception");
    optix::Program ray_gen_program = context->createProgramFromPTXFile(ptxpath( "helsinki", "BiDirCamera.cu" ), "pathtrace_camera");
    ray_gen_program["sampleLightPath"]          ->set(context->createProgramFromPTXFile(ptxpath( "helsinki", "BiDirCamera.cu" ), "sampleLightPath_f"));
    ray_gen_program["sampleEye"]                ->set(context->createProgramFromPTXFile(ptxpath( "helsinki", "BiDirCamera.cu" ), "sampleEye_f"));

    context->setRayGenerationProgram(0, ray_gen_program);
    context->setExceptionProgram(0, exceptionProgram);
    context->launch(0, 512, 512);
  } catch( optix::Exception& e ){
    sutilReportError( e.getErrorString().c_str() );
  }

  return 0;
}

Still the same behavior.
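
A side note on ptxpath() above: it returns a pointer into a single static std::string, so every call overwrites the buffer that earlier return values point to. That is harmless in the host code shown, because each pointer is consumed before the next call, and it is not claimed to be the cause of the crash. A minimal sketch of a by-value variant, assuming the PTX directory is passed in as a parameter (the name ptxPathByValue and the ptxDir parameter are made up for illustration; in the real program the directory would come from sutilSamplesPtxDir()):

```cpp
#include <string>

// Hypothetical by-value variant of ptxpath(). The PTX directory is a
// parameter here so the sketch does not depend on sutil; the path is built
// exactly like in the original helper.
std::string ptxPathByValue(const std::string& ptxDir,
                           const std::string& target,
                           const std::string& base) {
  // Each caller gets its own std::string, so two results can be held at the
  // same time without one call overwriting the other.
  return ptxDir + "/" + target + "_generated_" + base + ".ptx";
}
```

With this variant, two paths can be built and kept side by side, which the static-buffer version does not allow.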

Edit 3: made the code even shorter for the post. Tested, but the zip is not updated.

I had the chance to test the program on one of my university’s workstations. Again, it’s the same behaviour.

These are the specs:
Win7 64bit, vs10 compiler, NVIDIA driver 332.76 WHQL, CUDA 5.0 and OptiX 3.0. The workstation has an Intel Xeon quad core and Quadro 2000 graphics.

Are there actually any OptiX developers on this forum? Any chance of this being investigated?

Or does anybody see a possibility for corrupting the stack?

Not that it’s of much help but I can confirm the issue. Specs: Win 8.1 x64, VS2012, driver 337.88, Cuda 5.5, Optix 3.5.1, Intel i7 4770K, GTX770.

I also get these exceptions:

OptiX Error: 'Unknown error (Details: Function “_rtContextLaunch2D” caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)

OptiX Error: 'Unknown error (Details: Function “_rtContextLaunch2D” caught exception:
Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (716): Misaligned address)

rtContextLaunch2D does the kernel launch. But why does a device-to-host writeback (CuMemcpyDtoHAsync) occur then? Normally you run the kernel and the results remain on the GPU. A “download” from GPU to CPU happens only when requested. What is written to the host there?

What is written to host is a status byte (or couple of bytes) indicating whether the launch succeeded. When the launch crashes you get the error you pasted above – it is very generic and does not necessarily have anything to do with rtPrintf.
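
To illustrate why the error names the copy rather than the kernel, here is a toy model in plain C++. It is only a sketch: ToyDevice, launchKernel and copyStatusToHost are made-up names, not part of the CUDA or OptiX API. It assumes the usual asynchronous behaviour, where the launch call returns before the kernel has run and a fault surfaces at the next call that has to wait for the device.

```cpp
// Toy model only: these names are invented for illustration and do not
// exist in CUDA or OptiX.
struct ToyDevice {
  int stickyError = 0;  // 0 means no error; otherwise the fault code

  // The launch is asynchronous: from the caller's point of view it always
  // "succeeds", even if the kernel later faults with faultCode.
  int launchKernel(int faultCode) {
    stickyError = faultCode;  // the fault is recorded on the device side
    return 0;                 // the launch call itself reports success
  }

  // The next call that waits for the device (here: copying the status back
  // to the host) is the one that reports the recorded fault.
  int copyStatusToHost() const {
    return stickyError;
  }
};
```

In this model, launchKernel(700) returns 0 and the following copyStatusToHost() returns 700, which mirrors how the message above blames CuMemcpyDtoHAsync for a fault that happened during the kernel.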

Thank you very much for this clarification.

So “CuMemcpyDtoHAsync” itself did not actually crash with code 716; it only reports that a misalignment occurred during kernel execution.
Is there documentation somewhere about all the cases in which misalignment can happen?
I found http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vector-types
Are there more guidelines?

A misaligned address is another very generic error. It occurs when you’re reading from an unexpected memory location that doesn’t satisfy certain conditions for the read instruction, and it usually indicates an error in user code. It is very roughly analogous to a segfault in host code.
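
As a rough host-side illustration of what "doesn’t satisfy certain conditions" means for alignment, the sketch below checks whether an address is a multiple of a type's alignment. Float4 is a stand-in written for this example (CUDA's float4 is declared with 16-byte alignment), and isAlignedFor is a hypothetical helper, not an OptiX or CUDA function.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for CUDA's float4, which requires 16-byte alignment.
struct alignas(16) Float4 { float x, y, z, w; };

// Hypothetical helper: true if the address p is a multiple of alignof(T),
// i.e. a load or store of a T at p would be properly aligned.
template <typename T>
bool isAlignedFor(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

For example, an address 4 bytes past a 16-byte boundary is fine for a float but not for a Float4. A typical way to end up with such an address in device code is casting a byte pointer plus an arbitrary offset to a wider vector type.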

I would follow Detlef’s advice on the other thread to try and narrow it down, rather than continuing to cross-post here on this thread.