Tesla K20c and "Error locating PTX Symbol: __cudart_i2opi_f, [1049181]", new OptiX 3.0

Hi all,

My code works well on Fermi and Kepler GK104 architectures. I tested it on a GTX 580 and a GTX 670 with CUDA 4.2 and 5.0 (CUDA shader models ranging from 1.3 to 3.0) and with OptiX 2.6 and 3.0, so I’m fairly confident that the code is stable.

However, I recently installed a new Tesla K20c and started getting this error from OptiX:
OptiX Error: Invalid value (Details: Function “_rtContextCompile” caught exception: Error locating PTX Symbol: __cudart_i2opi_f, [1049181])

I tried different Tesla drivers (307 and 310) and different CUDA versions (4.2 and 5.0). The problem appeared once I plugged the Tesla GPU into the system alongside the GeForce 670 and installed the corresponding drivers.

The OptiX 3.0 SDK samples do work fine on the Tesla, though that was the first time I ran them on it. I suspect there may be some sort of CUDA/OptiX cache for kernels that were previously compiled on this system.

Any suggestions?

No suggestions, other than that I see a similar error. I was curious about the performance of an OptiX code my co-worker has been working on, using the newer 313.09 drivers and OptiX 3.0 under 64-bit Ubuntu. As in your case, my code has run fine without issues on various other cards (GTX 285, GT 430, GT 440, GTX 680) and OptiX versions (2.1.1, 2.6).

Also, I have another NVIDIA GPU (GT 640) installed along with the K20c, which parallels your case.
With the 313.09 drivers and OptiX 3.0 installed, running the code with the OptiX 2.6 SDK actually takes slightly longer on the GT 640/K20c combination than on the GT 640 alone:

GT 640 ~ 1.34 sec
GT 640 / K20c ~ 1.6 sec
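For scale, those two timings put the two-GPU configuration roughly 19% slower than the single GT 640:

```latex
\frac{1.6\,\text{s}}{1.34\,\text{s}} \approx 1.19
```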

That alone was a bit odd.

When I try the same code with the OptiX 3.0 SDK, whether only the GT 640 or both the GT 640 and the K20c are installed, I get this error:

OptiX Error: Invalid value (Details: Function “RTresult _rtContextCompile(RTcontext_api*)” caught exception: Error locating PTX Symbol: __constant828, [1049181])

with the line that causes the error being:
RT_CHECK_ERROR( rtContextCompile( context ) );

I also tested OptiX 3.0 with the 304.64 drivers and the same issue persists.

I also tried the same OptiX 3.0 / 304.64 driver configuration with the GTX 285 and it works fine; it actually gives a ~1.6x or ~3.7x speedup over the previous OptiX 2.6 version, depending on which version of the code I run.

Strangely enough, the problem isn’t present on all Kepler GPUs:

I tried OptiX 3.0 with the 304.64 drivers on a GTX 680 (GK104), and that configuration runs my code just fine. Given all that, it seems the culprit in my case is that OptiX 3.0 does not like my GK107-based GT 640.

I do not have a test system with an integrated video card, so I cannot check whether the issue also occurs when the K20 is the only NVIDIA GPU installed. If anyone at NVIDIA could debug this further, it would be appreciated!

I think I figured it out after a good amount of debugging. I had two versions of the code, and one of them had rtPrintf statements in the intersection RT_PROGRAM. Once I commented out the rtPrintf statements, the program ran fine.

When I reported earlier that the GTX 680 worked, apparently I had only run the version of the code without the rtPrintf statements. Regardless, I think this is still an OptiX bug, since older versions compiled fine with the print statements in place.
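Rather than commenting the rtPrintf calls out by hand, one option is to route them through a macro that compiles away unless a debug flag is set. This is a minimal sketch: `DEBUG_PRINTF`, `ENABLE_DEBUG_PRINTS` and `intersect_stub` are hypothetical names, not part of OptiX, and the enabled branch uses plain printf only so the sketch builds as ordinary C. In a real intersection program that branch would call rtPrintf instead.

```c
#include <stdio.h>

/* Hypothetical build switch: compile with -DENABLE_DEBUG_PRINTS to turn
   debug output on. In an OptiX device program, replace printf with rtPrintf. */
#ifdef ENABLE_DEBUG_PRINTS
#define DEBUG_PRINTF(...) printf(__VA_ARGS__)
#else
/* Expands to a no-op, so the format-string literal never reaches the
   compiled module. */
#define DEBUG_PRINTF(...) ((void)0)
#endif

/* Hypothetical stand-in for an intersection routine: reports a hit for
   positive t and optionally prints a debug message. */
static int intersect_stub(float t)
{
    DEBUG_PRINTF("candidate hit at t=%f\n", t);
    return t > 0.0f;
}
```

With the flag left undefined, no print statements or their string literals end up in the compiled program, which matches the configuration that avoided the error above.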

Check whether you have the same or a similar issue!

For your case:

I am not versed in PTX code, but this might give a hint:
http://gpuocelot.googlecode.com/svn-history/r715/trunk/branches/ocelot-ptx-2.1/ocelot/executive/test/kernels.ptx

It seems __cudart_i2opi_f is some sort of array, so perhaps look for where you use a similar structure in your code and try to narrow down the specific issue from there.
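A minimal sketch of that search, assuming you have the PTX text for your programs (for example, the .ptx files produced with `nvcc -ptx`). It lists the lines that declare `.const` (constant-memory) variables, optionally filtered by a symbol name. As far as I know, `__cudart_i2opi_f` is the table of 2/π bits that the CUDA math library uses for argument reduction in single-precision trig functions such as sinf and cosf, so a trig call in one of your programs may be what pulls it in. The function name `find_const_decls` is hypothetical, and the check only looks at `.const` declarations. If your PTX declares the symbol in another state space, widen the prefix test.

```c
#include <stdio.h>
#include <string.h>

/* Prints and counts the lines of a PTX listing that declare a .const
   variable whose declaration contains `symbol` (pass "" to list every
   .const declaration). Lines longer than the buffer are truncated. */
static int find_const_decls(const char *ptx, const char *symbol)
{
    char line[512];
    int count = 0;
    const char *p = ptx;

    while (*p != '\0') {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;

        memcpy(line, p, copy);
        line[copy] = '\0';

        /* Skip indentation before the state-space keyword. */
        const char *s = line;
        while (*s == ' ' || *s == '\t')
            s++;

        if (strncmp(s, ".const", 6) == 0 && strstr(s, symbol) != NULL) {
            printf("%s\n", s);
            count++;
        }

        /* Advance past this line and its newline, if any. */
        p += len;
        if (*p == '\n')
            p++;
    }
    return count;
}
```

For example, `find_const_decls(ptx_text, "__cudart_i2opi_f")` prints each matching declaration and returns how many it found.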

If you have a K20 paired with a 670, a 640, or another similar card, only the K20 will be used. OptiX normally tries to run on all devices of a similar generation, but we observed issues when pairing devices based on the new Kepler 110 architecture (SM version 3.5, found in the K20) with devices based on the original Kepler 100 architecture (SM version 3.0, found in the GeForce 670 and similar cards), and we were unable to resolve them for the OptiX 3.0 release. We therefore chose to segregate these devices.
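The segregation rule can be illustrated with a small sketch that keeps only the devices sharing the highest SM version present. For a GTX 670 (SM 3.0) plus a K20c (SM 3.5), that leaves just the K20c. The `gpu_info` struct and `select_newest_sm_group` function are hypothetical. The sketch groups devices by exact SM version, which is stricter than "similar generation", and it is not OptiX's actual selection code. If you want to choose devices explicitly instead, OptiX's rtContextSetDevices takes a list of device ordinals. Check the OptiX documentation for how those ordinals map to CUDA's.

```c
#include <stddef.h>

/* Hypothetical device record; in a real application the SM version would
   come from the CUDA runtime (cudaDeviceProp::major and ::minor). */
struct gpu_info {
    const char *name;
    int sm_major;
    int sm_minor;
};

/* Writes into `selected` the indices of the devices that share the highest
   SM version present and returns how many were written. `selected` must
   have room for n entries. Illustration only. */
static size_t select_newest_sm_group(const struct gpu_info *gpus, size_t n,
                                     int *selected)
{
    int best = -1;
    size_t count = 0;

    /* First pass: find the highest SM version (e.g. 3.5 -> 35). */
    for (size_t i = 0; i < n; i++) {
        int sm = gpus[i].sm_major * 10 + gpus[i].sm_minor;
        if (sm > best)
            best = sm;
    }
    /* Second pass: keep only devices with exactly that version. */
    for (size_t i = 0; i < n; i++) {
        if (gpus[i].sm_major * 10 + gpus[i].sm_minor == best)
            selected[count++] = (int)i;
    }
    return count;
}
```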

As far as the rtPrintf in an intersection program goes, we’ve reproduced and fixed this locally. This fix should make it out in the next version of OptiX.

Anton, I haven’t seen your particular error, but it looks like the same issue. There was a problem with a special optimization we enabled for Kepler-based cards and static variables in constant memory; __cudart_i2opi_f and the string literals used by rtPrintf are such cases.