Ocelot PTXParser Exception: register type conversion wreaks havoc with Ocelot's PTX parser

I’m hoping someone with more experience with PTX or Ocelot can help me out with this.

After compiling and linking with Ocelot, I try to run my CUDA program, but PTX parsing fails with:

Failed to parse file 'main.cu':

...

(lots of PTX code here)

...

3168  //<loop> Part of loop body line 447, head labeled $Lt_2_49154

3169	cvt.u8.u32	  %rh3, %r206;

3170	cvt.u64.u8	  %rd257, %rh3;

3171	cvt.u32.u64	 %r219, %rd257;

3172	cvt.u8.u32	  %r220, %r219;

3173	cvt.u64.u32	 %rd258, %r220;

3174	mul.lo.u64	  %rd259, %rd258, 4;

3175	add.u64		 %rd260, %rd1, %rd259;

3176	ld.global.u32   %r221, [%rd260+8];

3177	ld.global.u32   %r222, [%rd260+4];

3178	sub.u32		 %r223, %r221, %r222;

3179	bra.uni		 $Lt_2_9218;

...

(the rest of the PTX code here)

...

terminate called after throwing an instance of 'parser::PTXParser::Exception'

  what():  main.cu (3170, 5): Type of %rh3 u16 not convertable to type u8 .

To my eye this conversion seems legal. Moreover, the code that generates this PTX is repeated elsewhere in my CUDA code, but does not cause the same problem.

I’d really appreciate some ideas on how to go about debugging or fixing this parsing error.

Is %rh3 declared as a u16 register? The instruction on the previous line (3169) converts a 32-bit value to an 8-bit value, so if you’re storing the result in a register declared as u16, that would be the problem. If that’s the case, perhaps the register is declared correctly in your other methods?

Yes, all %rh registers are declared as u16, %r registers as u32, and %rd as u64.

So you can’t store a lower-bit value in a higher-bit register? I just assumed the extra bits would be ignored or zeroed out or something. In fact, similar instructions appear all over the place, with no problems. For instance:

5612  //<loop> Part of loop body line 447, head labeled $Lt_2_111106

5613	cvt.u8.u32	  %rh15, %r206;

5614	cvt.u64.u8	  %rd855, %rh15;

By the way, I’m not writing this PTX code, it was generated from my CUDA runtime code.

This has come up a couple of times, so here’s the explanation. PTX is a typed language, but its type-checking rules are not strict, nor were they clearly defined until PTX 2.x. For example, in a strict language, declaring a register as an 8-bit signed integer and using it in an instruction that expects a 16-bit signed integer would be an error. In PTX, this is an error for every instruction except cvt, ld, and st. Basically, this is an Ocelot bug that cannot be easily fixed without changing the way type checking is done, and we have elected to leave it in the trunk until the next release, where the type-checking interface will be changed to fix it. See this thread for details and a short-term solution: http://groups.google.com/group/gpuocelot/b…86dcb0bee10ed8b
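The rule above can be sketched as a small width check. This is a simplified model for illustration only, not Ocelot's actual type checker: the names `RELAXED_OPCODES`, `bit_width`, and `operand_width_ok` are made up here, and the check compares widths only, ignoring signedness and integer/float distinctions.

```python
import re

# Opcodes for which a register operand may be wider than the instruction
# type, per the explanation above (assumed to be exactly cvt, ld, and st).
RELAXED_OPCODES = {"cvt", "ld", "st"}


def bit_width(ptx_type):
    """Return the width in bits of a PTX type name such as 'u8' or 's32'."""
    match = re.fullmatch(r"[busf](\d+)", ptx_type)
    if match is None:
        raise ValueError(f"unrecognized PTX type: {ptx_type!r}")
    return int(match.group(1))


def operand_width_ok(opcode, instruction_type, register_type):
    """Simplified width check for a single register operand.

    Hypothetical helper: relaxed opcodes accept a register at least as wide
    as the instruction type; every other opcode needs an exact width match.
    """
    have = bit_width(register_type)
    need = bit_width(instruction_type)
    if opcode in RELAXED_OPCODES:
        return have >= need
    return have == need


# Line 3170 of the listing: cvt.u64.u8 %rd257, %rh3, with %rh3 declared as u16.
print(operand_width_ok("cvt", "u8", "u16"))  # → True
# The same width mismatch on an opcode outside the relaxed set.
print(operand_width_ok("add", "u8", "u16"))  # → False
```

Under this model the instruction at line 3170 is accepted, which matches the point that the parser is rejecting PTX it should allow.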

Thanks very much Gregory, that workaround did the trick. If a lot of people have been having this problem, perhaps an item in the FAQ would be useful.

Now the code is parsed and runs! And the memory checker has detected the problem I was trying to find in the first place, so I’m happy. Thanks so much for your work on Ocelot. I’ve found it quite useful, and it is always nice to have open source tools.

FYI: this doesn’t really affect me at the moment, but when I try to enable memory traces I get this error:

terminate called after throwing an instance of 'hydrazine::Exception'

  what():  Failed to open MemoryTraceGenerator kernel trace file /<path>/<to>/<working>/<directory>/traces/_Z11exact_batchPKhii_2_0.trace

Aborted

If this is not a known issue or something wrong on my end, I can file a bug report.

I keep hoping to have a fix out for this that solves it unambiguously. It is currently in the ptx-2.1 branch, but that is still a few weeks away from being merged into the trunk. Maybe I should just write an FAQ entry for it in the meantime…

I’m glad that it was able to detect your problem. Feel free to file a bug report or ask on the mailing list if you run into another problem or think of a feature that would significantly help you track down bugs in your program.

Many of the trace generators are mainly used for research projects and as such aren’t very robust or programmer-friendly (actually it just went into r646). That particular trace generator expects there to be a ‘traces’ directory in the current working directory and isn’t smart enough to create one if it doesn’t exist.
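Going by that description, creating the directory before the run should be enough. A minimal sketch, assuming the generator only needs a `traces` directory to exist under the working directory; the helper name `ensure_traces_dir` is made up for illustration and is not part of Ocelot.

```python
import os


def ensure_traces_dir(working_dir="."):
    """Create <working_dir>/traces if it is missing and return its path."""
    path = os.path.join(working_dir, "traces")
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path
```

Calling `ensure_traces_dir()` from the directory where the program is launched, before starting it, has the same effect as running `mkdir -p traces` there.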

We are trying to factor out the parts of the code that are not robust and move them into a separate optional branch, so that users can try them out if they want, with the expectation that they may have to do some extra tweaking to get them to work for their individual problem or to measure a specific metric that they care about. We expect a large number of trace generators to be added quickly in the fall as part of ongoing research projects, and having to go back and make sure they are easy to use and handle corner cases correctly limits the amount of time we can spend adding new features and analyzing the results.