PTX Validation

Does anyone know of a tool to validate PTX with? Not a parser/lexer, but something that can actually read in a PTX file and report any specific errors on a line-by-line basis (like a normal compiler would); I’ve got some PTX files that I’ve generated with a custom tool I’ve been building, but some of them make the driver return the ‘invalid image’ error code when I try to execute them.

ptxas can be coaxed into doing something like this, but you have to turn on verbose error reporting manually. See…IAGPUDevice.cpp around line 349 for an example of how to do it using the CUDA driver API.
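For readers wiring up the JIT log options by hand (for example through P/Invoke or ctypes), here is a minimal sketch of how the option and value arrays for cuModuleLoadDataEx() are usually laid out. It is written in Python with ctypes purely for illustration; the helper name `build_jit_log_options` is hypothetical, and the numeric enum values are my reading of `CUjit_option` in cuda.h, so check them against your own header.

```python
import ctypes

# CUjit_option values as I understand them from cuda.h (assumption: verify locally).
CU_JIT_INFO_LOG_BUFFER = 3
CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4
CU_JIT_ERROR_LOG_BUFFER = 5
CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6


def build_jit_log_options(log_size=8192):
    """Build the parallel option/value arrays for cuModuleLoadDataEx().

    Hypothetical helper: returns (keys, values, info_log, error_log).
    The caller must keep the log buffers alive until the driver call returns.
    """
    info_log = ctypes.create_string_buffer(log_size)
    error_log = ctypes.create_string_buffer(log_size)

    keys = (ctypes.c_int * 4)(
        CU_JIT_INFO_LOG_BUFFER,
        CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES,
        CU_JIT_ERROR_LOG_BUFFER,
        CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
    )
    # Buffer options carry a pointer. Size options carry the integer itself,
    # stored in the pointer-sized slot (not a pointer to the integer).
    values = (ctypes.c_void_p * 4)(
        ctypes.addressof(info_log),
        log_size,
        ctypes.addressof(error_log),
        log_size,
    )
    return keys, values, info_log, error_log
```

The arrays would then be passed as the `numOptions`, `options`, and `optionValues` arguments of cuModuleLoadDataEx(), and `error_log.value` read after the call returns. Passing a pointer to the size instead of the size itself is an easy mistake to make in managed bindings, and it is one plausible way to end up with an 'invalid value' result.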

Alternatively, Ocelot does some checking on PTX during parsing. Download Ocelot, build the TestPTXParser program, and point it at your PTX files. It won’t catch the same errors as ptxas, but it may be useful as a second point of reference.

Thanks Gregory! I’d completely forgotten about the error/info logging options for JIT compilation…I’ll add that into our code today and see if I can find our bug with it. If not, I’ll give ocelot a try – I’ve been meaning to do so for a while now anyway.

Update: I didn’t get much (read: any) useful information out of the driver using the error reporting (enabled via the JIT options), so I wrote a test harness around ptxas that runs our code through it and captures/parses the output. I was able to identify and fix the bugs with that; however…

Now, the PTX kernels are compiling just fine with ptxas (generating .cubin files), but for some reason, when I try to use them with the CUDA driver (via cuModuleLoadDataEx()), I’m still getting the ‘invalid image’ error code. Any reason why the driver’s JIT compiler wouldn’t allow something that works just fine with ptxas?
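A harness like the one described above (run ptxas, capture its output, and parse out per-line diagnostics) could look roughly like the sketch below. This is not the poster's actual tool. The names `Diagnostic`, `parse_ptxas_output`, and `run_ptxas` are hypothetical, and the regular expression assumes ptxas prints diagnostics in the form `ptxas <file>, line <n>; <severity> : <message>`, which is what I have seen but may vary between toolkit versions.

```python
import os
import re
import subprocess
import tempfile
from collections import namedtuple

Diagnostic = namedtuple("Diagnostic", "file line severity message")

# Assumed diagnostic shape: "ptxas kernel.ptx, line 12; error   : message"
_DIAG = re.compile(
    r"^ptxas (?P<file>.+?), line (?P<line>\d+); "
    r"(?P<severity>\w+)\s*: (?P<message>.*)$"
)


def parse_ptxas_output(text):
    """Return a Diagnostic for every line that carries a line number."""
    diags = []
    for raw in text.splitlines():
        m = _DIAG.match(raw.strip())
        if m:
            diags.append(Diagnostic(m["file"], int(m["line"]),
                                    m["severity"], m["message"]))
    return diags


def run_ptxas(ptx_path, arch, ptxas="ptxas"):
    """Assemble ptx_path for the given arch (e.g. "sm_20"), discard the cubin,
    and return (exit_code, diagnostics)."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.cubin")
        proc = subprocess.run(
            [ptxas, "--gpu-name", arch, "--output-file", out, ptx_path],
            capture_output=True, text=True,
        )
    # Diagnostics may land on either stream, so scan both.
    return proc.returncode, parse_ptxas_output(proc.stderr + proc.stdout)
```

Note that a clean ptxas run only shows that the PTX assembles offline. As the rest of the thread shows, the driver can still reject the same text at load time for reasons ptxas never sees.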

Have you tried loading a known good example that is produced by nvcc? For example, take the simplest CUDA kernel, run nvcc --ptx, and load the result as a module.

I’ve had some issues on some older driver versions where even simple things like this failed, and the failures disappeared after updating the driver/toolkit.

Hmm, I just compiled the BlackScholes example, and tried to load the PTX that nvcc generated, and I’m still getting the InvalidImage error. I’m running Win7 Ultimate x64 with the 257.21 driver and the 3.0 toolkit/SDK.

I would make sure that the context is being created correctly before loading a module. Here is some minimal code that should work:…/PTXChecker.cpp

So, I’ve been poring over my code to try to solve this problem. The code is so simple (my code is nearly identical to yours, Gregory, except that I’m using C# and calling the driver via P/Invoke).

After creating the context, but before loading the module, I’m making a few device memory allocations / H->D transfers…they are working just fine (because I can copy the data back without a hitch). If I place a breakpoint in my code just before the call to cuModuleLoadDataEx(), the context handle looks valid (it’s non-zero, anyway). I’m still not getting anything back in the JIT error/info logs though; even if I just use the simpler cuModuleLoadData(), I still get an error.

However, I did notice that cuModuleLoadDataEx() gives me an ‘invalid value’ error, whilst cuModuleLoadData() gives me the ‘invalid image’ error. There’s nothing in the CUDA reference manual that would explain this difference, though (it’d be nice, at least for the driver methods, if the docs explained why/when a method would return each of its possible return values).

I had a problem like this a few months back where loading a module using the driver JIT failed while most other commands completed successfully. I pulled my hair out for a few days over the weekend and then the problem suddenly disappeared when I moved from my home office (9800GTX toolkit/driver ~2.3) to my lab workstation with the exact same GPU, OS, and driver (different CPU/MB though). I was not able to reproduce the problem on any lab machine (we tried 4 of them). Now when I go back and try on my office machine a few months later and a few driver revisions later I am again not able to reproduce the problem (although we also changed from linking using gcc to loading it manually using dlopen in Ocelot so it could be something related to that).

I would suggest trying another machine if you have one lying around. Failing that, I would try the Ocelot code for PTXChecker as I have verified that on several machines. If that fails, it may indicate a problem with the driver or how it is being loaded.

Gregory, thanks for all your help. I still haven’t quite fixed the error, but I did manage to figure out why I wasn’t getting any error messages back from cuModuleLoadDataEx() (it was such a convoluted bug that it’d take ages to explain…wasn’t even anything to do with CUDA).

So, now that I’ve got an error message to share:

error   : Binary format for key='1cf5f140', ident='cuModuleLoadDataEx_4' is not recognized

I tried specifying the PTX fallback strategy in my JIT options, but that didn’t help at all. Keep in mind, I’m just trying to load a PTX file (PTX version 1.4) generated from the “stock” Black-Scholes example in the SDK. I ran the file through ptxas just to be sure, and it generated a cubin without any problems.

EDIT: I just noticed that someone else got a similar error message when using the new drivers with OpenCL:…uleLoadDataEx_4

To add just a little more info…I tried downgrading to some previous beta and WHQL drivers (several of them, down to the 191 WHQL release), and I still get the same error. Once I went back one or two versions, the message changed to … ident=‘cuModuleLoadDataEx_3’ is not recognized. I don’t know if that helps find the problem at all, but hopefully…

Would you mind sending me the PTX file that you are trying to load? I can try to get it to work on my local machine.

The invalid image messages appear to be caused by the first line of the PTX being blank with “\r\n” as the line ending (i.e. Windows line ending). Making the first line non-blank or preprocessing the PTX to convert line-endings (which is overkill since only the first line needs handling) should get you going.

string processedptx = ptx.Replace("\r\n", "\n");

Then just use processedptx instead of ptx in the call to cuModuleLoadData() or cuModuleLoadDataEx().
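The same preprocessing, sketched in Python for anyone who wants to try it outside C#. The function name `prepare_ptx` is hypothetical. Beyond the line-ending fix suggested above, it also drops leading blank lines and appends a terminating NUL, since the driver expects PTX passed to cuModuleLoadData() to be a NULL-terminated string. The ASCII encoding is an assumption based on PTX source being plain ASCII text.

```python
def prepare_ptx(ptx: str) -> bytes:
    """Normalize PTX text before handing it to the driver.

    Hypothetical helper: converts Windows/old-Mac line endings to "\n",
    removes leading blank lines, and appends the terminating NUL byte.
    """
    text = ptx.replace("\r\n", "\n").replace("\r", "\n")
    # A blank first line was the trigger in this thread, so strip any
    # leading whitespace-only lines entirely.
    text = text.lstrip(" \t\n")
    return text.encode("ascii") + b"\0"
```

For example, `prepare_ptx("\r\n.version 1.4\r\n")` yields `b".version 1.4\n\x00"`. Normalizing the whole file is more than strictly needed if only the first line matters, but it is cheap and avoids depending on that detail.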

Had the same issue. What made me curious was that the samples (from the sample browser) started flawlessly, while the ones I built didn’t.
It turned out that building them for 64-bit solved the issue; after that, everything seems to work as it should.

Maybe it’s just the 32-bit driver (wrapper?) that is broken.