NVIDIA has made a huge mistake with the HW debugger: single-GPU debugging not supported and no emulation

Makes sense to me…force the professional developers to buy two Fermi cards.

/sarcasm

I found out that cudafe on Windows assumes the source is MSVC when parsing, and there is no option we can set to change this (passing “–a”, “–b”, “–c” … just gets a list of the specific options for the ambiguous command-line args), even though it seems the Edison Design Group front end parses anything. Cudafe doesn’t parse g++ preprocessor output.

So I talked with Vinod Grover from NVIDIA over the weekend, and came to the conclusion that building Ocelot on Windows would probably be a good idea, if for no other reason than to improve the stability of the code and test it on multiple platforms. I started working on it on Sunday night, and finished tonight after work. Below is a link to the first static library that I was able to build (the most annoying part was a lack of rvalue support in Visual Studio). Right now only the emulator sources are included, as they were easier than building LLVM and linking against the CUDA/CAL drivers. I’ll merge the modified sources into the Ocelot trunk and try testing it a bit tomorrow.

If anyone wants to try it out, I am expecting a ton of bugs, but it is closer to being functional than it was last week.

http://www.gdiamos.net/files/gpuocelot.lib

Amazing :)

Can you please elaborate a bit how to use it? Just link it to the application and ???

thanks

eyal

Ideally you should just be able to link your application against it rather than against cudart.dll. Again, I’m expecting some problems, and will try running some unit tests after work today…

Doesn’t link because of naming problems. E.g., cudaMalloc@8 (expecting) vs. cudaMalloc (defined in your static library).

Are you using the right linkage?

ocelot/cuda/interface/cuda_runtime.h defines cudaMalloc as:

extern cudaError_t cudaMalloc(void **devPtr, size_t size);

But in NVIDIA’s CUDA library, the function is declared (in cuda_runtime_api.h) as:

extern __host__ cudaError_t CUDARTAPI cudaMalloc(void **devPtr, size_t size);

where CUDARTAPI is defined with “#define CUDARTAPI __stdcall”.
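To make the mismatch concrete, here is a minimal sketch of the pattern, not Ocelot's or NVIDIA's actual code. The names `DEMO_CUDARTAPI`, `demoError_t` and `demoMalloc` are made up for illustration. On 32-bit MSVC, `__stdcall` decorates the symbol with the number of argument bytes (two 4-byte arguments give the `@8` suffix), so a caller compiled against a `__stdcall` declaration will not link against a definition compiled without it. On other platforms the macro expands to nothing, which would explain why this never showed up on Linux.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cudaError_t so the sketch compiles on its own.
typedef int demoError_t;

// Same idea as CUDARTAPI: __stdcall on 32-bit Windows, nothing elsewhere.
#if defined(_WIN32) && !defined(_WIN64)
#define DEMO_CUDARTAPI __stdcall
#else
#define DEMO_CUDARTAPI
#endif

// The declaration seen by callers and the definition inside the library
// must both carry the macro, otherwise the decorated names differ.
extern "C" demoError_t DEMO_CUDARTAPI demoMalloc(void **devPtr, std::size_t size);

// Toy definition: hands out a pointer into a fixed pool instead of
// touching any device. Returns 0 on success, 1 on failure.
extern "C" demoError_t DEMO_CUDARTAPI demoMalloc(void **devPtr, std::size_t size) {
    static char pool[256];
    if (devPtr == nullptr || size > sizeof(pool)) {
        return 1;
    }
    *devPtr = pool;
    return 0;
}
```

If the library's header drops the macro while the application's header keeps it, the compiler is happy on both sides and the failure only appears at link time, which matches the error reported above.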

Thanks for the update. This evidently didn’t matter on Linux. I’ll go back and make this change tonight.

Not ignoring it, but remember that cuprintf is not included in 3.1; you have to sign up as a developer and look very hard for it, knowing what you’re looking for.

It isn’t that I can’t debug at all, it’s just that it is now painful. The debugger isn’t even usable from their CUDA SDK unless you have Fermi, and you need 2 Fermis to do any kind of professional debugging like… gasp… breakpoints… when the last rev had a decent solution for most bugs.

My biggest problem is to see such amateurish execution. It’s very difficult to see such steps backward when nvidia is trying to push cuda development forward.

Finally, there are many cases where printfs crash my system in complex kernels, and even in simple ones just spitting out a few ints.

I think it’s important for nvidia to hear the passion. They need to hear the real frustration, time and money they are costing serious developers, at a crucial time when they truly need developers, so I’m being very vocal on these boards hoping they are reading. I work on commercial, production and consumer systems and the target machine is never fermi (hpc product) but either gtx480 or lesser (no fast doubles). I get the feeling nvidia thinks all development is in an academic lab where phd candidates are willing to go to any platform and toolchain to make things work. This simply ignores the realities of production work and limited schedules.

I’m their biggest fan and have pushed cuda into many real world solutions for some big companies but if they don’t start doing a better job, I won’t be able to advocate for them with a straight face. If they can’t keep their biggest fan and cheerleader, they’re in a world of trouble.

So I was able to get a few examples linked against Ocelot and pass the built-in regression tests. I also tried a few examples with memory errors to see if Ocelot could detect them correctly. They are being detected, but the mechanism used to report errors, exceptions, is handled differently on Windows (the default exception handler never calls what() on std::exceptions), so you don’t get any intelligent error messages. I’m going to leave the behavior as is in Ocelot in order to allow people to catch and handle errors or get a useful error message. Someone at MS should fix that btw, the default behavior in GCC is far more useful.
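Until the default handler prints something useful, one workaround is to catch the exception yourself and print what(). A minimal sketch, assuming the errors derive from std::exception as described above; `runAndReport` and `faultyKernelLaunch` are hypothetical names, and the thrown message is invented for the example rather than taken from Ocelot:

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Runs `body` and, if it throws a std::exception, prints the what() text
// to stderr and returns it. Returns an empty string if nothing was thrown.
std::string runAndReport(const std::function<void()>& body) {
    try {
        body();
    } catch (const std::exception& e) {
        std::cerr << "error: " << e.what() << '\n';
        return e.what();
    }
    return std::string();
}

// Hypothetical stand-in for a call into the emulator that detects a
// memory error and reports it by throwing.
void faultyKernelLaunch() {
    throw std::runtime_error("out-of-bounds global memory access");
}
```

Wrapping the body of main (or each kernel launch) in something like `runAndReport` gives the same message on Windows and Linux, regardless of what the platform's default handler does with an uncaught exception.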

The sources will be merged with the google code trunk over the weekend. In the meantime, you can download my debug build from http://www.gdiamos.net/files/gpuocelot.lib

One last update before I move on to other things:

If you really wanted to, you could link the application against cudart.dll and still use Ocelot (on demand). I don’t know how to do it on Linux, but I could pretty easily write the Windows code if you wanted it (in fact, I was actually planning to write something similar, and a majority of the code will be reusable).
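For anyone wondering what "on demand" could look like, here is a rough sketch of the general idea only: route each runtime call through a table of function pointers and pick the table at run time. Every name in it (`Backend`, `selectBackend`, `emulatorMalloc`, and so on) is hypothetical, and both backends are toy stand-ins. In a real shim the pointers would presumably be filled in with LoadLibrary/GetProcAddress on Windows, or dlopen/dlsym on Linux, rather than hard-coded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

typedef int demoError_t;
typedef demoError_t (*MallocFn)(void** devPtr, std::size_t size);
typedef demoError_t (*FreeFn)(void* devPtr);

// One entry per runtime call the shim forwards.
struct Backend {
    const char* name;
    MallocFn mallocFn;
    FreeFn freeFn;
};

// Toy "emulator" backend: plain host memory. Returns 0 on success.
static demoError_t emulatorMalloc(void** devPtr, std::size_t size) {
    *devPtr = std::malloc(size);
    return *devPtr != nullptr ? 0 : 1;
}
static demoError_t emulatorFree(void* devPtr) {
    std::free(devPtr);
    return 0;
}

// Toy "hardware" backend: pretends no device is present and always fails.
static demoError_t hardwareMalloc(void** devPtr, std::size_t) {
    *devPtr = nullptr;
    return 1;
}
static demoError_t hardwareFree(void*) {
    return 1;
}

// Chooses which table the application's calls are routed through.
Backend selectBackend(bool useEmulator) {
    if (useEmulator) {
        Backend b = {"emulator", emulatorMalloc, emulatorFree};
        return b;
    }
    Backend b = {"hardware", hardwareMalloc, hardwareFree};
    return b;
}
```

The application keeps calling the same entry points either way; only the table behind them changes, so switching to the emulator for a debugging session would not require relinking.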