Extremely long compilation with CUDA 1.1

Yesterday I updated CUDA from version 1.0 to 1.1 and noticed that simple code now takes more than an hour to compile.
If I change the line "for(i = 0; i < 2; i ++){" to "for(i = 0; i < 1; i ++){", everything compiles fine, but with any bound greater than 1 the compilation hangs (in be.exe).
md5_kernel.cu.txt (11.1 KB)

I guess it is the code optimizer that slows down compilation. I’ve seen a similar problem; rearranging the code a little solves it.

A few comments on your code:

  1. You can replace all your rotation functions with a simple macro. That might be easier for the compiler to handle than inlined functions.
  2. I’m not sure about __int64 in kernel code. In any case, for this type of kernel it is trivial to avoid it.
  3. You should learn a bit about GPU memory types, how it’s organized, etc. Keywords: shared memory, coalescing, constant memory, textures :-).
  4. And last but not least: what’s the point of writing yet another MD5 implementation? There are many good open-source libraries that contain one, OpenSSL being a good example. Its MD5 compiles for the GPU with only minor modifications.
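To illustrate point 1, here is a minimal sketch in plain C (so it can be checked on the host; the same macros can be used inside a CUDA kernel). ROTATE_LEFT, F and FF follow the macros in RFC 1321; the do/while wrapper around FF is my addition so the macro behaves like a single statement. It assumes 32-bit unsigned operands and a shift count strictly between 0 and 32.

```c
#include <stdint.h>

/* Left rotation as a macro rather than a separate inline function,
   same form as ROTATE_LEFT in RFC 1321.
   Assumes x is uint32_t and 0 < n < 32 (a shift by 32 is undefined). */
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Round-1 auxiliary function from RFC 1321. */
#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))

/* One round-1 step: a = b + ((a + F(b,c,d) + x + ac) <<< s).
   Updates a in place, so no extra temporaries are introduced. */
#define FF(a, b, c, d, x, s, ac)                          \
    do {                                                  \
        (a) += F((b), (c), (d)) + (x) + (uint32_t)(ac);   \
        (a) = ROTATE_LEFT((a), (s));                      \
        (a) += (b);                                       \
    } while (0)
```

For example, ROTATE_LEFT(0x12345678u, 8) yields 0x34567812, and one FF step on all-zero inputs with s = 7 and ac = 0xd76aa478 leaves a = 0xb5523c6b.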
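For point 2, MD5 only needs a 64-bit value for the message length in bits, and RFC 1321 already stores that as two 32-bit words (count[2]). The sketch below assumes short messages (under 56 bytes, so one 64-byte block suffices) and uses a hypothetical helper name, md5_pad_short; it builds the padded block with 32-bit arithmetic only.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original kernel): build the single
   padded MD5 block for a message shorter than 56 bytes, using only
   32-bit words. Returns 0 on success, -1 if the message needs more
   than one block. */
static int md5_pad_short(const unsigned char *msg, uint32_t len,
                         uint32_t block[16])
{
    uint32_t i;

    if (len > 55)
        return -1;

    /* Clear the block (explicit loop, no memset needed on the device). */
    for (i = 0; i < 16; i++)
        block[i] = 0;

    /* Pack message bytes little-endian, as MD5 expects. */
    for (i = 0; i < len; i++)
        block[i >> 2] |= (uint32_t)msg[i] << ((i & 3) * 8);

    /* Append the mandatory 0x80 padding byte. */
    block[len >> 2] |= 0x80u << ((len & 3) * 8);

    /* 64-bit bit count split into two 32-bit words, low word first,
       instead of a single __int64. */
    block[14] = len << 3;
    block[15] = len >> 29; /* always 0 for len < 56, kept for clarity */
    return 0;
}
```

For "abc" this gives block[0] = 0x80636261 and block[14] = 24 (3 bytes = 24 bits), with block[15] = 0.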

17 hours weren’t enough for the compilation to finish.

It seems that it either has no optimizer or the optimizer is buggy. The inlined version of the code used more than 120 redundant local variables when only one or two were necessary.

Thanks a lot, now it works.

It’s only declared, not used anywhere in the kernel.

I used RFC 1321 and varied the code a bit. Secondly, MD5 is much simpler than Navier-Stokes, isn’t it?

You can’t compare them directly. MD5 is an algorithm, while the Navier-Stokes equations are equations you need to solve.

If you think MD5 is simple, try to find a good differential path (better than the published ones) for finding (near-)collisions :-)

Anyway, that was just a remark about your code. If you’re satisfied with what you have you may ignore it. ;-)

Do I have to be a registered developer to submit a bug report?

If not, how can I do that?

OK, wait for new breakthroughs, I’ve started :)

My scientific advisor (and apparently I) plans to implement his calculation technique on a parallel machine, so for me NSE is just another algorithm.

Yes. Or I may file it for you if you don’t mind ;-)

I’d be happy if you did :)

The bug was found on Windows XP Professional, CUDA SDK 1.1, CUDA Toolkit 1.1, driver 169.21, GeForce 8600 GTS 256 MB, Athlon X2 6000+.

Okay, I think I’ve submitted it.