Yesterday I updated CUDA from version 1.0 to 1.1 and noticed that simple code now takes more than an hour to compile.
If one changes the line “for(i = 0; i < 2; i ++){” to “for(i = 0; i < 1; i ++){”, everything compiles fine, but with any loop bound greater than 1 the compilation hangs (in be.exe).
md5_kernel.cu.txt (11.1 KB)
I guess it is the code optimizer that slows down compilation. I’ve seen a similar problem before. Rearranging the code a little solves it.
A few comments on your code.
- You can replace all your rotation functions with a simple macro. That might be easier for the compiler to handle than inlined functions.
- I’m not sure about __int64 in kernel code. Anyway, for this type of kernel it is trivial to avoid using it.
- You should learn a bit about GPU memory types, how the memory is organized, etc. Keywords: shared memory, coalescing, constant memory, textures :-).
- And last but not least: what’s the point of inventing one more MD5 implementation? There are a lot of good open-source libraries containing it, OpenSSL being a good example. The MD5 from it compiles for the GPU with only minor modifications.
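To illustrate the first two comments, here is a minimal sketch, not the code from the attachment. It is plain C that should also be accepted inside a .cu file. The rotation and the round-1 step follow the reference macros in RFC 1321; the names ROTL32, MD5_F, MD5_FF and md5_bit_length are made up for this example, and in a kernel the helper function would additionally need a __device__ qualifier.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32).
   The casts keep the arithmetic in unsigned 32 bits. */
#define ROTL32(x, n) \
    ((((uint32_t)(x)) << (n)) | (((uint32_t)(x)) >> (32 - (n))))

/* Round-1 auxiliary function F from RFC 1321. */
#define MD5_F(x, y, z) (((x) & (y)) | (~(x) & (z)))

/* Round-1 step from RFC 1321:
   a = b + ((a + F(b, c, d) + x + ac) <<< s) */
#define MD5_FF(a, b, c, d, x, s, ac)                        \
    do {                                                    \
        (a) += MD5_F((b), (c), (d)) + (x) + (uint32_t)(ac); \
        (a) = ROTL32((a), (s));                             \
        (a) += (b);                                         \
    } while (0)

/* Hypothetical helper: the 64-bit message length in bits that MD5
   appends to the padding, produced as two 32-bit words so that no
   64-bit integer type is needed. */
static void md5_bit_length(uint32_t len_bytes, uint32_t *lo, uint32_t *hi)
{
    *lo = len_bytes << 3;   /* low 32 bits of len_bytes * 8  */
    *hi = len_bytes >> 29;  /* high 32 bits of len_bytes * 8 */
}
```

Because the macros expand in place, the compiler sees straight-line 32-bit arithmetic with no function calls to inline, which is the point of the first comment. Whether this actually shortens the compile time with a given toolkit version is something to check by trying it.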
Even 17 hours was not enough for the compilation to finish. :blink:
It seems that the compiler either has no optimizer or the optimizer is buggy. The inlined version of the code used more than 120 redundant local variables when only one or two were necessary. :blink:
Many thanks, now it works.
The __int64 variable is only declared but not used anywhere in the kernel.
I used RFC 1321 and varied the code a bit. Besides, MD5 is much simpler than Navier-Stokes, isn’t it?
You can’t compare them directly. MD5 is an algorithm, while Navier-Stokes are equations you need to solve.
If you think MD5 is simple, try to find a good differential path (better than the published ones) for finding (near-)collisions :-)
Anyway, that was just a remark about your code. If you’re satisfied with what you have, you may ignore it. ;-)
Must I be a registered developer to submit a bug report?
If not, how can I do that?
OK, wait for new breakthroughs, I’ve started :)
My scientific advisor (and, it seems, I as well) plans to implement his calculation technique on a parallel machine, so for me NSE is just another algorithm.
Yes. Or I may file it for you if you don’t mind ;-)
I would be happy if you did :)
The bug was found on Windows XP Professional, CUDA SDK 1.1, CUDA Toolkit 1.1, driver 169.21, GeForce 8600 GTS 256 MB, Athlon X2 6000+.
Okay, I think I’ve submitted it.