Is ATI Stream better for encryption-type programming?

I’m not really an expert on GPUs. I want to buy a GPU for programming, not games. Would I be better off with an ATI card?

First off, given that this is an nVidia forum, you can probably expect some bias in the replies you get.

That said, if you just want to do general, all-around GPU programming, I think you’ll be better off getting an nVidia card for now. The main reason is that nVidia’s development tools are quite good and getting better by the day. Also, this board is a great source of knowledge if you ever need help, though most people here only have significant experience with CUDA.

However, if you’re looking to do encryption or cryptography specifically, and you don’t really care about the tools, then I understand that the AMD/ATI cards have somewhat higher integer performance (though I don’t know if this is still the case now that Fermi is out). Also, if you get a Fermi card, you could probably overcome the (possibly) slower integer performance by using the much larger on-chip shared memory (which would be important for things like lookup tables).

I cannot answer your question, but I can point you to several relevant discussions.

See http://forums.nvidia.com/index.php?showtopic=167237&st=0&p=1046719&#entry1046719

See comment 17 at http://code.google.com/p/pyrit/issues/detail?id=153

However…

http://www.golubev.com/blog/?cat=3

Scroll down and read the paragraph that starts with

"One more good news for Fermi owners is that it in fact supports “fused” SHL+ADD instruction. "

  1. ATI GPUs have higher peak integer performance (and FP too, but that’s irrelevant here).

  2. ATI 5XXX GPUs can perform a cyclic rotation with 1 instruction, while Fermi-based GPUs require 2.

  3. The quality of the ATI Stream SDK in no way compares with that of the CUDA SDK, even a 2-year-old CUDA SDK. It’s obvious which one is better, right?

  4. The VLIW nature of ATI GPUs (and SDK “issues”) makes it (much) harder to program ATI GPUs to reach peak performance.

  5. “Encryption” is too general a description. Hashes like MD4/MD5/SHA1 require heavy integer calculations (they are ALU bound), but other kinds of algorithms exist too. Any encryption with s-boxes isn’t GPU friendly. For example, RC4 can’t be implemented with good performance on ATI GPUs because of VLIW5 and local memory (LDS) addressing issues. Blowfish can’t be implemented (again, performance-wise) on any GPU, as it requires 4K of RAM for internal state. It is possible, though, to set up the Blowfish key on the CPU and perform the encryption itself on the GPU, keeping the internal state in shared memory. However, that won’t work if you need to change the encryption key constantly.

  6. If you’re only starting with GPGPU, better stay away from ATI. There are enough problems with understanding the GPGPU paradigm itself, so learning it from zero and fighting SDK bugs at the same time isn’t a very pleasant experience. It’s always good to know that “this” is not working because you coded it incorrectly, and not because one more SDK bug popped up and screwed up your code.
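To put a rough number on the Blowfish point in item 5, here is a small back-of-the-envelope sketch in Python. It assumes the standard Blowfish layout (an 18-entry P-array plus four 256-entry S-boxes of 32-bit words) and 48 KB of shared memory per multiprocessor, the larger Fermi configuration. The function names are made up for illustration, and this is only arithmetic, not a measurement of real occupancy.

```python
# Rough estimate of how many independent Blowfish key states fit in shared memory.
# Assumption: standard Blowfish state = 18-word P-array + four 256-word S-boxes,
# each word being 32 bits.
WORD_BYTES = 4
P_ARRAY_WORDS = 18
SBOX_WORDS = 4 * 256


def blowfish_state_bytes():
    """Bytes of internal state needed for one Blowfish key."""
    return (P_ARRAY_WORDS + SBOX_WORDS) * WORD_BYTES


def states_per_shared_memory(shared_bytes):
    """How many complete per-key states fit in the given amount of shared memory."""
    return shared_bytes // blowfish_state_bytes()


# Assumption: 48 KB of shared memory per multiprocessor (hypothetical constant name).
FERMI_SHARED_BYTES = 48 * 1024

if __name__ == "__main__":
    print("bytes per key state:", blowfish_state_bytes())
    print("key states per 48 KB:", states_per_shared_memory(FERMI_SHARED_BYTES))
```

Under these assumptions only about a dozen per-key states fit per multiprocessor, which is why sharing one preset key across all threads works while a different key per thread does not.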

Thanks for all your answers guys. I have a few more questions.

  1. What are the physical attributes of the cards that determine their throughput? The article I read said the shader clock is more important than the core clock?

  2. Similarly, how does the amount of memory on the card affect performance when developing CUDA code for hashing/encryption (sorry for being so general)?

Thanks for your answers so far.

  1. Yes, shader clock and SP count define the peak performance of NVIDIA GPUs. However, the SP count for GF104/6/8 cannot be directly compared with the GF100 SP count because of the superscalar architecture.

  2. If you mean the amount of on-board RAM, most of the time it’s irrelevant for hashing/encryption, as we don’t need much RAM (or even high memory bandwidth). Usually these kernels are totally ALU bound.
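As a rough illustration of item 1, peak throughput can be estimated as SP count times shader clock times operations per SP per clock. The sketch below is only that estimate. The function name and the sample figures are placeholders chosen for illustration, not vendor specifications, and (as noted above) the result is not directly comparable between GF100 and GF104/6/8 parts.

```python
def peak_ops_per_second(sp_count, shader_clock_hz, ops_per_sp_per_clock=1):
    """Theoretical peak: every SP issues ops_per_sp_per_clock operations each shader clock.

    This is an upper bound only; real kernels rarely reach it.
    """
    return sp_count * shader_clock_hz * ops_per_sp_per_clock


if __name__ == "__main__":
    # Illustrative placeholder figures (assumptions, not official specifications).
    card_a = peak_ops_per_second(sp_count=336, shader_clock_hz=1_350_000_000)
    card_b = peak_ops_per_second(sp_count=480, shader_clock_hz=1_401_000_000)
    print("card A peak ops/s:", card_a)
    print("card B peak ops/s:", card_b)
```

Note that the core clock and the amount of on-board RAM do not appear in the estimate at all, which matches the point that these kernels are ALU bound.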

I was trying to improve speed of MD-like hashes on Fermi using 2 instructions for rotate:

  1. shr + mad

  2. bfi + bfe

All these combinations run at exactly the same speed as the classic “shl + shr + add” on a GTX 460. Do you have different results?
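For readers unfamiliar with the variants being compared, here is a minimal Python model of a 32-bit left rotate written two ways: the classic “shl + shr + add” and the “shr + mad” form, where the left shift is folded into a multiply-add by a power of two. The function names are made up for illustration, the “bfi + bfe” variant is not modelled, and the sketch only shows that the two forms compute the same value. It says nothing about their speed on real hardware.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit registers with Python's unbounded integers


def rotl_classic(x, n):
    """Rotate left via shl + shr + add (valid for 0 < n < 32)."""
    high = (x << n) & MASK32   # shl: bits shifted past bit 31 are dropped
    low = x >> (32 - n)        # shr: those same bits, moved to the bottom
    return (high + low) & MASK32  # add: the two parts never overlap, so add == or


def rotl_shr_mad(x, n):
    """Rotate left via shr + mad: x * 2**n + (x >> (32 - n)), keeping the low 32 bits."""
    low = x >> (32 - n)                    # shr
    return (x * (1 << n) + low) & MASK32   # mad with a power-of-two multiplier


if __name__ == "__main__":
    x = 0x12345678
    print(hex(rotl_classic(x, 8)), hex(rotl_shr_mad(x, 8)))
```

Both forms still need two dependent operations per rotate, which is consistent with the point above that Fermi requires 2 instructions where ATI 5XXX needs 1.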
