I am developing software on a Tesla C2075 in a single workstation for an embarrassingly parallel problem. On an Ubuntu 12.04 workstation with 1 C2075 (CUDA 4.2), the program runs in 4n seconds. On a CentOS 5 workstation with 4 C2075s (CUDA 4.0), the same program needs:
- ~n seconds using 4 C2075s
- ~5n seconds using 1 C2075
Two things seem strange to me:
- we get a 5x speedup when going from 1 GPU to 4 GPUs on the CentOS machine, which is more than linear
- the time required for 1 C2075 on CentOS (~5n seconds) is not the same as on Ubuntu (4n seconds)
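To make the arithmetic behind these two observations explicit, here is a minimal sketch that only restates the timings above. The function name `speedup_and_efficiency` and the choice `n = 1.0` are illustrative assumptions, not part of my program, and the inputs are the approximate timings I quoted, not precise measurements.

```python
def speedup_and_efficiency(t_single, t_multi, n_gpus):
    """Return (speedup, parallel efficiency) for a multi-GPU run.

    speedup    = single-GPU time / multi-GPU time
    efficiency = speedup / number of GPUs (1.0 means perfectly linear scaling)
    """
    speedup = t_single / t_multi
    efficiency = speedup / n_gpus
    return speedup, efficiency


# Assumption: n is an arbitrary time unit; 1.0 is used only for illustration.
n = 1.0

# CentOS 5 machine: ~5n seconds on 1 GPU, ~n seconds on 4 GPUs.
centos_speedup, centos_efficiency = speedup_and_efficiency(5 * n, 1 * n, 4)

# Single-GPU comparison: ~5n seconds on CentOS versus 4n seconds on Ubuntu.
os_ratio = (5 * n) / (4 * n)

print("CentOS speedup on 4 GPUs:", centos_speedup)
print("CentOS parallel efficiency:", centos_efficiency)
print("CentOS / Ubuntu single-GPU time ratio:", os_ratio)
```

An efficiency above 1.0 is what I mean by a super-linear speedup. Both the efficiency and the single-GPU time ratio come out to 1.25, so the single-GPU run on CentOS looks about 25% slower than I would expect from either of the other two timings.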
Could the CUDA driver version or the operating system be causing this? Thanks.