Hi, I'm fairly new to CUDA programming. I have a Linux box with two cards, a GeForce 8800 and a new Tesla C2050.
I ran the exact same code (the vectorAdd example) under the Linux `time` command, but the Tesla ran slower than the GeForce. Does anyone have any idea why? Has anybody had the same experience? One thing I noticed while grabbing the device properties is that the CUDA ver reported for the Tesla card is 2.0, but I'm running the CUDA 3.0 SDK… I wonder if this has anything to do with it?
Thanks in advance.
Device Name - Tesla C2050
Vector addition
PASSED
Done
0.018u 1.365s 0:01.61 85.0% 0+0k 0+0io 0pf+0w
Device Name - GeForce 8800 Ultra
Vector addition
PASSED
Done
0.012u 1.228s 0:01.46 84.2% 0+0k 0+0io 0pf+0w
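For reference, `time` measures the whole process: driver/context initialization, the host-device memcpys, and process teardown, not just the kernel. A sketch of timing only the kernel with CUDA events (assuming a vectorAdd kernel like the SDK sample's; error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);

    // cudaEvent timing brackets only the GPU work, so it excludes
    // the context-creation overhead that `time` folds in.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    vectorAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return 0;
}
```

Comparing the per-kernel numbers from both cards would say more about relative GPU performance than the wall-clock totals above.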
Here are the device properties:
Device Name - Tesla C2050
Total Global Memory - 2751936 KB
Shared memory available per block - 48 KB
Number of registers per thread block - 32768
Warp size in threads - 32
Memory Pitch - 2147483647 bytes
Maximum threads per block - 1024
Maximum Thread Dimension (block) - 1024 1024 64
Maximum Thread Dimension (grid) - 65535 65535 1
Total constant memory - 65536 bytes
CUDA ver - 2.0
Clock rate - 1147000 KHz
Texture Alignment - 512 bytes
Device Overlap - Allowed
Number of Multi processors - 14
Device Name - GeForce 8800 Ultra
Total Global Memory - 785728 KB
Shared memory available per block - 16 KB
Number of registers per thread block - 8192
Warp size in threads - 32
Memory Pitch - 2147483647 bytes
Maximum threads per block - 512
Maximum Thread Dimension (block) - 512 512 64
Maximum Thread Dimension (grid) - 65535 65535 1
Total constant memory - 65536 bytes
CUDA ver - 1.0
Clock rate - 1512000 KHz
Texture Alignment - 256 bytes
Device Overlap - Not Allowed