We implemented an Euler solver in CUDA, using double precision, to simulate shock waves. We found that the sample program performs worse under CUDA 2.3 than under CUDA 2.0 as the grid size increases. We are running on a GTX 280 with 1 GB of on-board memory; the host has a Core 2 Duo E8500 CPU and 4 GB of memory, and runs Windows XP 64-bit.
Attached is the single-step scaling chart for grid sizes of 128x512, 256x1024, 512x2048, and 1024x4096. It shows that CUDA 2.0 performs better than CUDA 2.3. Has anybody encountered the same problem?