In this Discussion
- davidhlav February 7
- franzdaubner February 8
- kalman February 8
- m227 February 7
- tayboonl February 1
CUDA 4.1 slow compared with 4.0!?
-
Hi,
I have a CUDA test program that I wrote with v4.0. When I updated to 4.1 and recompiled the same code, it became at least twice as slow for the plain CUDA code and three times as slow for the Thrust code. I did not change anything; I just installed the new release and recompiled. I am working on a Quadro 3000M (Dell Precision laptop).
Any ideas of what is going on? Maybe I need to change some settings?
Thanks,
G.
10 Comments
-
Hi G.
It's kind of hard for anyone to understand the exact symptoms you're seeing. Would you mind describing your test program, and perhaps including some benchmarks so everyone here can get a feel for it?
-
Hi,
Thanks for the answer. It is difficult for me to post the code. I will do this: I will uninstall 4.1 and cleanly re-install 4.0. Then I will compile my test program and post back the results.
Thanks,
G.
-
Hi again,
I cannot find 4.0 anymore to download. Only 4.1. Can you please give me a link to download it?
thanks,
G.
-
Ok, I found 4.0 version and installed it. Just by recompiling the code, I get back the old speed. The driver I use does not seem to make any difference. I tested with the driver coming with 4.0 and 4.1 and it is the same speed.
At this point, since I did NOT change the code, I have to assume that compiling with 4.1 somehow slows down the processing.
Please note that I use VS 2010.
Thanks,
G.
-
Hi G.
Glad it worked out for you. I'm not sure what is causing the slowdown, since CUDA 4.1 is supposed to give you a slight performance boost; at least that's what the brochure says. Oh well...
On that note, perhaps you'd like to use the Visual Profiler to profile your application under CUDA 4.1 and again under 4.0, and compare the two. Try to take detailed snapshots of the resulting graphs for data loads/stores.
-
Ok will try.
Thanks,
G.
-
Hi, I have the same problem on Linux.
After upgrading to CUDA Toolkit 4.1 my application got slower as well. I ran the Visual Profiler on both versions.
The first difference was that 4.1 started to use twice as many registers as 4.0 did. Obviously that reduced the occupancy, which unfortunately in my case did not result in an application speed-up. This was easily fixed by setting a minimum number of blocks per SM, and it even gave me better memory throughput for 4.1.
However, the second difference is kind of a mystery to me. For some reason, active warps/active cycle is 1/3 of the value for version 4.0. Does anyone have a clue how this could have happened?
Thanks
David
-
Hi,
I am glad to see that I am not the only one with such an issue.
Question: has anyone tried to run both 4.0 and 4.1 side by side? When I installed 4.1, it did NOT give me an option to uninstall 4.0 or to overwrite it. This may be an indication that it is somehow possible to have both 4.0 and 4.1 installed and switch between them. Any thoughts?
Thanks,
G.
-
I currently have the toolkits 3.2, 4.0 and 4.1 installed in parallel. They actually go to different installation directories by default and can coexist quite nicely :)
I configured all my projects to use the $(CUDA_PATH) environment variable, which is created automatically after you install any toolkit. The build rule file I use is NvCudaRuntimeApi.rules, which is configured by the installer to use the CUDA_PATH variable as well.
There are also specific environment variables for each toolkit, CUDA_PATH_V3_2, CUDA_PATH_V4_0 and so on.
All I have to do to switch between toolkits is to point the general CUDA_PATH variable to a specific variable, for example %CUDA_PATH_V3_2% to use the 3.2 toolkit.
Note that this only works for toolkits 3.x and newer; the 2.x toolkits had different installation routines.